Feature #20

client: recover from a killed session (w/ blacklist)

Added by Sage Weil about 11 years ago. Updated 8 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus
Reviewed:
Affected Versions:
Component(FS): Client, kceph
Labels (FS): task(medium)
Pull request ID:

Description

The client_reconnect_stale config option no longer works because the blacklist/eviction logic has changed significantly since the option was introduced.

One option for a more robust solution is to create a new configuration option that allows the client to acquire a new cluster id (client.1234...), reconnect to the MDSs, and re-acquire all caps. In-flight ops should be retried. Cached reads and buffered writes should be dropped. Open file handles should return EIO.
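The recovery sequence proposed above can be sketched as a toy model. This is only an illustration of the ordering of steps, not the Ceph client implementation: `ToyClient`, `FileHandle`, `recover_from_blacklist`, and all field names are hypothetical, and the MDS reconnect and cap re-acquisition are reduced to plain assignments.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class FileHandle:
    """Hypothetical stand-in for an open file handle."""
    path: str
    stale: bool = False  # set once the session that opened it was killed

    def read(self) -> bytes:
        if self.stale:
            # Handles opened before the session was killed return EIO.
            raise OSError(errno.EIO, "handle opened under a killed session")
        return b""


@dataclass
class ToyClient:
    """Hypothetical model of the client state that recovery touches."""
    client_id: int
    caps: set = field(default_factory=set)
    page_cache: dict = field(default_factory=dict)
    dirty_buffers: dict = field(default_factory=dict)
    in_flight: list = field(default_factory=list)
    handles: list = field(default_factory=list)

    def recover_from_blacklist(self, new_id: int) -> list:
        """Apply the proposed steps; return the ops the caller should retry."""
        wanted = set(self.caps)

        # 1. Acquire a new cluster id (assumed to be handed in by the caller).
        self.client_id = new_id

        # 2. Drop cached reads and buffered writes.
        self.page_cache.clear()
        self.dirty_buffers.clear()

        # 3. Existing open file handles must fail with EIO from now on.
        for handle in self.handles:
            handle.stale = True

        # 4. Reconnect to the MDSs and re-acquire all caps
        #    (modelled here as simply restoring the wanted set).
        self.caps = wanted

        # 5. In-flight ops are handed back to be retried.
        retry, self.in_flight = self.in_flight, []
        return retry
```

In this sketch, a caller would detect the eviction, obtain a fresh id, call `recover_from_blacklist`, and resubmit the returned ops. Files must be reopened to get a usable handle.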


Related issues

Related to CephFS - Documentation #45573: doc: client: client_reconnect_stale=1 New
Duplicated by CephFS - Bug #42271: client: ceph-fuse which had been blacklisted couldn't auto reconnect after cluster unblacklisted it. Resolved
Copied to Linux kernel client - Feature #39967: kclient: recover from a killed session (w/ blacklist) Resolved
Copied to CephFS - Backport #46402: octopus: client: recover from a killed session (w/ blacklist) Resolved

History

#1 Updated by Sage Weil about 11 years ago

  • Category set to 1

#2 Updated by Sage Weil almost 11 years ago

  • Target version changed from v0.21 to v0.22

#3 Updated by Sage Weil over 10 years ago

  • Target version changed from v0.22 to 12

#4 Updated by Sage Weil over 10 years ago

  • Estimated time set to 10.00 h
  • Source set to 5

#5 Updated by Sage Weil over 9 years ago

  • Target version deleted (12)

#6 Updated by Sage Weil over 9 years ago

  • Position deleted (485)
  • Position set to 841

#7 Updated by Sage Weil over 8 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

#8 Updated by Greg Farnum over 4 years ago

  • Category set to Administration/Usability

#9 Updated by Patrick Donnelly over 2 years ago

  • Subject changed from mds: allow client reconnect while up:active to client: recover from a killed session (w/ blacklist)
  • Description updated (diff)
  • Target version set to v14.0.0
  • Start date deleted (04/09/2010)
  • Source set to Development
  • Component(FS) Client, kceph added
  • Labels (FS) task(medium) added

I'm going to suggest attacking this problem from the other direction.

#10 Updated by Patrick Donnelly about 2 years ago

  • Target version changed from v14.0.0 to v15.0.0

#11 Updated by Patrick Donnelly about 2 years ago

  • Target version deleted (v15.0.0)

#12 Updated by Patrick Donnelly about 2 years ago

  • Target version set to v15.0.0
  • Estimated time deleted (10.00 h)

#13 Updated by Patrick Donnelly almost 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Zheng Yan
  • Priority changed from Normal to High
  • Backport set to nautilus
  • Pull request ID set to 27435

#14 Updated by Patrick Donnelly almost 2 years ago

  • Copied to Feature #39967: kclient: recover from a killed session (w/ blacklist) added

#15 Updated by Zheng Yan over 1 year ago

  • Pull request ID changed from 27435 to 31480

#16 Updated by Patrick Donnelly about 1 year ago

  • Backport deleted (nautilus)

#17 Updated by Greg Farnum about 1 year ago

  • Status changed from Fix Under Review to Resolved

#18 Updated by Patrick Donnelly 11 months ago

  • Target version changed from v15.0.0 to v16.0.0

This merged after Octopus.

#19 Updated by Patrick Donnelly 11 months ago

#20 Updated by Nathan Cutler 9 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to octopus

#21 Updated by Nathan Cutler 9 months ago

  • Copied to Backport #46402: octopus: client: recover from a killed session (w/ blacklist) added

#22 Updated by Patrick Donnelly 9 months ago

Nathan, why was this changed to backport to Octopus?

#23 Updated by Patrick Donnelly 9 months ago

Patrick Donnelly wrote:

Nathan, why was this changed to backport to Octopus?

I see: https://github.com/ceph/ceph/pull/35962#issuecomment-654945601

#24 Updated by Nathan Cutler 9 months ago

Right. To summarize: the question of whether it should be backported was asked but got no answer, and in the meantime we are getting other backports that do not apply cleanly to octopus because this feature has not been backported.

But that doesn't mean I insist on backporting it. It's just a proposal.

#25 Updated by Nathan Cutler 8 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#26 Updated by Patrick Donnelly 5 months ago

  • Duplicated by Bug #42271: client: ceph-fuse which had been blacklisted couldn't auto reconnect after cluster unblacklisted it. added
