Feature #20

client: recover from a killed session (w/ blacklist)

Added by Sage Weil over 10 years ago. Updated 27 days ago.

Status: Resolved
Priority: High
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus
Reviewed:
Affected Versions:
Component(FS): Client, kceph
Labels (FS): task(medium)
Pull request ID:

Description

The client_reconnect_stale config option no longer works because the blacklist/eviction logic has changed significantly since the option was introduced.

One option for a more robust solution is to create a new configuration option that allows the client to acquire a new cluster id (client.1234...), reconnect to the MDSs, and re-acquire all caps. In-flight ops should be retried. Cached reads and buffered writes should be dropped. Open file handles should return EIO.
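
The proposed behaviour can be sketched as a toy in-memory model. This is only an illustration of the steps listed above, not Ceph client code: the ToySession class, its fields, and the recover() and read() methods are all hypothetical names, and the reconnect to the MDSs is not modelled at all.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical model of one client session (not the Ceph client API)."""
    client_id: int
    caps: set = field(default_factory=set)            # inodes we held caps on
    inflight: list = field(default_factory=list)      # ops sent but not yet acked
    read_cache: dict = field(default_factory=dict)    # inode -> cached bytes
    write_buffer: dict = field(default_factory=dict)  # inode -> unflushed bytes
    open_fds: dict = field(default_factory=dict)      # fd -> inode
    stale_fds: set = field(default_factory=set)       # fds opened before recovery

    def recover(self, new_client_id):
        """Apply the proposed recovery steps and return the ops to retry."""
        self.client_id = new_client_id        # acquire a new cluster id
        # Caps are kept in the set as a stand-in for re-acquiring them
        # from the MDSs after reconnecting (assumption: all are re-granted).
        self.read_cache.clear()               # drop cached reads
        self.write_buffer.clear()             # drop buffered writes
        self.stale_fds |= set(self.open_fds)  # pre-recovery handles go stale
        return list(self.inflight)            # in-flight ops are retried

    def read(self, fd):
        """Return cached bytes for fd, or fail with EIO on a stale handle."""
        if fd in self.stale_fds:
            raise OSError(errno.EIO, "file handle predates session recovery")
        return self.read_cache.get(self.open_fds[fd], b"")
```

In this model, a handle opened before recover() raises OSError with errno.EIO on its next read, while the list returned by recover() is what the caller would resend under the new identity.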


Related issues

Related to fs - Documentation #45573: doc: client: client_reconnect_stale=1 New
Copied to Linux kernel client - Feature #39967: kclient: recover from a killed session (w/ blacklist) Resolved
Copied to fs - Backport #46402: octopus: client: recover from a killed session (w/ blacklist) Resolved

History

#1 Updated by Sage Weil over 10 years ago

  • Category set to 1

#2 Updated by Sage Weil over 10 years ago

  • Target version changed from v0.21 to v0.22

#3 Updated by Sage Weil about 10 years ago

  • Target version changed from v0.22 to 12

#4 Updated by Sage Weil almost 10 years ago

  • Estimated time set to 10.00 h
  • Source set to 5

#5 Updated by Sage Weil about 9 years ago

  • Target version deleted (12)

#6 Updated by Sage Weil about 9 years ago

  • Position deleted (485)
  • Position set to 841

#7 Updated by Sage Weil almost 8 years ago

  • Project changed from Ceph to fs
  • Category deleted (1)

#8 Updated by Greg Farnum about 4 years ago

  • Category set to Administration/Usability

#9 Updated by Patrick Donnelly over 1 year ago

  • Subject changed from mds: allow client reconnect while up:active to client: recover from a killed session (w/ blacklist)
  • Description updated (diff)
  • Target version set to v14.0.0
  • Start date deleted (04/09/2010)
  • Source set to Development
  • Component(FS) Client, kceph added
  • Labels (FS) task(medium) added

I'm going to suggest attacking this problem from the other direction.

#10 Updated by Patrick Donnelly over 1 year ago

  • Target version changed from v14.0.0 to v15.0.0

#11 Updated by Patrick Donnelly over 1 year ago

  • Target version deleted (v15.0.0)

#12 Updated by Patrick Donnelly over 1 year ago

  • Target version set to v15.0.0
  • Estimated time deleted (10.00 h)

#13 Updated by Patrick Donnelly over 1 year ago

  • Status changed from New to Fix Under Review
  • Assignee set to Zheng Yan
  • Priority changed from Normal to High
  • Backport set to nautilus
  • Pull request ID set to 27435

#14 Updated by Patrick Donnelly over 1 year ago

  • Copied to Feature #39967: kclient: recover from a killed session (w/ blacklist) added

#15 Updated by Zheng Yan 11 months ago

  • Pull request ID changed from 27435 to 31480

#16 Updated by Patrick Donnelly 7 months ago

  • Backport deleted (nautilus)

#17 Updated by Greg Farnum 6 months ago

  • Status changed from Fix Under Review to Resolved

#18 Updated by Patrick Donnelly 5 months ago

  • Target version changed from v15.0.0 to v16.0.0

This merged after octopus.

#19 Updated by Patrick Donnelly 4 months ago

#20 Updated by Nathan Cutler 3 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to octopus

#21 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #46402: octopus: client: recover from a killed session (w/ blacklist) added

#22 Updated by Patrick Donnelly 2 months ago

Nathan, why was this changed to backport to Octopus?

#23 Updated by Patrick Donnelly 2 months ago

Patrick Donnelly wrote:

Nathan, why was this changed to backport to Octopus?

I see: https://github.com/ceph/ceph/pull/35962#issuecomment-654945601

#24 Updated by Nathan Cutler 2 months ago

Right. To summarize: the question of whether it should be backported was asked but got no answer, and in the meantime we are getting other backports that do not apply cleanly to octopus because this feature has not been backported.

But that doesn't mean I insist on backporting it. It's just a proposal.

#25 Updated by Nathan Cutler 27 days ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
