Feature #20

closed

client: recover from a killed session (w/ blacklist)

Added by Sage Weil about 14 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus
Reviewed:
Affected Versions:
Component(FS): Client, kceph
Labels (FS): task(medium)
Pull request ID:

Description

The client_reconnect_stale config option no longer works because the blacklist/eviction logic changed significantly since it was introduced.

One option for a more robust solution is to create a new configuration option that allows the client to acquire a new cluster id (client.1234...), reconnect to the MDSs, and re-acquire all caps. In-flight ops should be retried. Cached reads and buffered writes should be dropped. Open file handles should return EIO.
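The recovery steps proposed above can be sketched as a small model. This is only an illustration of the described behavior, not the Ceph client implementation: the `Client` and `FileHandle` classes, their fields, and the `recover_session` method are all hypothetical names invented for this sketch, and re-acquiring caps is reduced to assigning a set.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class FileHandle:
    """Hypothetical stand-in for an open file handle held by the client."""
    path: str
    stale: bool = False  # set once the owning session has been killed

    def read(self) -> bytes:
        # Handles opened before recovery fail with EIO, as the description proposes.
        if self.stale:
            raise OSError(errno.EIO, "handle opened before session recovery")
        return b""


@dataclass
class Client:
    """Hypothetical model of client state; not the real Ceph Client class."""
    client_id: int
    caps: set = field(default_factory=set)
    cached_reads: dict = field(default_factory=dict)
    buffered_writes: list = field(default_factory=list)
    in_flight_ops: list = field(default_factory=list)
    open_handles: list = field(default_factory=list)

    def recover_session(self, new_id: int, wanted_caps: set) -> list:
        """Apply the proposed recovery steps and return the ops to retry."""
        self.client_id = new_id            # acquire a new cluster id
        self.cached_reads.clear()          # drop cached reads
        self.buffered_writes.clear()       # drop buffered writes
        for handle in self.open_handles:   # old handles now return EIO
            handle.stale = True
        self.caps = set(wanted_caps)       # stand-in for re-acquiring caps
        retry, self.in_flight_ops = self.in_flight_ops, []
        return retry                       # in-flight ops to be retried


handle = FileHandle("/a")
client = Client(
    client_id=1234,
    caps={"Fr"},
    cached_reads={"/a": b"x"},
    buffered_writes=[b"y"],
    in_flight_ops=["getattr"],
    open_handles=[handle],
)
ops_to_retry = client.recover_session(new_id=5678, wanted_caps={"Fr"})
```

Dropping buffered writes means unflushed data is lost, which is why handles opened before the eviction report EIO instead of silently continuing.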


Related issues 4 (1 open, 3 closed)

Related to CephFS - Documentation #45573: doc: client: client_reconnect_stale=1 (New)

Has duplicate CephFS - Bug #42271: client: ceph-fuse which had been blacklisted couldn't auto reconnect after cluster unblacklisted it. (Resolved, Zheng Yan)

Copied to Linux kernel client - Feature #39967: kclient: recover from a killed session (w/ blacklist) (Resolved, Zheng Yan)

Copied to CephFS - Backport #46402: octopus: client: recover from a killed session (w/ blacklist) (Resolved, Nathan Cutler)
Actions #1

Updated by Sage Weil about 14 years ago

  • Category set to 1
Actions #2

Updated by Sage Weil almost 14 years ago

  • Target version changed from v0.21 to v0.22
Actions #3

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.22 to 12
Actions #4

Updated by Sage Weil over 13 years ago

  • Estimated time set to 10:00 h
  • Source set to 5
Actions #5

Updated by Sage Weil over 12 years ago

  • Target version deleted (12)
Actions #6

Updated by Sage Weil over 12 years ago

  • Position deleted (485)
  • Position set to 841
Actions #7

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
Actions #8

Updated by Greg Farnum almost 8 years ago

  • Category set to Administration/Usability
Actions #9

Updated by Patrick Donnelly over 5 years ago

  • Subject changed from mds: allow client reconnect while up:active to client: recover from a killed session (w/ blacklist)
  • Description updated (diff)
  • Target version set to v14.0.0
  • Start date deleted (04/09/2010)
  • Source set to Development
  • Component(FS) Client, kceph added
  • Labels (FS) task(medium) added

I'm going to suggest attacking this problem from the other direction.

Actions #10

Updated by Patrick Donnelly about 5 years ago

  • Target version changed from v14.0.0 to v15.0.0
Actions #11

Updated by Patrick Donnelly about 5 years ago

  • Target version deleted (v15.0.0)
Actions #12

Updated by Patrick Donnelly about 5 years ago

  • Target version set to v15.0.0
  • Estimated time deleted (10:00 h)
Actions #13

Updated by Patrick Donnelly almost 5 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Zheng Yan
  • Priority changed from Normal to High
  • Backport set to nautilus
  • Pull request ID set to 27435
Actions #14

Updated by Patrick Donnelly almost 5 years ago

  • Copied to Feature #39967: kclient: recover from a killed session (w/ blacklist) added
Actions #15

Updated by Zheng Yan over 4 years ago

  • Pull request ID changed from 27435 to 31480
Actions #16

Updated by Patrick Donnelly about 4 years ago

  • Backport deleted (nautilus)
Actions #17

Updated by Greg Farnum about 4 years ago

  • Status changed from Fix Under Review to Resolved
Actions #18

Updated by Patrick Donnelly almost 4 years ago

  • Target version changed from v15.0.0 to v16.0.0

this merged after octopus

Actions #19

Updated by Patrick Donnelly almost 4 years ago

Actions #20

Updated by Nathan Cutler almost 4 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to octopus
Actions #21

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #46402: octopus: client: recover from a killed session (w/ blacklist) added
Actions #22

Updated by Patrick Donnelly almost 4 years ago

Nathan, why was this changed to backport to Octopus?

Actions #23

Updated by Patrick Donnelly almost 4 years ago

Patrick Donnelly wrote:

Nathan, why was this changed to backport to Octopus?

I see: https://github.com/ceph/ceph/pull/35962#issuecomment-654945601

Actions #24

Updated by Nathan Cutler almost 4 years ago

Right. To summarize: the question of whether it should be backported was asked but got no answer, and in the meantime we are getting other backports that do not apply cleanly to octopus because this feature has not been backported.

But, that doesn't mean I insist on backporting it. It's just a proposal.

Actions #25

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #26

Updated by Patrick Donnelly over 3 years ago

  • Has duplicate Bug #42271: client: ceph-fuse which had been blacklisted couldn't auto reconnect after cluster unblacklisted it. added