client: recover from a killed session (w/ blacklist)
The client_reconnect_stale config option no longer works because the blacklist/eviction logic has changed significantly since the option was introduced.
One option for a more robust solution is to create a new configuration option that allows the client to acquire a new cluster id (client.1234...), reconnect to the MDSs, and re-acquire all caps. In-flight ops should be retried. Cached reads and buffered writes should be dropped. Open file handles should return EIO.
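To make the proposed sequence concrete, here is a minimal Python sketch of the recovery steps described above. It is only a toy model, not Ceph client code: the `Client` and `FileHandle` classes, the `recover_from_blacklist` method, and the counter that hands out ids are all hypothetical names invented for illustration, and the cap re-acquisition is simulated locally rather than negotiated with an MDS.

```python
import errno
import itertools
from dataclasses import dataclass, field

# Hypothetical stand-in for the cluster handing out fresh client ids.
_next_id = itertools.count(1)


@dataclass
class FileHandle:
    """Toy file handle; not a real Ceph structure."""
    path: str
    stale: bool = False

    def read(self) -> bytes:
        # Handles opened before the recovery fail with EIO.
        if self.stale:
            raise OSError(errno.EIO, "handle predates session recovery")
        return b""


@dataclass
class Client:
    """Toy client state; every field name here is an assumption."""
    client_id: int = field(default_factory=lambda: next(_next_id))
    caps: set = field(default_factory=set)
    read_cache: dict = field(default_factory=dict)
    write_buffer: dict = field(default_factory=dict)
    in_flight: list = field(default_factory=list)
    handles: list = field(default_factory=list)

    def recover_from_blacklist(self) -> list:
        """Run the proposed recovery and return the ops to retry."""
        wanted = set(self.caps)
        # 1. Acquire a new id, since the old one is blacklisted.
        self.client_id = next(_next_id)
        # 2. Drop cached reads and buffered writes.
        self.read_cache.clear()
        self.write_buffer.clear()
        # 3. Existing open file handles return EIO from now on.
        for handle in self.handles:
            handle.stale = True
        # 4. Re-acquire caps (simulated; a real client would ask each MDS).
        self.caps = wanted
        # 5. Hand back in-flight ops so the caller can resubmit them.
        retried, self.in_flight = self.in_flight, []
        return retried
```

In this sketch the caller resubmits whatever `recover_from_blacklist` returns under the new id. Whether buffered writes can be dropped safely is a policy decision that the sketch does not address.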
#9 Updated by Patrick Donnelly over 1 year ago
- Subject changed from mds: allow client reconnect while up:active to client: recover from a killed session (w/ blacklist)
- Description updated (diff)
- Target version set to v14.0.0
- Start date deleted (
- Source set to Development
- Component(FS) Client, kceph added
- Labels (FS) task(medium) added
I'm going to suggest attacking this problem from the other direction.
#23 Updated by Patrick Donnelly 2 months ago
Patrick Donnelly wrote:
Nathan, why was this changed to backport to Octopus?
#24 Updated by Nathan Cutler 2 months ago
Right. To summarize: the question of whether it should be backported was asked but got no answer, and in the meantime we are getting other backports that do not apply cleanly to octopus because this feature has not been backported.
But that doesn't mean I insist on backporting it. It's just a proposal.