Bug #9216: mds may regard active clients as stale due to slow pg recovery - CephFS - Ceph

Added by Alexandre Oliva over 9 years ago. Updated almost 8 years ago.

Status: New
Priority: Low
Assignee:
Category: Correctness/Safety
Target version:
% Done:
Source: other
Tags:
Backport:
Regression:
Severity: 5 - suggestion
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Common/Protocol, MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I occasionally get fuse and ceph.ko mounts into weird states, and I can generally trace them to the mds deciding that those clients were stale even though they were not. Most often, the mds crashes shortly after that, and the stale-but-not-really clients sometimes succeed in reconnecting before the reconnect window closes, which leaves them all right most of the time. However, I recently observed a situation in which the mds survived, and the ceph.ko clients then attempted to reconnect and were denied because the mds was already active.

Anyway, the primary cause of all this pain appears to be the slow recovery of metadata PGs after an osd times out or some such, and more importantly the fact that the mds does not appear to take into account pending messages and its own stuck-waiting-for-PGs status before regarding a client session as stale. I think the mds should extend the stale-session timeout counter when it is itself laggy or failing to journal any progress.
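
The proposal above might be sketched roughly as follows. This is a toy model only, not Ceph's MDS code: `Session`, `StaleChecker`, `record_stall` and the 60-second timeout are hypothetical names and values chosen for illustration. The idea is simply that time during which the server itself was stuck is not counted against the client.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for a client session tracked by the mds."""
    client_id: str
    last_renewal: float  # time at which the last renewal was accepted


@dataclass
class StaleChecker:
    """Toy staleness policy that credits back time the server was stuck."""
    timeout: float  # seconds without renewal before a session counts as stale
    # (start, end) intervals during which the server made no progress.
    # Assumed non-overlapping; overlapping intervals would be double-counted.
    stalls: list = field(default_factory=list)

    def record_stall(self, start: float, end: float) -> None:
        self.stalls.append((start, end))

    def stalled_time_since(self, since: float, now: float) -> float:
        """Total stalled time that falls inside the window [since, now]."""
        total = 0.0
        for start, end in self.stalls:
            overlap = min(end, now) - max(start, since)
            if overlap > 0:
                total += overlap
        return total

    def is_stale(self, session: Session, now: float) -> bool:
        elapsed = now - session.last_renewal
        credited = self.stalled_time_since(session.last_renewal, now)
        # Only time during which the server was healthy counts toward the timeout.
        return elapsed - credited > self.timeout


checker = StaleChecker(timeout=60.0)
session = Session("client-a", last_renewal=100.0)
stale_without_credit = checker.is_stale(session, now=170.0)
checker.record_stall(120.0, 160.0)  # server stuck waiting for PGs for 40 seconds
stale_with_credit = checker.is_stale(session, now=170.0)
```

In this sketch the same session is judged stale at time 170 when no stall is recorded, and not stale once the 40-second stall is credited back. Whether the real mds can cheaply tell when it is "laggy or failing to journal" is an open question that the sketch does not address.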

Updated by Greg Farnum over 9 years ago

Interesting. Did you establish the mechanism by which the clients are being marked stale? Do they have a renew caps request pending (that they don't re-send) which the MDS can't journal until time runs out? Something else?

Updated by Alexandre Oliva over 9 years ago

I haven't got that far yet, but if I had to guess I'd say it is not about caps, since when this happens all existing sessions expire more or less simultaneously, even though some are active and some have long been idle. I'd guess (without any evidence or knowledge of the protocol whatsoever) that one of the following is happening: heartbeats are piling up in a message queue that the mds isn't handling because it's stuck; heartbeat acks are not being sent for some similar reason, which prevents further heartbeats from being sent; or received heartbeats have to hit the mds journal before they take effect and postpone the deadline, which makes them seem too late.
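
To make the first guess concrete, here is a small toy model of heartbeats queued but not yet processed. It is purely illustrative and makes no claim about how the mds actually handles renewals: `RenewalTracker`, its methods and the 60-second timeout are all hypothetical. It shows how an active and an idle client would expire together if the staleness check ignores queued messages.

```python
from collections import deque


class RenewalTracker:
    """Toy model: renewals arrive in a queue and only count once processed."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_renewal = {}   # client_id -> time of last processed renewal
        self.pending = deque()   # (client_id, received_at), not yet processed

    def register(self, client_id, now):
        self.last_renewal[client_id] = now

    def receive(self, client_id, now):
        # The message has arrived, but a stuck server has not handled it yet.
        self.pending.append((client_id, now))

    def stale_clients(self, now, consider_pending):
        effective = dict(self.last_renewal)
        if consider_pending:
            # Count queued renewals by their arrival time before judging.
            for client_id, received_at in self.pending:
                previous = effective.get(client_id, received_at)
                effective[client_id] = max(previous, received_at)
        return sorted(c for c, t in effective.items() if now - t > self.timeout)


tracker = RenewalTracker(timeout=60)
tracker.register("active", now=0)
tracker.register("idle", now=0)
tracker.receive("active", now=50)  # heartbeat arrives while the server is stuck

ignoring_queue = tracker.stale_clients(now=70, consider_pending=False)
honoring_queue = tracker.stale_clients(now=70, consider_pending=True)
```

With the queue ignored, both clients are reported stale at time 70; with queued arrival times honored, only the idle client is. That matches the symptom described above, but it is only one of the three guesses and has not been verified against the mds.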
