Bug #18641
mds: stalled clients apparently due to stale sessions (closed)
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
4 of 16 clients building the kernel, with 2 active MDS, blocked on I/O. After digging into the ceph-fuse log, I found that the client (client.4241) is attempting to renew caps but gets no reply from the MDS:
2017-01-23 20:12:16.474388 7f81edbf3700 10 client.4241 renew_caps mds.0
2017-01-23 20:12:16.474396 7f81edbf3700 1 -- 192.168.171.154:0/1201971437 --> 192.168.220.2:6800/3808684069 -- client_session(request_renewcaps seq 4890) v2 -- 0x55cdc1980fc0 con 0
The MDS log repeatedly shows the following (this excerpt is for a later renewal request, hence the different timestamp and seq):
2017-01-23 20:51:56.825544 7f0f80d13700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 5009) v1 from client.4241
2017-01-23 20:51:56.825548 7f0f80d13700 10 mds.0.server ignoring renewcaps on non open|stale session (closed)
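For anyone triaging a similar stall, here is a rough sketch of how the ignored renewals could be pulled out of an MDS log. It assumes the log text looks exactly like the excerpt above (a handle_client_session entry immediately followed by the "ignoring renewcaps" entry). The function names `split_entries` and `ignored_renewals` are made up for illustration and are not part of Ceph.

```python
import re

# Patterns derived only from the log excerpts quoted in this ticket;
# other Ceph versions or debug levels may format these lines differently.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+")
RENEW_RE = re.compile(r"client_session\(request_renewcaps seq (\d+)\)")
CLIENT_RE = re.compile(r"from (client\.\d+)")
IGNORED_RE = re.compile(
    r"ignoring renewcaps on non open\|stale session \((\w+)\)"
)


def split_entries(text):
    """Split raw log text into entries, one per leading timestamp."""
    starts = [m.start() for m in TS_RE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]


def ignored_renewals(mds_log_text):
    """Return (client, seq, session_state) for each ignored renewcaps.

    Assumption: the 'ignoring renewcaps' entry directly follows the
    handle_client_session entry it refers to, as in the excerpt above.
    """
    results = []
    pending = None  # (client, seq) from the preceding entry, if any
    for entry in split_entries(mds_log_text):
        renew = RENEW_RE.search(entry)
        client = CLIENT_RE.search(entry)
        if renew and client:
            pending = (client.group(1), int(renew.group(1)))
            continue
        ignored = IGNORED_RE.search(entry)
        if ignored and pending:
            results.append(pending + (ignored.group(1),))
        pending = None
    return results
```

Run over the decompressed MDS log, this would list each client and seq whose renewal was dropped along with the session state the MDS reported, which can then be compared against the seq values the client log shows being sent.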
Unfortunately, I do not have the MDS logs from the time when the client last talked to mds.0. I do have the entire client log.
Last bit of the MDS log: /ceph/cephfs-perf/drop/ceph-mds-0.log.gz
Client log: /ceph/cephfs-perf/drop/ceph-client-4241.log.gz
I have other logs (e.g. mds.1) too if they're needed.
Updated by John Spray almost 7 years ago
- Status changed from New to Can't reproduce
Updated by Patrick Donnelly about 5 years ago
- Category deleted (90)
- Labels (FS) multimds added