Bug #4742
Closed
mds: stuck clientreplay request
0%
Description
/a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14246
It has a single request that isn't completing. The cause is unknown, and it is unclear whether we can reproduce it by restarting the MDS.
Updated by Sam Lang about 11 years ago
Looks like a setattr and a create:
ubuntu@plana72:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-client.0.19374.asok mds_requests
{ "tid": 1219,
"op": "setattr",
"path": "#100000001f5",
"path2": "",
"ino": "100000001f5",
"target_ino": "100000001f5",
"hint_ino": "0",
"sent_stamp": "2013-04-17 03:00:24.344943",
"mds": 0,
"resend_mds": -1,
"send_to_auth": 0,
"sent_on_mseq": 0,
"retry_attempt": 0,
"got_unsafe": 1,
"uid": 0,
"gid": 0,
"oldest_client_tid": 1211,
"mdsmap_epoch": 0,
"flags": 0,
"num_retry": 0,
"num_fwd": 0,
"num_releases": 0}
{ "tid": 1220,
"op": "create",
"path": "#100000001f5\/fstest_b5c1034e024d6f8e44a438e430391c84",
"path2": "",
"ino": "100000001f5",
"dentry": "fstest_b5c1034e024d6f8e44a438e430391c84",
"hint_ino": "0",
"sent_stamp": "2013-04-17 03:00:25.360236",
"mds": 0,
"resend_mds": -1,
"send_to_auth": 0,
"sent_on_mseq": 0,
"retry_attempt": 0,
"got_unsafe": 0,
"uid": 0,
"gid": 0,
"oldest_client_tid": 1219,
"mdsmap_epoch": 0,
"flags": 0,
"num_retry": 0,
"num_fwd": 0,
"num_releases": 0}
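A minimal sketch of how a dump like the one above could be triaged in a script. It assumes the admin socket output is a run of JSON objects printed back to back with no separator (as it appears here), and that `got_unsafe: 1` means the client has an unsafe reply and is still waiting for the safe one. The helper names and the trimmed `SAMPLE` are illustrative, not part of Ceph.

```python
import json


def parse_concatenated_json(text):
    """Yield each JSON object from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize_requests(text):
    """Return (tid, op, got_unsafe) for every request in the dump."""
    return [(r["tid"], r["op"], bool(r["got_unsafe"]))
            for r in parse_concatenated_json(text)]


# Trimmed copy of the two requests above (only the fields used here).
SAMPLE = '''{ "tid": 1219,
  "op": "setattr",
  "got_unsafe": 1,
  "oldest_client_tid": 1211}{ "tid": 1220,
  "op": "create",
  "got_unsafe": 0,
  "oldest_client_tid": 1219}'''

if __name__ == "__main__":
    for tid, op, unsafe in summarize_requests(SAMPLE):
        state = "has unsafe reply, waiting for safe" if unsafe else "no reply yet"
        print(tid, op, state)
```

On the sample, this flags tid 1219 (setattr) as the request that already has an unsafe reply, with the create queued behind it.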
Updated by Sam Lang about 11 years ago
Marked #4741 as a duplicate of this bug. It looks like setattr is the culprit. I was able to generate a core file of the MDS while it was in this state, and the only request sitting in mds->mdcache->active_requests is the setattr that the client is waiting for (and for which it already has an unsafe reply). I also have a dump of the MDS cache; all it shows for the inode the setattr is operating on is that it's dirty.
Updated by Sam Lang about 11 years ago
- Status changed from New to In Progress
- Assignee set to Sam Lang
Updated by Sam Lang almost 11 years ago
- File ceph-mds.b-s-a.log.gz added
Attaching the MDS log from an MDS stuck in clientreplay. It looks like the setattr gets put on the inode waiting list by the locker after it sends the client a caps revoke for pin auth.
Updated by Sam Lang almost 11 years ago
- File setattr-ceph-client.0.13000.log.bz2 added
- File setattr-ceph-mds.a.log.bz2 added
- File setattr-ceph-mds.b-s-a.log.bz2 added
- File setattr-mds_requests added
- File rename-ceph-client.0.12715.log.bz2 added
- File rename-ceph-mds.a.log.bz2 added
- File rename-ceph-mds.b-s-a.log.bz2 added
- File rename-mds_requests added
Logs for two runs: one is stuck in replay on a setattr, the other on a rename.
Updated by Zheng Yan almost 11 years ago
Looks like a client bug: it may add cap releases to the replayed requests. (encode_cap_releases() should be called when creating the request, not when sending it.)
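To illustrate the hypothesis, here is a toy model (not Ceph code) of why the point at which cap releases are encoded matters. `ToyClient`, `Request`, and the cap names are all hypothetical. The only claim taken from this ticket is that encoding at send time can attach extra cap releases when a request is resent during replay, whereas encoding at creation time keeps the request unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    tid: int
    op: str
    releases: list = field(default_factory=list)  # cap releases carried by the request


class ToyClient:
    def __init__(self, encode_at_create):
        self.encode_at_create = encode_at_create
        self.pending_releases = []  # caps the client wants to release

    def create_request(self, tid, op):
        req = Request(tid, op)
        if self.encode_at_create:
            # Releases are fixed once, when the request is built.
            req.releases = list(self.pending_releases)
            self.pending_releases.clear()
        return req

    def send(self, req):
        if not self.encode_at_create:
            # Send-time encoding: every (re)send picks up whatever is pending now.
            req.releases.extend(self.pending_releases)
            self.pending_releases.clear()
        return (req.tid, req.op, tuple(req.releases))


def run(encode_at_create):
    """Send a request, release another cap, then resend it as a replay."""
    client = ToyClient(encode_at_create)
    client.pending_releases.append("cap-A")
    req = client.create_request(1219, "setattr")
    first = client.send(req)
    client.pending_releases.append("cap-B")  # released after the first send
    replay = client.send(req)                # resend during clientreplay
    return first, replay


if __name__ == "__main__":
    print("send-time encoding:  ", run(encode_at_create=False))
    print("create-time encoding:", run(encode_at_create=True))
```

In this model the replayed message differs from the original only when releases are encoded at send time, which is the behavior the comment above suspects in the real client.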
Updated by Greg Farnum almost 11 years ago
Yeah, we've discussed this some on github around wip-4742 and on irc. :)
Updated by Sage Weil almost 11 years ago
- Status changed from In Progress to Resolved
commit:5121e56c255c079569f02e0ee852e469f38f470e