Bug #4742

mds: stuck clientreplay request

Added by Greg Farnum almost 8 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14246

It has a single request that isn't completing. The cause is unknown, and it's unclear whether we can reproduce this by restarting the MDS.

ceph-mds.b-s-a.log.gz (6.61 MB) Sam Lang, 04/23/2013 10:29 AM

setattr-ceph-client.0.13000.log.bz2 (2.61 MB) Sam Lang, 04/25/2013 10:38 AM

setattr-ceph-mds.a.log.bz2 (7.15 MB) Sam Lang, 04/25/2013 10:38 AM

setattr-ceph-mds.b-s-a.log.bz2 (3.32 MB) Sam Lang, 04/25/2013 10:38 AM

setattr-mds_requests (1.52 KB) Sam Lang, 04/25/2013 10:38 AM

rename-ceph-client.0.12715.log.bz2 (2.61 MB) Sam Lang, 04/25/2013 10:38 AM

rename-ceph-mds.a.log.bz2 (3.34 MB) Sam Lang, 04/25/2013 10:38 AM

rename-ceph-mds.b-s-a.log.bz2 (6.46 MB) Sam Lang, 04/25/2013 10:38 AM

rename-mds_requests (1.16 KB) Sam Lang, 04/25/2013 10:38 AM

Associated revisions

Revision 5121e56c (diff)
Added by Sam Lang almost 8 years ago

client: don't embed cap releases in clientreplay

If the client is sending replay requests, avoid sending embedded caps,
since the mds already has the client's caps from the reconnect.
This matches the behavior of the kernel client.

Fixes #4742.
Signed-off-by: Sam Lang <>
Reviewed-by: Sage Weil <>
Reviewed-by: Greg Farnum <>
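
The rule described in this commit message can be illustrated with a minimal sketch. It is written in Python rather than taken from the actual client code, and every name in it (`Request`, `build_request_message`, `pending_cap_releases`, `is_replay`) as well as the sample release string is hypothetical. It only models the idea that a replayed request carries no embedded cap releases.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    """Hypothetical stand-in for a client metadata request."""
    tid: int
    op: str
    got_unsafe: bool = False  # an unsafe reply was already received
    pending_cap_releases: List[str] = field(default_factory=list)


def build_request_message(req: Request, is_replay: bool) -> dict:
    """Build the message for `req`, omitting embedded releases on replay."""
    # On replay the MDS already has the client's caps from the reconnect,
    # so no releases are embedded (the rule stated in the commit message).
    releases = [] if is_replay else list(req.pending_cap_releases)
    return {"tid": req.tid, "op": req.op, "replay": is_replay, "releases": releases}


# Illustrative values only; the release string is made up for this sketch.
req = Request(tid=1219, op="setattr", got_unsafe=True,
              pending_cap_releases=["cap-on-100000001f5"])
normal = build_request_message(req, is_replay=False)
replayed = build_request_message(req, is_replay=True)
```

In this sketch the decision is made in one place, where the message is built, so a replayed request cannot pick up releases by accident on a later send.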

History

#1 Updated by Sam Lang almost 8 years ago

Looks like a setattr and a create:

ubuntu@plana72:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-client.0.19374.asok mds_requests
{ "tid": 1219,
"op": "setattr",
"path": "#100000001f5",
"path2": "",
"ino": "100000001f5",
"target_ino": "100000001f5",
"hint_ino": "0",
"sent_stamp": "2013-04-17 03:00:24.344943",
"mds": 0,
"resend_mds": -1,
"send_to_auth": 0,
"sent_on_mseq": 0,
"retry_attempt": 0,
"got_unsafe": 1,
"uid": 0,
"gid": 0,
"oldest_client_tid": 1211,
"mdsmap_epoch": 0,
"flags": 0,
"num_retry": 0,
"num_fwd": 0,
"num_releases": 0}
{ "tid": 1220,
"op": "create",
"path": "#100000001f5\/fstest_b5c1034e024d6f8e44a438e430391c84",
"path2": "",
"ino": "100000001f5",
"dentry": "fstest_b5c1034e024d6f8e44a438e430391c84",
"hint_ino": "0",
"sent_stamp": "2013-04-17 03:00:25.360236",
"mds": 0,
"resend_mds": -1,
"send_to_auth": 0,
"sent_on_mseq": 0,
"retry_attempt": 0,
"got_unsafe": 0,
"uid": 0,
"gid": 0,
"oldest_client_tid": 1219,
"mdsmap_epoch": 0,
"flags": 0,
"num_retry": 0,
"num_fwd": 0,
"num_releases": 0}

#2 Updated by Sam Lang almost 8 years ago

Marked #4741 as a duplicate of this bug. It looks like setattr is the culprit. I was able to generate a core file of the mds while it was in this state, and the only request sitting in mds->mdcache->active_requests is the setattr that the client is waiting for (and already has an unsafe reply to). I have a dump of the mds cache as well; all it shows for the inode the setattr is operating on is that it's dirty.

#3 Updated by Sam Lang almost 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Sam Lang

#4 Updated by Sam Lang almost 8 years ago

Attaching the mds log from an mds stuck in clientreplay. It looks like the setattr gets put on the inode waiting list by the locker, after sending the client a caps revoke for pin auth.

#5 Updated by Sam Lang almost 8 years ago

Logs for two runs: one is stuck in replay from a setattr, the other is stuck in replay from a rename.

#6 Updated by Zheng Yan almost 8 years ago

Looks like a client bug: it may add cap releases to the replay requests. (encode_cap_releases() should be called when creating the request, not when sending it.)

#7 Updated by Greg Farnum almost 8 years ago

Yeah, we've discussed this some on github around wip-4742 and on irc. :)

#8 Updated by Sage Weil almost 8 years ago

  • Status changed from In Progress to Resolved

commit:5121e56c255c079569f02e0ee852e469f38f470e

#9 Updated by Greg Farnum over 4 years ago

  • Component(FS) MDS added
