Bug #13903
closed
Failure in TestStrays.test_ops_throttle
Description
Note that this is different from http://tracker.ceph.com/issues/12657
http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/
This is new on master, and intermittent.
Updated by John Spray over 8 years ago
Greg: that linked result is from TestDamage, was there something in the logs that indicated it had a common cause with this issue?
Updated by Greg Farnum over 8 years ago
- Priority changed from Normal to Urgent
Updated by John Spray over 8 years ago
- Status changed from New to In Progress
- Assignee set to John Spray
Updated by John Spray over 8 years ago
So in all three cases we're seeing just a single inode that's failing to get purged, probably the dir.
http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/
2015-11-25T19:41:45.235 INFO:tasks.cephfs.test_strays:Waiting for purge to complete 0/1, 1600/1601
2015-12-22T15:30:41.215 INFO:tasks.cephfs.test_strays:Waiting for purge to complete 0/1, 3200/3201
http://pulpito.ovh.sepia.ceph.com:8081/gregf-2015-12-23_05:34:31-fs-master---basic-openstack/50203/
2015-12-23T07:38:05.973 INFO:tasks.cephfs.test_strays:Waiting for purge to complete 0/1, 3200/3201
Updated by John Spray over 8 years ago
http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/
In this case I can see the stray #100/stray2/10000000000 finally getting purged just after the client session ends, so this is a client->server protocol behaviour that's keeping it stuck.
Updated by John Spray over 8 years ago
The client is receiving a client_caps message for the dir just after it's done the unlink. I think that's preventing it from sending the client_cap_release that it would usually send.
2015-11-25 19:27:58.059696 7f50137fe700 20 client.4467 encode_inode_release enter(in:10000000000.head(faked_ino=0 ref=7 ll_ref=1605 cap_refs={} open={} mode=40755 size=0/0 mtime=2015-11-25 19:27:58.051061 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x7f5004005e90 0x7f501c01a5a0), req:0x7f5004077fd0 mds:0, drop:256, unless:512, have:, force:1)
2015-11-25 19:27:58.071510 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7846 caps now pAsLsXsFs was pAsLsXsFsx
2015-11-25 19:27:58.085591 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7847 caps now pAsXsFs was pAsLsXsFs
2015-11-25 19:27:58.222784 7f5031ffb700 1 -- 10.214.134.136:0/2601946693 --> 10.214.132.10:6806/20760 -- client_cap_release(73) v2 -- ?+0 0x7f501c10f230 con 0x7f501c014a60
2015-11-25 19:27:58.451742 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7848 caps now pAsXs was pAsXsFs
2015-11-25 19:27:58.457949 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7849 caps now pAsLsXs was pAsXs
2015-11-25 19:41:46.808582 7f502bfff700 1 -- 10.214.134.136:0/2601946693 --> 10.214.132.10:6806/20760 -- client_cap_release(2) v2 -- ?+0 0x7f503ee09570 con 0x7f501c014a60
(The big time gap between the last two messages is the gap between where we wanted the client_cap_release to happen and where it eventually happens, after umount.)
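To make the cap sequence easier to follow, here is a minimal Python sketch that pulls the handle_cap_grant transitions out of a client log. The regular expression is an assumption modelled only on the log lines quoted in this comment, and the names GRANT_RE, cap_transitions and SAMPLE are hypothetical helpers for illustration, not part of Ceph or the qa suite.

```python
import re

# Pattern modelled on the handle_cap_grant lines quoted in this ticket.
# Assumption: other Ceph versions may format these messages differently.
GRANT_RE = re.compile(
    r"handle_cap_grant on in (?P<ino>[0-9a-f]+) mds\.(?P<mds>\d+) "
    r"seq (?P<seq>\d+) caps now (?P<now>\S+) was (?P<was>\S+)"
)


def cap_transitions(log_text):
    """Return a (seq, was, now) tuple for each cap grant found in log_text."""
    return [
        (int(m.group("seq")), m.group("was"), m.group("now"))
        for m in GRANT_RE.finditer(log_text)
    ]


# Log lines copied from the comment above; the client_cap_release line is
# included to show that non-grant messages are skipped.
SAMPLE = """\
2015-11-25 19:27:58.071510 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7846 caps now pAsLsXsFs was pAsLsXsFsx
2015-11-25 19:27:58.085591 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7847 caps now pAsXsFs was pAsLsXsFs
2015-11-25 19:27:58.222784 7f5031ffb700 1 -- 10.214.134.136:0/2601946693 --> 10.214.132.10:6806/20760 -- client_cap_release(73) v2 -- ?+0 0x7f501c10f230 con 0x7f501c014a60
2015-11-25 19:27:58.451742 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7848 caps now pAsXs was pAsXsFs
2015-11-25 19:27:58.457949 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7849 caps now pAsLsXs was pAsXs
"""

for seq, was, now in cap_transitions(SAMPLE):
    print(seq, was, "->", now)
```

Read this way, the grants at seq 7848 and 7849 both arrive after the client_cap_release(73) message, and the last one hands Ls back to the client for the unlinked directory.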
Updated by John Spray over 8 years ago
This is reproducible with a simpler "delete lots of files and then their directory" test https://github.com/ceph/ceph-qa-suite/pull/787
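For readers without the qa suite to hand, the access pattern described here can be sketched in plain Python. This is not the test from the pull request: create_then_delete, its parameters and the file names are hypothetical, and the sketch only performs the create/unlink/rmdir sequence. It does not inspect MDS stray or purge counters, so on its own it cannot detect the stuck inode.

```python
import os
import tempfile


def create_then_delete(parent, name="delete_me", file_count=100):
    """Create file_count empty files in a new directory, unlink them all,
    then remove the directory. Returns the number of files unlinked."""
    dir_path = os.path.join(parent, name)
    os.mkdir(dir_path)
    for i in range(file_count):
        # Opening for write and closing immediately leaves an empty file.
        with open(os.path.join(dir_path, "file_%d" % i), "w"):
            pass

    unlinked = 0
    for entry in os.listdir(dir_path):
        os.unlink(os.path.join(dir_path, entry))
        unlinked += 1

    # Removing the now-empty directory is the step after which the
    # directory inode was observed to stay unpurged in this ticket.
    os.rmdir(dir_path)
    return unlinked


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for a CephFS mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        print(create_then_delete(mount_point, file_count=100))
```

To exercise the behaviour discussed in this ticket, parent would need to point at a directory on a CephFS client mount, with the purge progress checked separately on the MDS side.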
Updated by Greg Farnum over 8 years ago
I think you talked about this in standup but I'm forgetting — do you need somebody else to look over the caps stuff here?
Updated by John Spray over 8 years ago
If anyone has time, yes. Given enough time I can figure it out, but it might be more obvious to someone more familiar with the caps code. It's not clear to me whether the sequence of cap ops is fine and we just need another special case for unlinking, where we bounce grants that arrive after the unlink, or whether it's wrong that we're being granted caps after we no longer want or need them.
Updated by Greg Farnum over 8 years ago
- Assignee changed from John Spray to Zheng Yan
Zheng, please take a look.
Updated by Zheng Yan over 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Greg Farnum about 8 years ago
- Status changed from Fix Under Review to Resolved
Whoops, merged this last week.