CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=62177 2015-11-30T12:27:47Z John Spray jcspray@gmail.com
<ul></ul><p>Passes when run locally :-/</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=62811 2015-12-11T01:40:05Z Greg Farnum gfarnum@redhat.com
<ul></ul><p><a class="external" href="http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2015-12-08_23:04:01-fs-infernalis---basic-openstack/34531/">http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2015-12-08_23:04:01-fs-infernalis---basic-openstack/34531/</a></p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63041 2015-12-15T14:12:15Z John Spray jcspray@gmail.com
<ul></ul><p>Greg: that linked result is from TestDamage. Was there something in the logs that indicated it had a common cause with this issue?</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63616 2016-01-06T06:36:52Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p>Here are some correct logs on master:</p>
<p><a class="external" href="http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-21_23:04:02-fs-master---basic-openstack/48408/">http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-21_23:04:02-fs-master---basic-openstack/48408/</a><br /><a class="external" href="http://pulpito.ovh.sepia.ceph.com:8081/gregf-2015-12-23_05:34:31-fs-master---basic-openstack/50203/">http://pulpito.ovh.sepia.ceph.com:8081/gregf-2015-12-23_05:34:31-fs-master---basic-openstack/50203/</a></p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63772 2016-01-08T14:26:42Z John Spray jcspray@gmail.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>John Spray</i></li></ul>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63882 2016-01-11T14:50:01Z John Spray jcspray@gmail.com
<ul></ul><p>So in all three cases we're seeing just a single inode that's failing to get purged, probably the dir.</p>
<p><a class="external" href="http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/">http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/</a></p>
<p>2015-11-25T19:41:45.235 INFO:tasks.cephfs.test_strays:Waiting for purge to complete 0/1, 1600/1601</p>
<p><a class="external" href="http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-21_23:04:02-fs-master---basic-openstack/48408/">http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-12-21_23:04:02-fs-master---basic-openstack/48408/</a></p>
<p>2015-12-22T15:30:41.215 INFO:tasks.cephfs.test_strays:Waiting for purge to complete 0/1, 3200/3201</p>
<p><a class="external" href="http://pulpito.ovh.sepia.ceph.com:8081/gregf-2015-12-23_05:34:31-fs-master---basic-openstack/50203/">http://pulpito.ovh.sepia.ceph.com:8081/gregf-2015-12-23_05:34:31-fs-master---basic-openstack/50203/</a></p>
<p>2015-12-23T07:38:05.973 INFO:tasks.cephfs.test_strays:Waiting for purge to complete 0/1, 3200/3201</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63885 2016-01-11T15:28:55Z John Spray jcspray@gmail.com
<ul></ul><p><a class="external" href="http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/">http://pulpito.ceph.com/teuthology-2015-11-23_23:04:04-fs-master---basic-multi/1157802/</a></p>
<p>In this case I can see the stray #100/stray2/10000000000 finally getting purged just after the client session ends, so this is a client->server protocol behaviour that's keeping it stuck.</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63887 2016-01-11T17:13:13Z John Spray jcspray@gmail.com
<ul></ul><p>The client is receiving a client_caps message for the dir just <strong>after</strong> it's done the unlink. I think that's preventing it from sending the client_cap_release that it would usually send.<br /><pre>
2015-11-25 19:27:58.059696 7f50137fe700 20 client.4467 encode_inode_release enter(in:10000000000.head(faked_ino=0 ref=7 ll_ref=1605 cap_refs={} open={} mode=40755 size=0/0 mtime=2015-11-25 19:27:58.051061 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x7f5004005e90 0x7f501c01a5a0), req:0x7f5004077fd0 mds:0, drop:256, unless:512, have:, force:1)
2015-11-25 19:27:58.071510 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7846 caps now pAsLsXsFs was pAsLsXsFsx
2015-11-25 19:27:58.085591 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7847 caps now pAsXsFs was pAsLsXsFs
2015-11-25 19:27:58.222784 7f5031ffb700 1 -- 10.214.134.136:0/2601946693 --> 10.214.132.10:6806/20760 -- client_cap_release(73) v2 -- ?+0 0x7f501c10f230 con 0x7f501c014a60
2015-11-25 19:27:58.451742 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7848 caps now pAsXs was pAsXsFs
2015-11-25 19:27:58.457949 7f502bfff700 5 client.4467 handle_cap_grant on in 10000000000 mds.0 seq 7849 caps now pAsLsXs was pAsXs
2015-11-25 19:41:46.808582 7f502bfff700 1 -- 10.214.134.136:0/2601946693 --> 10.214.132.10:6806/20760 -- client_cap_release(2) v2 -- ?+0 0x7f503ee09570 con 0x7f501c014a60
</pre></p>
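<p>The delay visible in the excerpt above can be measured directly from the timestamps. The following is a minimal sketch, assuming the log line layout shown above (date and time as the first two whitespace-separated fields); the helper names <code>parse_log_time</code> and <code>gap_seconds</code> are hypothetical and not part of Ceph or teuthology.</p>

```python
from datetime import datetime

# Assumption: log lines start with "YYYY-MM-DD HH:MM:SS.ffffff", as in the excerpt.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_log_time(line):
    """Parse the leading timestamp of a client log line (hypothetical helper)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TS_FORMAT)


def gap_seconds(earlier_line, later_line):
    """Return the elapsed time in seconds between two log lines."""
    delta = parse_log_time(later_line) - parse_log_time(earlier_line)
    return delta.total_seconds()


# The last two lines of the excerpt, copied verbatim.
last_grant = (
    "2015-11-25 19:27:58.457949 7f502bfff700 5 client.4467 handle_cap_grant "
    "on in 10000000000 mds.0 seq 7849 caps now pAsLsXs was pAsXs"
)
late_release = (
    "2015-11-25 19:41:46.808582 7f502bfff700 1 -- 10.214.134.136:0/2601946693 "
    "--> 10.214.132.10:6806/20760 -- client_cap_release(2) v2 -- ?+0 "
    "0x7f503ee09570 con 0x7f501c014a60"
)

gap = gap_seconds(last_grant, late_release)
print(f"gap between last cap grant and late cap release: {gap:.6f} s")
```

<p>For these two lines the result is a little under 14 minutes.</p>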
<p>(The large time gap between the last two log lines separates the point where we wanted the release to happen from the point where it eventually happens, after umount.)</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=63895 2016-01-11T20:11:54Z John Spray jcspray@gmail.com
<ul></ul><p>This is reproducible with a simpler "delete lots of files and then their directory" test: <a class="external" href="https://github.com/ceph/ceph-qa-suite/pull/787">https://github.com/ceph/ceph-qa-suite/pull/787</a></p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=64030 2016-01-13T01:11:17Z Greg Farnum gfarnum@redhat.com
<ul></ul><p>I think you talked about this in standup but I'm forgetting. Do you need somebody else to look over the caps stuff here?</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=64067 2016-01-13T14:21:55Z John Spray jcspray@gmail.com
<ul></ul><p>If anyone has time, yes -- given enough time I can figure it out, but it might be more obvious to someone more familiar with the caps code. It's not clear to me whether the sequence of cap ops is fine and we just need another special case for unlinking, where we bounce grants that occur after the unlink, or whether it's wrong that we're being granted caps after we no longer even want/need them.</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=64351 2016-01-19T19:34:08Z Greg Farnum gfarnum@redhat.com
<ul></ul><p><a class="external" href="http://qa-proxy.ceph.com/teuthology/gregf-2016-01-18_19:56:11-fs-greg-fs-speculative-118---basic-mira/32912/">http://qa-proxy.ceph.com/teuthology/gregf-2016-01-18_19:56:11-fs-greg-fs-speculative-118---basic-mira/32912/</a></p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=64372 2016-01-20T02:55:22Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Assignee</strong> changed from <i>John Spray</i> to <i>Zheng Yan</i></li></ul><p>Zheng, please take a look.</p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=64388 2016-01-20T14:41:34Z Zheng Yan ukernel@gmail.com
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/7297">https://github.com/ceph/ceph/pull/7297</a></p>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=64389 2016-01-20T14:42:08Z Zheng Yan ukernel@gmail.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul>
CephFS - Bug #13903: Failure in TestStrays.test_ops_throttle https://tracker.ceph.com/issues/13903?journal_id=65142 2016-02-02T18:32:33Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul><p>Whoops, merged this last week.</p>