https://tracker.ceph.com/
2015-06-02T21:17:02Z
Ceph
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=53041
2015-06-02T21:17:02Z
Sage Weil
sage@newdream.net
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>rbd</i></li><li><strong>Regression</strong> set to <i>No</i></li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=53725
2015-06-16T22:30:28Z
Josh Durgin
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Can't reproduce</i></li></ul><p>If this happens again we can investigate. No trace of what went wrong from these logs.</p>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=82919
2016-12-12T14:25:33Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Status</strong> changed from <i>Can't reproduce</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Jason Dillaman</i></li></ul><p><a class="external" href="http://qa-proxy.ceph.com/teuthology/teuthology-2016-12-11_02:01:21-rbd-master-distro-basic-smithi/624047/teuthology.log">http://qa-proxy.ceph.com/teuthology/teuthology-2016-12-11_02:01:21-rbd-master-distro-basic-smithi/624047/teuthology.log</a></p>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=82929
2016-12-12T15:47:46Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>12</i></li><li><strong>Assignee</strong> deleted (<del><i>Jason Dillaman</i></del>)</li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>OSD logs of interest: <a class="external" href="http://qa-proxy.ceph.com/teuthology/teuthology-2016-12-11_02:01:21-rbd-master-distro-basic-smithi/624047/remote/smithi089/log/ceph-osd.1.log.gz">http://qa-proxy.ceph.com/teuthology/teuthology-2016-12-11_02:01:21-rbd-master-distro-basic-smithi/624047/remote/smithi089/log/ceph-osd.1.log.gz</a></p>
<p>The error is that 'image50' is created successfully, yet the test fails when the subsequent image open returns -ENOENT (even though the open request immediately follows the successful create request).</p>
<p><strong>Assumptions</strong>: pg11.5 is the cache tier pg for object rbd_id.image50 and pg2.5 is the backing pg for object rbd_id.image50. This object stores the mapping between the v2 image name and its unique id.</p>
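The tiering assumed above can be sketched with a toy Python model (illustrative only, not OSD code): on a cache miss the cache tier tries to promote the object from the base pool, and when the base pool also lacks it, the negative lookup is recorded as a whiteout and -ENOENT is returned, mirroring the start_copy/whiteout lines in the log excerpt below.

```python
# Toy model of the assumed tiering: pg11.5 caches pg2.5.
base_objects = {}    # backing pool (pg2.5) contents
cache_objects = {}   # cache tier pool (pg11.5) contents
whiteouts = set()    # cached knowledge that an object does not exist

def cached_stat(obj):
    """Stat through the cache tier, promoting from the base pool on a miss."""
    if obj in cache_objects:
        return 0
    if obj in base_objects:                 # promote: copy up from base tier
        cache_objects[obj] = base_objects[obj]
        return 0
    whiteouts.add(obj)                      # remember the negative lookup
    return -2                               # -ENOENT

assert cached_stat("rbd_id.image50") == -2
assert "rbd_id.image50" in whiteouts
```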
<p><strong>Step 1</strong>) librbd attempts to create image50. The first step is to stat the image id object (rbd_id.image50) to ensure the image doesn't already exist:</p>
<pre>
...
2016-12-11 19:10:19.703177 7ff5bb2b0700 20 osd.1 pg_epoch: 70 pg[11.5( empty local-les=64 n=0 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.4119.0:545 11.2f844175 rbd_id.image50 [stat] snapc 0=[] ack+read+known_if_redirected e70) v7
...
2016-12-11 19:10:19.703417 7ff5bb2b0700 10 osd.1 pg_epoch: 70 pg[11.5( empty local-les=64 n=0 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=0'0 mlcod 0'0 active+clean] get_object_context: no obc for soid 11:ae8221f4:::rbd_id.image50:head but can_create
...
2016-12-11 19:10:19.703520 7ff5bb2b0700 10 osd.1 pg_epoch: 70 pg[11.5( empty local-les=64 n=0 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=0'0 mlcod 0'0 active+clean] start_copy 11:ae8221f4:::rbd_id.image50:head from 11:ae8221f4:::rbd_id.image50:head @2 v0 flags 30 mirror_snapset
2016-12-11 19:10:19.703533 7ff5bb2b0700 10 osd.1 pg_epoch: 70 pg[11.5( empty local-les=64 n=0 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=0'0 mlcod 0'0 active+clean] _copy_some 0x560246ec3c00 0x560246600410
...
2016-12-11 19:10:19.703961 7ff5bdab5700 20 osd.1 pg_epoch: 70 pg[2.5( v 52'29 (0'0,52'29] local-les=13 n=11 ec=12 les/c/f 13/13/0 12/12/12) [1,4] r=0 lpr=12 crt=52'29 lcod 45'28 mlcod 45'28 active+clean] do_op: op osd_op(osd.1.5:3 2.2f844175 rbd_id.image50 [list-snaps] snapc 0=[] ack+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e70) v7
...
2016-12-11 19:10:19.704234 7ff5bdab5700 1 -- 172.21.15.89:6804/6151 --> 172.21.15.89:0/6151 -- osd_op_reply(3 rbd_id.image50 [list-snaps] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7 -- 0x560246e35a00 con 0
...
2016-12-11 19:10:19.704361 7ff5bb2b0700 20 osd.1 pg_epoch: 70 pg[2.5( v 52'29 (0'0,52'29] local-les=13 n=11 ec=12 les/c/f 13/13/0 12/12/12) [1,4] r=0 lpr=12 crt=52'29 lcod 45'28 mlcod 45'28 active+clean] do_op: op osd_op(osd.1.5:4 2.2f844175 rbd_id.image50 [copy-get max 8388608] snapc 0=[] ack+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e70) v7
...
2016-12-11 19:10:19.704563 7ff5bb2b0700 1 -- 172.21.15.89:6804/6151 --> 172.21.15.89:0/6151 -- osd_op_reply(4 rbd_id.image50 [copy-get max 8388608] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v7 -- 0x5602466fed80 con 0
...
2016-12-11 19:10:19.704771 7ff5b6aa7700 10 osd.1 pg_epoch: 70 pg[11.5( empty local-les=64 n=0 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=0'0 mlcod 0'0 active+clean] process_copy_chunk 11:ae8221f4:::rbd_id.image50:head tid 4 (2) No such file or directory
...
2016-12-11 19:10:19.706745 7ff5bdab5700 25 osd.1 pg_epoch: 70 pg[11.5( v 70'1 (0'0,70'1] local-les=64 n=1 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=70'1 lcod 0'0 mlcod 0'0 active+clean] do_op oi 11:ae8221f4:::rbd_id.image50:head(70'1 osd.1.0:213 whiteout s 0 uv 0 alloc_hint [0 0 0])
...
2016-12-11 19:10:19.706778 7ff5bdab5700 1 -- 172.21.15.89:6804/6151 --> 172.21.15.85:0/2199069473 -- osd_op_reply(545 rbd_id.image50 [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7 -- 0x560246afa080 con 0
...
</pre>
<p><strong>Step 1 result</strong>) librbd receives the correct -2 (ENOENT) error for the stat call on the non-existent object. <br /><strong>Step 2</strong>) librbd proceeds to create the rbd_id.image50 object:</p>
<pre>
...
2016-12-11 19:10:19.708777 7ff5bb2b0700 20 osd.1 pg_epoch: 70 pg[11.5( v 70'1 (0'0,70'1] local-les=64 n=1 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=70'1 lcod 0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.4119.0:547 11.2f844175 rbd_id.image50 [create 0~0 [excl],call rbd.set_id] snapc 0=[] ack+ondisk+write+known_if_redirected e70) v7
...
2016-12-11 19:10:19.709070 7ff5bb2b0700 10 osd.1 pg_epoch: 70 pg[11.5( v 70'1 (0'0,70'1] local-les=64 n=1 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=70'1 lcod 0'0 mlcod 0'0 active+clean] maybe_create_new_object clearing whiteout on 11:ae8221f4:::rbd_id.image50:head
...
2016-12-11 19:10:19.710626 7ff5bdab5700 1 -- 172.21.15.89:6804/6151 --> 172.21.15.85:0/2199069473 -- osd_op_reply(547 rbd_id.image50 [create 0~0 [excl],call rbd.set_id] v70'2 uv2 ack = 0) v7 -- 0x560246af8d00 con 0
...
</pre>
<p><strong>Step 2 result</strong>) librbd receives an ACK that the object create + rbd.set_id cls method succeeded.</p>
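Steps 1 and 2 amount to a stat-then-exclusive-create sequence; a minimal simulation (hypothetical names and image id, not the librbd implementation) captures the key point that the new object lands in the cache tier only, as a dirty object not yet flushed to the base pool:

```python
# Minimal simulation of steps 1-2: librbd stats the id object (expecting
# -ENOENT), then creates it and stores the image id via rbd.set_id.
cache_tier = {}   # objects in the cache pool (pg11.5)
base_tier = {}    # objects in the backing pool (pg2.5)

def stat(obj):
    """Stat through the cache tier: hit in cache, else fall back to base."""
    if obj in cache_tier or obj in base_tier:
        return 0
    return -2                      # -ENOENT

def create_and_set_id(obj, image_id):
    """create [excl] + rbd.set_id: the write lands in the cache tier only."""
    if stat(obj) == 0:
        return -17                 # -EEXIST
    cache_tier[obj] = image_id     # dirty object, not yet flushed to base
    return 0

assert stat("rbd_id.image50") == -2                        # step 1
assert create_and_set_id("rbd_id.image50", "fake-id") == 0 # step 2
```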
<p><strong>Note</strong>: the rbd_id.image50 object appears to be in the cache tier right before the next step:</p>
<pre>
...
2016-12-11 19:10:19.730055 7ff5beab7700 10 osd.1 pg_epoch: 71 pg[11.5( v 70'2 (0'0,70'2] local-les=64 n=1 ec=63 les/c/f 64/64/0 63/63/63) [1,2] r=0 lpr=63 crt=70'2 lcod 70'1 mlcod 70'1 active+clean] agent_choose_mode flush_mode: idle evict_mode: idle num_objects: 1 num_bytes: 16 num_objects_dirty: 1 num_objects_omap: 0 num_dirty: 1 num_user_objects: 1 num_user_bytes: 2064 num_overhead_bytes: 2048 pool.info.target_max_bytes: 0 pool.info.target_max_objects: 0
...
</pre>
<p><strong>Step 3</strong>) librbd next tries to open the newly created image by first stat-ing the rbd_id.image50 object to see if it is a v2 rbd image. However, it appears that the request is (incorrectly) sent to the base tier (pg2.5) instead of the cache tier (pg11.5).</p>
<pre>
...
2016-12-11 19:10:19.768580 7ff5bb2b0700 20 osd.1 pg_epoch: 72 pg[2.5( v 52'29 (0'0,52'29] local-les=13 n=11 ec=12 les/c/f 13/13/0 12/12/12) [1,4] r=0 lpr=12 crt=52'29 lcod 45'28 mlcod 45'28 active+clean] do_op: op osd_op(client.4119.0:551 2.2f844175 rbd_id.image50 [stat] snapc 0=[] ack+read+known_if_redirected e72) v7
...
2016-12-11 19:10:19.768633 7ff5bb2b0700 10 osd.1 pg_epoch: 72 pg[2.5( v 52'29 (0'0,52'29] local-les=13 n=11 ec=12 les/c/f 13/13/0 12/12/12) [1,4] r=0 lpr=12 crt=52'29 lcod 45'28 mlcod 45'28 active+clean] get_object_context: obc NOT found in cache: 2:ae8221f4:::rbd_id.image50:head
2016-12-11 19:10:19.768646 7ff5bb2b0700 15 filestore(/var/lib/ceph/osd/ceph-1) getattr 2.5_head/#2:ae8221f4:::rbd_id.image50:head# '_'
2016-12-11 19:10:19.768679 7ff5bb2b0700 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/2.5_head/rbd\uid.image50__head_2F844175__2 with flags=2: (2) No such file or directory
2016-12-11 19:10:19.768689 7ff5bb2b0700 10 filestore(/var/lib/ceph/osd/ceph-1) getattr 2.5_head/#2:ae8221f4:::rbd_id.image50:head# '_' = -2
2016-12-11 19:10:19.768711 7ff5bb2b0700 1 -- 172.21.15.89:6804/6151 --> 172.21.15.85:0/2199069473 -- osd_op_reply(551 rbd_id.image50 [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7 -- 0x5602466ff0c0 con 0
...
</pre>
<p><strong>Step 3 result</strong>) The stat call fails with -2, and librbd fails to open the image since it cannot find the image name -> image id mapping object.</p>
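The failure in step 3 is a routing problem: the stat is dispatched directly against the base pool's pg rather than through the cache tier, so the dirty object, which exists only in the cache, is invisible. A toy reproduction of that routing decision (hypothetical names, not the actual client Objecter code):

```python
# The newly created object exists only in the cache tier (dirty, not yet
# flushed to the backing pool).
cache_tier = {"rbd_id.image50": "image-id"}   # pg11.5
base_tier = {}                                 # pg2.5

def stat(obj, use_cache_overlay):
    """Model of op targeting: with the tier overlay in effect the read goes
    through the cache tier; without it the read hits the base pool directly."""
    if use_cache_overlay:
        return 0 if (obj in cache_tier or obj in base_tier) else -2
    return 0 if obj in base_tier else -2       # base pool only

assert stat("rbd_id.image50", use_cache_overlay=True) == 0    # expected path
assert stat("rbd_id.image50", use_cache_overlay=False) == -2  # observed bug
```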
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=82930
2016-12-12T15:47:56Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Project</strong> changed from <i>rbd</i> to <i>Ceph</i></li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=89059
2017-04-12T17:00:01Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Can't reproduce</i></li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=89061
2017-04-12T17:02:10Z
Sage Weil
sage@newdream.net
<ul></ul><p>Please reopen if this happens again.</p>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=101343
2017-10-25T18:35:07Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Status</strong> changed from <i>Can't reproduce</i> to <i>12</i></li></ul><p>Recent repeat failure: <a class="external" href="http://qa-proxy.ceph.com/teuthology/abhi-2017-10-21_15:51:20-rbd-wip-luminous-backports-distro-basic-smithi/1758354/teuthology.log">http://qa-proxy.ceph.com/teuthology/abhi-2017-10-21_15:51:20-rbd-wip-luminous-backports-distro-basic-smithi/1758354/teuthology.log</a></p>
<pre>
2017-10-21 19:07:29.832119 7f8759ced700 1 -- 172.21.15.117:6800/9349 --> 172.21.15.13:0/412289887 -- osd_op_reply(1184 rbd_id.image54 [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 -- 0x55780e346380 con 0
2017-10-21 19:07:29.837748 7f875dcf5700 1 -- 172.21.15.117:6800/9349 --> 172.21.15.13:0/412289887 -- osd_op_reply(1187 rbd_id.image54 [create,call rbd.set_id] v89'8 uv36 ondisk = 0) v8 -- 0x55780e346700 con 0
2017-10-21 19:07:42.271610 7fbdb7fc2700 1 -- 172.21.15.18:6809/5996 --> 172.21.15.13:0/412289887 -- osd_op_reply(1190 rbd_id.image54 [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 -- 0x5600d018d800 con 0
</pre>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=107599
2018-02-19T14:52:56Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Jason Dillaman</i></li><li><strong>Priority</strong> changed from <i>High</i> to <i>Normal</i></li><li><strong>Backport</strong> set to <i>luminous,jewel</i></li></ul><p>This appears to be a race condition between the tear down of the previous test case and this test case. The previous test case creates a cache tier and then destroys it; the race occurs because the deletion of the cache tier propagates slowly through the OSDs. The fix is to use a unique pool for the single cache tier test case to avoid this ripple effect.</p>
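The isolation idea behind the fix can be illustrated with a toy model (names hypothetical, not the actual test harness): a later test only races with the in-flight tier removal if it does I/O against the same pool whose tier state is still settling, so giving the cache-tier test its own pool removes the interaction entirely.

```python
# Toy illustration of the fix: isolate the cache-tier test in its own pool
# so tearing down its tier cannot race with other tests' I/O.
def suite_is_race_free(cache_tier_pool, other_tests_pool):
    """True if no later test can observe the tier-removal race."""
    # The cache tier is attached to, then removed from, cache_tier_pool;
    # the removal propagates slowly through the OSDs.
    tier_removal_in_flight = {cache_tier_pool}
    # A later test does I/O against other_tests_pool; it races only if that
    # is the pool whose tier state is still settling.
    return other_tests_pool not in tier_removal_in_flight

assert suite_is_race_free("shared-pool", "shared-pool") is False  # pre-fix
assert suite_is_race_free("cache-pool", "shared-pool") is True    # post-fix
```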
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=107604
2018-02-19T14:56:33Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>rbd</i></li><li><strong>ceph-qa-suite</strong> deleted (<del><i>rbd</i></del>)</li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=107606
2018-02-19T15:26:20Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul><p><strong>PR</strong>: <a class="external" href="https://github.com/ceph/ceph/pull/20486">https://github.com/ceph/ceph/pull/20486</a></p>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=107682
2018-02-20T12:36:04Z
Mykola Golub
mgolub@suse.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=107747
2018-02-21T11:11:23Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/23064">Backport #23064</a>: luminous: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run</i> added</li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=107749
2018-02-21T11:11:25Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/23065">Backport #23065</a>: jewel: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run</i> added</li></ul>
rbd - Bug #11502: "[ FAILED ] TestLibRBD.LockingPP" in rbd-next-distro-basic-mult run
https://tracker.ceph.com/issues/11502?journal_id=109953
2018-03-30T20:35:05Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>