https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2017-05-08T14:57:11Z
Ceph
rbd - Bug #19863: bluestore deferred write crc changes before write
https://tracker.ceph.com/issues/19863?journal_id=90864
2017-05-08T14:57:11Z
Sage Weil
sage@newdream.net
<p>/a/sage-2017-05-05_22:43:52-rbd-wip-sage-testing---basic-smithi/1105005</p>
rbd - Bug #19863: bluestore deferred write crc changes before write
https://tracker.ceph.com/issues/19863?journal_id=90868
2017-05-08T18:08:38Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>Both instances were rbd. Trying to reproduce with some additional logging that hexdumps the buffers.</p>
rbd - Bug #19863: bluestore deferred write crc changes before write
https://tracker.ceph.com/issues/19863?journal_id=90878
2017-05-09T03:33:53Z
Sage Weil
sage@newdream.net
<p>It's an objectmap object. The block in question looks like this:<br /><pre>
00000000 0e 00 00 00 01 01 08 00 00 00 00 0a 00 00 00 00 |................|
00000010 00 00 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |..UUUUUUUUUUUUUU|
00000020 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000030 55 55 50 00 00 00 00 00 00 00 00 00 00 00 00 00 |UUP.............|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000b0 00 00 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |..UUUUUUUUUUUUUU|
000000c0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000000d0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000000e0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000000f0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000100 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000110 55 55 40 00 00 00 00 00 00 00 00 00 00 00 00 00 |UU@.............|
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000150 00 00 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |..UUUUUUUUUUUUUU|
00000160 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000170 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000180 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000190 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000001a0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000001b0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000001c0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000001d0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000001e0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
000001f0 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000200 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000210 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000220 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000230 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000240 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000250 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000260 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 |UUUUUUUUUUUUUUUU|
00000270 55 55 55 55 55 55 55 55 55 55 00 00 00 00 00 00 |UUUUUUUUUU......|
00000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000290 00 01 0c 00 00 00 91 1e 9c 74 01 00 00 00 68 74 |.........t....ht|
000002a0 e6 37 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.7..............|
000002b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000ff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001000
</pre><br />The diff before/after the corruption is:<br /><pre>
< 00000270 55 55 55 55 55 55 55 55 55 54 00 00 00 00 00 00 |UUUUUUUUUT......|
---
> 00000270 55 55 55 55 55 55 55 55 55 55 00 00 00 00 00 00 |UUUUUUUUUU......|
</pre><br />My guess is that cls_rbd has two ops in flight and both are writing out bufferlists that point to the same buffer in memory. The data for the earlier write changes "in flight" after the second op is prepared, and we only notice because of the crc mismatch.</p>
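<p>A minimal sketch of what the one-byte diff above amounts to. It assumes the object map stores 2 bits per object, packed most-significant pair first, which is what the <code>50</code> and <code>40</code> tail bytes in the hexdump suggest; the helper name <code>two_bit_fields</code> is hypothetical and only for illustration.</p>

```python
def two_bit_fields(byte):
    """Split one byte into four 2-bit values, most significant pair first."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


# Byte at offset 0x279 of the block, before and after the corruption.
before, after = 0x54, 0x55

# XOR shows which bits differ between the two versions.
changed_bits = before ^ after

print(f"xor = {changed_bits:#04x}")
print(two_bit_fields(before), "->", two_bit_fields(after))
```

<p>Under that assumption, only the last 2-bit entry in the byte differs (0 became 1), which is consistent with a small object map update landing in a block that had already been queued for writing.</p>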
<p>Assuming that's correct, the fix is just to copy the changed block(s) to a new buffer before writing them. (The buffer will stick around indefinitely in the bluestore buffer cache.)</p>
<p>Alternatively, cls_rbd could try to get clever by doing copy-on-write internally based on the raw buffer's ref count, but I'm not sure it's worth the complexity/risk!</p>
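<p>The suspected aliasing and the proposed copy-before-write fix can be modelled outside Ceph. This is a toy sketch, not the cls_rbd or bluestore code: <code>queue_write</code>, <code>flush_ok</code> and <code>run</code> are hypothetical names, a <code>bytearray</code> stands in for the shared raw buffer, and <code>zlib.crc32</code> stands in for whatever checksum the store actually computes.</p>

```python
import zlib

BLOCK_SIZE = 4096  # size of the block shown in the hexdump (0x1000)


def queue_write(buf, copy_first):
    """Hypothetical deferred write: checksum now, keep the payload for later."""
    # bytes(buf) takes a private snapshot; otherwise the payload aliases buf.
    payload = bytes(buf) if copy_first else buf
    return zlib.crc32(payload), payload


def flush_ok(queued):
    """Recompute the checksum at flush time and compare with the queued one."""
    crc_at_queue_time, payload = queued
    return zlib.crc32(payload) == crc_at_queue_time


def run(copy_first):
    block = bytearray(BLOCK_SIZE)
    block[0x279] = 0x54
    first = queue_write(block, copy_first)   # first op in flight
    block[0x279] = 0x55                      # second op edits the same memory
    second = queue_write(block, copy_first)
    return flush_ok(first), flush_ok(second)


shared = run(copy_first=False)
copied = run(copy_first=True)
print("shared buffer:", shared)
print("copied buffer:", copied)
```

<p>With the shared buffer, the first queued write no longer matches its checksum by the time it is flushed; with a copy taken before queuing, both writes verify. The cost is one extra block-sized copy per update, which is the trade-off against the ref-count-based approach.</p>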
rbd - Bug #19863: bluestore deferred write crc changes before write
https://tracker.ceph.com/issues/19863?journal_id=90895
2017-05-09T15:31:23Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>rbd</i></li><li><strong>Assignee</strong> changed from <i>Sage Weil</i> to <i>Jason Dillaman</i></li></ul>
rbd - Bug #19863: bluestore deferred write crc changes before write
https://tracker.ceph.com/issues/19863?journal_id=90896
2017-05-09T15:42:41Z
Jason Dillaman
dillaman@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li><li><strong>Backport</strong> set to <i>kraken,jewel,hammer</i></li></ul><p><strong>PR</strong>: <a class="external" href="https://github.com/ceph/ceph/pull/15017">https://github.com/ceph/ceph/pull/15017</a></p>
rbd - Bug #19863: bluestore deferred write crc changes before write
https://tracker.ceph.com/issues/19863?journal_id=90973
2017-05-11T15:17:34Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>