Bug #45986 (Closed)
importing rbd diff does not apply zero sequences correctly
Description
While investigating problems with incorrect backup checksums, we've discovered what is probably a bug in the rbd import-diff command. When applying a second rbd diff containing sequences of zeros, those zero sequences do not seem to be applied, and areas which should be zeroed still contain the old data.
I'm attaching test data.
Steps to reproduce:
- create a new image and import both diffs
rbd create sata/test-export-import --size 100M
rbd import-diff --no-progress export-first.diff sata/test-export-import
rbd import-diff --no-progress export-second.diff sata/test-export-import
- export the raw content of the second snapshot
rbd export --export-format 1 --no-progress sata/test-export-import@second export-second-raw.txt
- show the structure of the second diff
python diff-parser.py export-second.diff
banner: rbd diff v1
export_format: 1
from: first
snap: second
size: 104857600
zero: offset: 230419, length: 12149
zero: offset: 8379678, length: 8930
data: offset: 8388608, length: 4194304
data: offset: 12582912, length: 457699
data: offset: 19155848, length: 1815672
data: offset: 20971520, length: 4194304
zero: offset: 25165824, length: 952115
data: offset: 26561985, length: 2798143
data: offset: 29360128, length: 4194304
data: offset: 33554432, length: 2919371
zero: offset: 40799568, length: 1143472
zero: offset: 41943040, length: 565422
data: offset: 43129817, length: 202932
data: offset: 43575935, length: 1220628
zero: offset: 45694610, length: 442734
data: offset: 46137344, length: 4194304
data: offset: 50331648, length: 2169674
end
eof
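(The diff-parser.py script itself is not included in this report. A minimal sketch of such a parser, based on the rbd diff v1 stream format documented in doc/dev/rbd-diff.rst (one-byte record tags: 'f'/'t' snapshot names, 's' image size, 'w' data extent, 'z' zero extent, 'e' end), could look like the following; it is an illustration, not the attached script:)

#!/usr/bin/env python3
# Minimal rbd diff v1 parser sketch (an illustration, not the attached
# diff-parser.py). Record layout follows doc/dev/rbd-diff.rst.
import struct
import sys

with open(sys.argv[1], 'rb') as f:
    assert f.read(12) == b'rbd diff v1\n', 'not an rbd diff v1 stream'
    print('banner: rbd diff v1')
    while True:
        tag = f.read(1)
        if tag in (b'f', b't'):               # from-snap / to-snap name
            (n,) = struct.unpack('<I', f.read(4))
            print('%s: %s' % ('from' if tag == b'f' else 'snap',
                              f.read(n).decode()))
        elif tag == b's':                     # image size
            (size,) = struct.unpack('<Q', f.read(8))
            print('size: %d' % size)
        elif tag in (b'w', b'z'):             # data / zero extent
            off, length = struct.unpack('<QQ', f.read(16))
            print('%s: offset: %d, length: %d'
                  % ('data' if tag == b'w' else 'zero', off, length))
            if tag == b'w':
                f.seek(length, 1)             # skip the payload bytes
        elif tag == b'e':                     # end record
            print('end')
            break
    print('eof')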
- expected area of zeros !!!
hexdump -s 230419 -n 80 export-second-raw.txt
0038413 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
0038423 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
0038433 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
0038443 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
0038453 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
- expected area of zeros !!!
hexdump -s 8379678 -n 80 export-second-raw.txt
07fdd1e 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
07fdd2e 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
07fdd3e 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
07fdd4e 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
07fdd5e 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
- this part of dump is correct
hexdump -s 25165824 -n 80 export-second-raw.txt
1800000 0000 0000 0000 0000 0000 0000 0000 0000
*
1800050
- expected area of zeros !!!
hexdump -s 40799568 -n 80 export-second-raw.txt
26e8d50 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
26e8d60 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
26e8d70 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
26e8d80 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
26e8d90 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
- this part of dump is correct
hexdump -s 41943040 -n 80 export-second-raw.txt
2800000 0000 0000 0000 0000 0000 0000 0000 0000
*
2800050
- expected area of zeros !!!
hexdump -s 45694610 -n 80 export-second-raw.txt
2b93e92 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
2b93ea2 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
2b93eb2 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
2b93ec2 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
2b93ed2 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
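(Rather than spot-checking with hexdump, all six zero extents from the parsed diff can be checked at once; a small sketch, with the offsets and lengths copied from the diff-parser.py output above:)

#!/usr/bin/env python3
# Verify that every 'zero' extent from the second diff is actually
# zero-filled in the raw export of the second snapshot.
ZERO_EXTENTS = [
    (230419, 12149), (8379678, 8930), (25165824, 952115),
    (40799568, 1143472), (41943040, 565422), (45694610, 442734),
]

with open('export-second-raw.txt', 'rb') as f:
    for off, length in ZERO_EXTENTS:
        f.seek(off)
        buf = f.read(length)
        ok = buf.count(0) == len(buf)
        print('%10d~%-8d %s' % (off, length, 'OK' if ok else 'NOT ZEROED'))

With the hexdump results above, only the extents at 25165824 and 41943040 would come back OK.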
We've confirmed this issue on all recent major releases (13/Mimic, 14/Nautilus, 15/Octopus).
Updated by Neha Ojha almost 4 years ago
- Project changed from RADOS to rbd
Jason, can you please take a look to see whether this bug is in the RBD layer?
Updated by Jason Dillaman almost 4 years ago
- Status changed from New to Need More Info
If it's only affecting Nautilus and later releases, I would suspect it's related to the "rbd_skip_partial_discard" setting, since its default switched to "true" at that point. Can you please provide an RBD debug log from the second snapshot import step?
$ rbd import-diff --debug-rbd=20 --no-progress export-second.diff sata/test-export-import 2> rbd.log
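(A note for reading the resulting log: with the default 4 MiB object size, an image offset maps to data object offset // 4194304 at in-object offset offset % 4194304. The 4 MiB object size is an assumption here, based on the test image having been created without --object-size:)

# Map image-level diff extents to per-object extents, assuming the
# default 4 MiB RBD object size.
OBJECT_SIZE = 4 * 1024 * 1024

for off, length in [(230419, 12149), (8379678, 8930), (40799568, 1143472)]:
    obj, in_obj = divmod(off, OBJECT_SIZE)
    print('image %d~%d -> object %x, in-object offset %d'
          % (off, length, obj, in_obj))

For example, the zero extent 8379678~8930 lands in object 1 at offset 4185374, which is how it appears in the logs below.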
Updated by Nikola Ciprich almost 4 years ago
- File rbd.log.gz added
Sure thing, attaching the log. n.
Updated by Nikola Ciprich almost 4 years ago
One more update: after disabling rbd_skip_partial_discard (and restarting the OSDs) I'm still able to reproduce the problem. n.
Updated by Jason Dillaman almost 4 years ago
Nikola Ciprich wrote:
One more update: after disabling rbd_skip_partial_discard (and restarting the OSDs) I'm still able to reproduce the problem. n.
I can only reproduce it with the "rbd_skip_partial_discard" option at its default of "true"; when it is "false", it works as expected:
rbd_skip_partial_discard = false:
$ rbd --rbd_concurrent_management_ops=1 --rbd_skip_partial_discard=false --debug-rbd=20 import-diff export-second.diff test-export-import 2>&1 | grep -E 'ObjectRequest.*(zero|remove|truncate)'
2020-06-18T10:50:41.392-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800ab20 send: rbd_data.1032347771b.0000000000000000 zero 230419~12149
2020-06-18T10:50:41.392-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d80102a0 send: rbd_data.1032347771b.0000000000000001 truncate 4185374~8930
2020-06-18T10:50:41.399-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800ff30 send: rbd_data.1032347771b.0000000000000002 zero 0~3002368
2020-06-18T10:50:41.409-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800d340 send: rbd_data.1032347771b.0000000000000005 truncate 3547136~647168
2020-06-18T10:50:41.410-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800dbb0 send: rbd_data.1032347771b.0000000000000006 zero 0~952115
2020-06-18T10:50:41.413-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d80105c0 send: rbd_data.1032347771b.0000000000000006 zero 1396161~671744
2020-06-18T10:50:41.422-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d8004a40 send: rbd_data.1032347771b.0000000000000007 zero 2453504~1077248
2020-06-18T10:50:41.425-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d80067b0 send: rbd_data.1032347771b.0000000000000009 truncate 3050832~1143472
2020-06-18T10:50:41.426-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d8007230 send: rbd_data.1032347771b.000000000000000a zero 0~565422
2020-06-18T10:50:41.427-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d8008ff0 send: rbd_data.1032347771b.000000000000000a truncate 3751570~442734
2020-06-18T10:50:41.509-0400 7f1404bcb700 20 librbd::io::ObjectRequest: 0x7f13d800ab20 send: rbd_data.1032347771b.000000000000000b zero 0~1437696
2020-06-18T10:50:41.524-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800e840 send: rbd_data.1032347771b.000000000000000c zero 2031616~138058
rbd_skip_partial_discard = true:
$ rbd --rbd_concurrent_management_ops=1 --rbd_skip_partial_discard=true --debug-rbd=20 import-diff export-second.diff test-export-import 2>&1 | grep -E 'ObjectRequest.*(zero|remove|truncate)'
2020-06-18T10:52:45.640-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818006720 send: rbd_data.1032347771b.0000000000000006 zero 0~917504
2020-06-18T10:52:45.649-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818021f90 send: rbd_data.1032347771b.0000000000000009 truncate 3080192~1114112
2020-06-18T10:52:45.650-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818022820 send: rbd_data.1032347771b.000000000000000a zero 0~524288
2020-06-18T10:52:45.653-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818002820 send: rbd_data.1032347771b.000000000000000a truncate 3801088~393216
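(The difference between the two logs is consistent with each discard being shrunk inward to discard-granularity boundaries when "rbd_skip_partial_discard" is true, and dropped when nothing aligned remains. Below is a sketch that reproduces the surviving extents from the "true" log; the 64 KiB granularity, i.e. the default of rbd_discard_granularity_bytes, and the inward-rounding rule are assumptions inferred from the log output, not taken from the librbd code:)

# Reproduce which zero extents survive skip-partial-discard, assuming
# extents are shrunk inward to 64 KiB granularity boundaries and 4 MiB
# objects. Each surviving extent here happens to fall within a single
# object, so per-object splitting is elided.
OBJ = 4 * 1024 * 1024
GRAN = 64 * 1024

zero_extents = [
    (230419, 12149), (8379678, 8930), (25165824, 952115),
    (40799568, 1143472), (41943040, 565422), (45694610, 442734),
]

for off, length in zero_extents:
    new_off = -(-off // GRAN) * GRAN            # round start up
    new_end = (off + length) // GRAN * GRAN     # round end down
    if new_end <= new_off:
        print('%10d~%-8d skipped entirely' % (off, length))
    else:
        obj, in_obj = divmod(new_off, OBJ)
        print('%10d~%-8d -> object %x: %d~%d'
              % (off, length, obj, in_obj, new_end - new_off))

This yields exactly the four zero/truncate requests seen in the "true" log (objects 6, 9 and a) and drops the extents at 230419 and 8379678 entirely, which matches the ranges that still contain old data in the raw export.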
Updated by Nikola Ciprich almost 4 years ago
You're right; I was setting this using ceph config set osd ..., but that is apparently not the right way to do it, since rbd_skip_partial_discard is a client-side option. When I put it in the [client] section of ceph.conf, it works correctly. So I confirm that setting rbd_skip_partial_discard to true causes the reported issue.
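(For reference: since rbd_skip_partial_discard is read by the librbd client rather than the OSDs, it can be set in the [client] section of ceph.conf as below, or in the config database scoped to clients, e.g. ceph config set client rbd_skip_partial_discard false; the osd scope only affects OSD daemons.)

[client]
rbd_skip_partial_discard = false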
Updated by Jason Dillaman almost 4 years ago
- Status changed from Need More Info to In Progress
- Assignee set to Jason Dillaman
Awesome, thanks for verifying.
Updated by Jason Dillaman almost 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 35855
Updated by Mykola Golub almost 4 years ago
- Status changed from Fix Under Review to Resolved
Updated by Jason Dillaman almost 4 years ago
- Status changed from Resolved to Pending Backport
- Backport set to nautilus,octopus
Updated by Jason Dillaman almost 4 years ago
- Copied to Backport #46673: nautilus: importing rbd diff does not apply zero sequences correctly added
Updated by Jason Dillaman almost 4 years ago
- Copied to Backport #46674: octopus: importing rbd diff does not apply zero sequences correctly added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".