
Bug #45986

importing rbd diff does not apply zero sequences correctly

Added by Nikola Ciprich 6 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus,octopus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature:

Description

While investigating problems with incorrect backup checksums, we've discovered what is probably a bug in the rbd import-diff command. When a second rbd diff containing sequences of zeros is applied, those zero sequences do not seem to be applied correctly, and areas which should be zeroed still contain data.

I'm attaching test data.

steps to reproduce:

  1. create a new image and import both diffs
    rbd create sata/test-export-import --size 100M
    rbd import-diff --no-progress export-first.diff sata/test-export-import
    rbd import-diff --no-progress export-second.diff sata/test-export-import
  1. export the raw content of the second snapshot
    rbd export --export-format 1 --no-progress sata/test-export-import@second export-second-raw.txt
  1. show the structure of the second diff
    python diff-parser.py export-second.diff

    banner: rbd diff v1
    export_format: 1
    from: first
    snap: second
    size: 104857600
    zero: offset: 230419, length: 12149
    zero: offset: 8379678, length: 8930
    data: offset: 8388608, length: 4194304
    data: offset: 12582912, length: 457699
    data: offset: 19155848, length: 1815672
    data: offset: 20971520, length: 4194304
    zero: offset: 25165824, length: 952115
    data: offset: 26561985, length: 2798143
    data: offset: 29360128, length: 4194304
    data: offset: 33554432, length: 2919371
    zero: offset: 40799568, length: 1143472
    zero: offset: 41943040, length: 565422
    data: offset: 43129817, length: 202932
    data: offset: 43575935, length: 1220628
    zero: offset: 45694610, length: 442734
    data: offset: 46137344, length: 4194304
    data: offset: 50331648, length: 2169674
    end
    eof

  1. expected area of zeros, but it still contains data!
    hexdump -s 230419 -n 80 export-second-raw.txt

    0038413 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
    0038423 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
    0038433 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
    0038443 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
    0038453 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f

  1. expected area of zeros, but it still contains data!
    hexdump -s 8379678 -n 80 export-second-raw.txt

    07fdd1e 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
    07fdd2e 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
    07fdd3e 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
    07fdd4e 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
    07fdd5e 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66

  1. this part of the dump is correct
    hexdump -s 25165824 -n 80 export-second-raw.txt

    1800000 0000 0000 0000 0000 0000 0000 0000 0000
    *
    1800050

  1. expected area of zeros, but it still contains data!
    hexdump -s 40799568 -n 80 export-second-raw.txt

    26e8d50 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
    26e8d60 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
    26e8d70 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
    26e8d80 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
    26e8d90 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66

  1. this part of the dump is correct
    hexdump -s 41943040 -n 80 export-second-raw.txt

    2800000 0000 0000 0000 0000 0000 0000 0000 0000
    *
    2800050

  1. expected area of zeros, but it still contains data!
    hexdump -s 45694610 -n 80 export-second-raw.txt

    2b93e92 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
    2b93ea2 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
    2b93eb2 6f6f 6f66 666f 6f6f 6f66 666f 6f6f 6f66
    2b93ec2 666f 6f6f 6f66 666f 6f6f 6f66 666f 6f6f
    2b93ed2 6f66 666f 6f6f 6f66 666f 6f6f 6f66 666f
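The diff-parser.py script used above is not shown in this report. As a rough stand-in, the following Python sketch reads the rbd diff v1 record layout (the banner "rbd diff v1\n", then tagged records: f and t for the from/to snapshot names with a le32 length prefix, s for the le64 image size, w and z for le64 offset/length data and zero extents, e for the end, as described in the Ceph developer notes on the rbd diff format) and then checks every zero record against the raw export of the second snapshot. The function names are hypothetical, and the whole diff is read into memory, which is acceptable for a 100M test image but not for large ones.

```python
import struct

BANNER = b"rbd diff v1\n"


def parse_rbd_diff_v1(buf):
    """Parse an in-memory 'rbd diff v1' stream (hypothetical helper).

    Returns a dict with the from/to snapshot names, the image size and a
    list of (kind, offset, length) tuples, where kind is "data" or "zero".
    """
    if not buf.startswith(BANNER):
        raise ValueError("not an rbd diff v1 stream")
    pos = len(BANNER)
    info = {"from": None, "to": None, "size": None, "extents": []}

    def take(n):
        # Return the next n bytes and advance the cursor.
        nonlocal pos
        if pos + n > len(buf):
            raise ValueError("truncated record at byte %d" % pos)
        chunk = buf[pos:pos + n]
        pos += n
        return chunk

    while True:
        tag = take(1)
        if tag in (b"f", b"t"):
            # Snapshot name: le32 length followed by the name itself.
            (name_len,) = struct.unpack("<I", take(4))
            info["from" if tag == b"f" else "to"] = take(name_len).decode()
        elif tag == b"s":
            # Image size: le64.
            (info["size"],) = struct.unpack("<Q", take(8))
        elif tag in (b"w", b"z"):
            # Extent: le64 offset and le64 length; "w" also carries the data.
            offset, length = struct.unpack("<QQ", take(16))
            if tag == b"w":
                take(length)  # skip the payload, only the extent matters here
            kind = "data" if tag == b"w" else "zero"
            info["extents"].append((kind, offset, length))
        elif tag == b"e":
            return info
        else:
            raise ValueError("unknown tag %r at byte %d" % (tag, pos - 1))


def nonzero_zero_extents(info, raw_path):
    """Return (offset, length) of every zero record not all zeros in raw_path."""
    bad = []
    with open(raw_path, "rb") as raw:
        for kind, offset, length in info["extents"]:
            if kind != "zero":
                continue
            raw.seek(offset)
            chunk = raw.read(length)
            # A short read (file too small) also counts as "not applied".
            if len(chunk) != length or any(chunk):
                bad.append((offset, length))
    return bad


def report(diff_path, raw_path):
    """Print every zero record of the diff that still holds data in the export."""
    with open(diff_path, "rb") as f:
        info = parse_rbd_diff_v1(f.read())
    for offset, length in nonzero_zero_extents(info, raw_path):
        print("zero record not applied: offset %d, length %d" % (offset, length))
```

Run as report("export-second.diff", "export-second-raw.txt") against the attached test data, the output would be expected to include at least the four offsets whose hexdumps above still show data (230419, 8379678, 40799568 and 45694610).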

We've confirmed this issue on the latest releases of versions 13 (Mimic), 14 (Nautilus) and 15 (Octopus).

ceph-import-diff-issue-reproducer.tar.gz (132 KB) Nikola Ciprich, 06/12/2020 07:43 AM

rbd.log.gz (8.34 KB) Nikola Ciprich, 06/18/2020 05:53 AM


Related issues

Copied to rbd - Backport #46673: nautilus: importing rbd diff does not apply zero sequences correctly Resolved
Copied to rbd - Backport #46674: octopus: importing rbd diff does not apply zero sequences correctly Resolved

History

#1 Updated by Neha Ojha 5 months ago

  • Project changed from RADOS to rbd

Jason, can you please take a look to see if this bug is in the RBD layer.

#2 Updated by Jason Dillaman 5 months ago

  • Status changed from New to Need More Info

If it's only affecting Nautilus and later releases, I would suspect it's related to the "rbd_skip_partial_discard" setting, since its default switched to "true" in that release. Can you please provide an RBD debug log from the import of the second diff?

$ rbd import-diff --debug-rbd=20 --no-progress export-second.diff sata/test-export-import 2> rbd.log

#3 Updated by Nikola Ciprich 5 months ago

sure thing, attaching the log. n.

#4 Updated by Nikola Ciprich 5 months ago

One more update: after disabling rbd_skip_partial_discard (and restarting the OSDs), I'm still able to reproduce the problem. n.

#5 Updated by Jason Dillaman 5 months ago

Nikola Ciprich wrote:

> one more update, after disabling rbd_skip_partial_discard (and restarting OSDs) I'm still able to reproduce the problem.. n.

I can only reproduce it with the "rbd_skip_partial_discard" option at its default of true; when it is "false", the import works as expected:

rbd_skip_partial_discard = false:
$ rbd --rbd_concurrent_management_ops=1 --rbd_skip_partial_discard=false --debug-rbd=20 import-diff export-second.diff test-export-import 2>&1 | grep -E 'ObjectRequest.*(zero|remove|truncate)'
2020-06-18T10:50:41.392-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800ab20 send: rbd_data.1032347771b.0000000000000000 zero 230419~12149
2020-06-18T10:50:41.392-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d80102a0 send: rbd_data.1032347771b.0000000000000001 truncate 4185374~8930
2020-06-18T10:50:41.399-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800ff30 send: rbd_data.1032347771b.0000000000000002 zero 0~3002368
2020-06-18T10:50:41.409-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800d340 send: rbd_data.1032347771b.0000000000000005 truncate 3547136~647168
2020-06-18T10:50:41.410-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800dbb0 send: rbd_data.1032347771b.0000000000000006 zero 0~952115
2020-06-18T10:50:41.413-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d80105c0 send: rbd_data.1032347771b.0000000000000006 zero 1396161~671744
2020-06-18T10:50:41.422-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d8004a40 send: rbd_data.1032347771b.0000000000000007 zero 2453504~1077248
2020-06-18T10:50:41.425-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d80067b0 send: rbd_data.1032347771b.0000000000000009 truncate 3050832~1143472
2020-06-18T10:50:41.426-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d8007230 send: rbd_data.1032347771b.000000000000000a zero 0~565422
2020-06-18T10:50:41.427-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d8008ff0 send: rbd_data.1032347771b.000000000000000a truncate 3751570~442734
2020-06-18T10:50:41.509-0400 7f1404bcb700 20 librbd::io::ObjectRequest: 0x7f13d800ab20 send: rbd_data.1032347771b.000000000000000b zero 0~1437696
2020-06-18T10:50:41.524-0400 7f13ecff9700 20 librbd::io::ObjectRequest: 0x7f13d800e840 send: rbd_data.1032347771b.000000000000000c zero 2031616~138058

rbd_skip_partial_discard = true:
$ rbd --rbd_concurrent_management_ops=1 --rbd_skip_partial_discard=true --debug-rbd=20 import-diff export-second.diff test-export-import 2>&1 | grep -E 'ObjectRequest.*(zero|remove|truncate)'
2020-06-18T10:52:45.640-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818006720 send: rbd_data.1032347771b.0000000000000006 zero 0~917504
2020-06-18T10:52:45.649-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818021f90 send: rbd_data.1032347771b.0000000000000009 truncate 3080192~1114112
2020-06-18T10:52:45.650-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818022820 send: rbd_data.1032347771b.000000000000000a zero 0~524288
2020-06-18T10:52:45.653-0400 7f8829ffb700 20 librbd::io::ObjectRequest: 0x7f8818002820 send: rbd_data.1032347771b.000000000000000a truncate 3801088~393216
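The difference between the two logs can be reproduced by hand for the zero records of the diff. This is a minimal Python sketch under two assumptions: objects are 4 MiB (the object numbers in the logs are consistent with that), and with rbd_skip_partial_discard=true only whole 64 KiB units inside an object are discarded. The 64 KiB figure is inferred from the offsets in the second log, not taken from the librbd source. Each zero record is split at object boundaries, then its start is rounded up and its end rounded down to the granularity; the helper names are hypothetical.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB objects, as in this image
GRANULARITY = 64 * 1024        # assumption: inferred from the log offsets


def object_extents(offset, length, object_size=OBJECT_SIZE):
    """Split an image extent into (object_no, object_offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        object_no, object_off = divmod(offset, object_size)
        piece_len = min(end - offset, object_size - object_off)
        pieces.append((object_no, object_off, piece_len))
        offset += piece_len
    return pieces


def skip_partial(object_off, length, granularity=GRANULARITY):
    """Shrink an object extent to whole granularity units; None if nothing is left."""
    start = -(-object_off // granularity) * granularity        # round start up
    end = (object_off + length) // granularity * granularity   # round end down
    if end <= start:
        return None
    return start, end - start


def discarded(offset, length):
    """Return the (object_no, offset, length) ranges that would really be zeroed."""
    result = []
    for object_no, object_off, piece_len in object_extents(offset, length):
        kept = skip_partial(object_off, piece_len)
        if kept is not None:
            result.append((object_no,) + kept)
    return result
```

Under these assumptions, the records at 230419~12149 and 8379678~8930 shrink to nothing, which is consistent with their absence from the second log and with the non-zero hexdumps at those offsets, while the other four zero records shrink to exactly the ranges logged for objects 6, 9 and 0xa. The sketch does not model the extra zero requests in the first log that come from zero runs inside data records.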

#6 Updated by Nikola Ciprich 5 months ago

You're right. I was setting this using ceph config set osd ..., but that is apparently not the right way to do it; when I put it in the [client] section of ceph.conf, it works correctly. So I can confirm that setting rbd_skip_partial_discard to true causes the reported issue.
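The point of the previous comment is scoping: rbd_skip_partial_discard is read by librbd inside the client process (here the rbd CLI), so a value set for OSDs is not consulted. A minimal Python sketch of the client-side lookup for a ceph.conf file, assuming the usual precedence of [client.<id>] over [client] over [global] and that spaces and underscores in option names are interchangeable. effective_client_option is a hypothetical helper, and it ignores the monitor configuration database that ceph config set writes to.

```python
def effective_client_option(conf_text, name, client_id="admin"):
    """Return the value a client would most likely see for `name`, or None.

    Assumes the lookup order [client.<id>] > [client] > [global].
    """
    wanted = name.replace(" ", "_")
    sections = {}
    current = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#;":
            continue  # blank line or full-line comment
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            key = "_".join(key.split())  # "rbd skip partial discard" -> underscores
            # Drop trailing inline comments.
            current[key] = value.split("#")[0].split(";")[0].strip()
    for section in ("client." + client_id, "client", "global"):
        if wanted in sections.get(section, {}):
            return sections[section][wanted]
    return None
```

With the option set only under [osd] and [global], the sketch returns the [global] value; adding it under [client] overrides that, mirroring what the reporter observed.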

#7 Updated by Jason Dillaman 5 months ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Jason Dillaman

Awesome, thanks for verifying.

#8 Updated by Jason Dillaman 5 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 35855

#9 Updated by Mykola Golub 5 months ago

  • Status changed from Fix Under Review to Resolved

#10 Updated by Jason Dillaman 4 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to nautilus,octopus

#11 Updated by Jason Dillaman 4 months ago

  • Copied to Backport #46673: nautilus: importing rbd diff does not apply zero sequences correctly added

#12 Updated by Jason Dillaman 4 months ago

  • Copied to Backport #46674: octopus: importing rbd diff does not apply zero sequences correctly added

#13 Updated by Nathan Cutler 4 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
