Bug #56181
mirror snapshot syncing progress reporting (last_copied_object_number) is broken for sparse images
Description
$ rbd create --size 2G data/testimg
$ rbd mirror image enable data/testimg snapshot
Mirroring enabled
$ sudo rbd device map data/testimg
$ dd if=/dev/urandom of=/dev/rbd0 bs=4M oflag=direct
dd: error writing '/dev/rbd0': No space left on device
513+0 records in
512+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 37.0197 s, 58.0 MB/s
$ sudo rbd device unmap data/testimg
$ rbd mirror image snapshot data/testimg
Snapshot ID: 8
Monitor the last_copied_object_number and complete fields on the secondary cluster:
$ rbd snap ls --all --format=json data/testimg | jq '.[-1] | .namespace.last_copied_object_number,.namespace.complete'
For a fully allocated image, last_copied_object_number increases gradually from 1 to 512 (image size / object size), and the snapshot is marked complete when last_copied_object_number reaches 512, as expected:
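The 512 figure is simply the number of objects backing the image. A quick check of the arithmetic, assuming the default 4 MiB RBD object size (the reproduction does not pass --object-size, so this is an assumption rather than something shown in the transcript):

```python
# Expected object count for data/testimg, assuming the default 4 MiB object size.
MiB = 1024 ** 2
GiB = 1024 ** 3

image_size = 2 * GiB    # rbd create --size 2G
object_size = 4 * MiB   # assumed default RBD object size

object_count = image_size // object_size
print(object_count)  # 512

# dd reported 2147483648 bytes copied with bs=4M, i.e. one 4 MiB record per object.
records_out = 2147483648 // (4 * MiB)
print(records_out)  # 512
```

Because dd's block size equals the (assumed) object size, every dd record lands in exactly one object, which is why the fully written image ends with last_copied_object_number at 512.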
1 false
19 false
42 false
50 false
72 false
92 false
107 false
124 false
142 false
161 false
181 false
200 false
214 false
235 false
255 false
270 false
288 false
305 false
320 false
338 false
355 false
370 false
386 false
402 false
418 false
432 false
451 false
465 false
484 false
502 false
512 true
512 true
512 true
For an image with holes in it, last_copied_object_number gets tripped up by the first hole: the sync continues, but last_copied_object_number is no longer updated. Eventually the snapshot is marked complete with last_copied_object_number still stuck:
$ rbd create --size 2G data/testimg
$ rbd mirror image enable data/testimg snapshot
Mirroring enabled
$ sudo rbd device map data/testimg
$ dd if=/dev/urandom of=/dev/rbd0 bs=4M count=100 oflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 15.9774 s, 26.3 MB/s
$ dd if=/dev/urandom of=/dev/rbd0 bs=4M seek=110 oflag=direct
dd: error writing '/dev/rbd0': No space left on device
403+0 records in
402+0 records out
1686110208 bytes (1.7 GB, 1.6 GiB) copied, 65.5053 s, 25.7 MB/s
$ sudo rbd device unmap data/testimg
$ rbd mirror image snapshot data/testimg
Snapshot ID: 16
1 false
5 false
29 false
40 false
51 false
64 false
73 false
85 false
87 false
99 false
99 false
99 false
[...]
99 false
99 false
99 false
100 true
100 true
100 true
This can also be observed by monitoring the syncing_percent field in the "rbd mirror image status" output.
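One way a counter like last_copied_object_number can stall on a hole is if it only advances past object N once every object below N has reported completion, and objects that do not exist on the source never report. The sketch below is a hypothetical model of that mechanism, not librbd code: ProgressTracker, object_done, sync and report_holes are illustrative names. The layout matches the sparse reproduction, assuming 4 MiB objects: dd wrote objects 0-99 (count=100) and 110-511 (seek=110), leaving 100-109 unwritten.

```python
class ProgressTracker:
    """Hypothetical contiguous progress counter (not librbd's implementation)."""

    def __init__(self):
        self.done = set()
        self.last_copied_object_number = 0  # count of objects 0..N-1 all finished

    def object_done(self, object_no):
        self.done.add(object_no)
        # Advance only across an unbroken run of finished objects from the start.
        while self.last_copied_object_number in self.done:
            self.last_copied_object_number += 1


def sync(object_count, allocated, report_holes):
    tracker = ProgressTracker()
    for object_no in range(object_count):
        if object_no in allocated:
            tracker.object_done(object_no)  # data copied
        elif report_holes:
            tracker.object_done(object_no)  # nothing to copy, but still counted
        # Otherwise the hole is skipped and the tracker never hears about it.
    return tracker.last_copied_object_number


sparse = set(range(100)) | set(range(110, 512))  # hole at objects 100-109
full = set(range(512))

print(sync(512, full, report_holes=False))    # 512
print(sync(512, sparse, report_holes=False))  # 100: stuck at the first hole
print(sync(512, sparse, report_holes=True))   # 512
```

In the model, the counter for the sparse image stops at 100 even though objects 110-511 are copied, which lines up with the final value in the output above. The report_holes=True variant is only an illustration of one way to keep the counter moving, not a description of the merged fix.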
Updated by Ilya Dryomov almost 2 years ago
- Backport set to octopus,pacific,quincy
- Pull request ID set to 46858
Updated by Ilya Dryomov almost 2 years ago
- Status changed from In Progress to Fix Under Review
Updated by Ilya Dryomov almost 2 years ago
- Regression changed from No to Yes
This is a regression, introduced in octopus by commit https://github.com/ceph/ceph/commit/e5a21e904142f8d728ba5de1eecf940b6b8329b5.
Updated by Ilya Dryomov almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56430: quincy: mirror snapshot syncing progress reporting (last_copied_object_number) is broken for sparse images added
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56431: octopus: mirror snapshot syncing progress reporting (last_copied_object_number) is broken for sparse images added
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56432: pacific: mirror snapshot syncing progress reporting (last_copied_object_number) is broken for sparse images added
Updated by Ilya Dryomov almost 2 years ago
- Status changed from Pending Backport to Resolved