Bug #55241
rbd export complains about non-existing snapshot and fails
% Done: 0%
Description
May be related to #18367 (which was closed due to lack of feedback), but not completely the same.
This fails:
# rbd --cluster ceph --pool triple export --export-format 2 vm-7010-disk-1 - > /dev/null
error setting snapshot context: (2) No such file or directory
Exporting image: 0% complete...failed.
rbd: export error: (2) No such file or directory
No snapshots:
# rbd -p triple snap ls vm-7010-disk-1
#
But the object map is in ruins (which should not matter, but maybe it does); also, the snapshot count is suspiciously one:
# rbd -p triple info vm-7010-disk-1
rbd image 'vm-7010-disk-1':
        size 400 GiB in 102400 objects
        order 22 (4 MiB objects)
        snapshot_count: 1
        id: b149126b8b4567
        block_name_prefix: rbd_data.b149126b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, operations
        op_features: snap-trash
        flags: object map invalid, fast diff invalid
        create_timestamp: Wed Oct 16 14:47:11 2019
The beliefs of rados are:
# rados -p triple listomapvals rbd_header.b149126b8b4567 | more
create_timestamp
value (8 bytes) :
00000000  4f 11 a7 5d f9 2f bf 00                           |O..]./..|
00000008

features
value (8 bytes) :
00000000  1d 01 00 00 00 00 00 00                           |........|
00000008

flags
value (8 bytes) :
00000000  03 00 00 00 00 00 00 00                           |........|
00000008

object_prefix
value (27 bytes) :
00000000  17 00 00 00 72 62 64 5f 64 61 74 61 2e 62 31 34  |....rbd_data.b14|
00000010  39 31 32 36 62 38 62 34 35 36 37                  |9126b8b4567|
0000001b

op_features
value (8 bytes) :
00000000  08 00 00 00 00 00 00 00                           |........|
00000008

order
value (1 bytes) :
00000000  16                                                |.|
00000001

size
value (8 bytes) :
00000000  00 00 00 00 64 00 00 00                           |....d...|
00000008

snap_seq
value (8 bytes) :
00000000  99 da 01 00 00 00 00 00                           |........|
00000008

snapshot_000000000001bca5
value (108 bytes) :
00000000  08 08 66 00 00 00 a5 bc 01 00 00 00 00 00 24 00  |..f...........$.|
00000010  00 00 65 61 33 34 39 32 65 62 2d 30 66 38 65 2d  |..ea3492eb-0f8e-|
00000020  34 63 33 65 2d 38 61 33 66 2d 64 37 35 63 30 63  |4c3e-8a3f-d75c0c|
00000030  30 39 31 62 62 63 00 00 00 00 64 00 00 00 00 03  |091bbc....d.....|
00000040  00 00 00 00 00 00 00 01 01 12 00 00 00 02 00 00  |................|
00000050  00 06 00 00 00 76 7a 64 75 6d 70 00 00 00 00 d9  |.....vzdump.....|
00000060  2d 1f 62 7c 38 f7 13 00 00 00 00 00              |-.b|8.......|
0000006c
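A few of the fields in that dump can be decoded by hand. A minimal standalone sketch (plain Python on the hex bytes shown above, not a librbd decoder; the interpretation of the fields is an assumption based on their names):

```python
import struct

# Little-endian u64 values copied from the listomapvals dump above.
snap_seq = struct.unpack("<Q", bytes.fromhex("99da010000000000"))[0]
size = struct.unpack("<Q", bytes.fromhex("0000000064000000"))[0]

# The omap key "snapshot_000000000001bca5" embeds the snapshot id in hex.
snap_id = int("000000000001bca5", 16)

print(snap_seq)              # 121497
print(snap_id)               # 113829
print(size // 2**30, "GiB")  # 400 GiB, matching "rbd info"
```

So the header really does carry one snapshot record, with id 113829, even though plain "snap ls" shows nothing.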
Unfortunately rebuilding the object map on a running VM isn't quite possible (the image is mounted via krbd and actively updated, so the rebuild loses the lock faster than it can make progress).
# rbd -v
ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Updated by Ilya Dryomov about 2 years ago
Hi Peter,
What is the output of "rbd snap ls --all"? It looks like the snapshot in question is in the RBD trash bin.
Updated by Peter Gervai about 2 years ago
Ilya Dryomov wrote:
What is the output of "rbd snap ls --all"? It looks like the snapshot in question is in the RBD trash bin.
Indeed:
# rbd --cluster ceph --pool triple snap ls --all vm-7010-disk-1
SNAPID  NAME                                  SIZE     PROTECTED  TIMESTAMP                 NAMESPACE
113829  ea3492eb-0f8e-4c3e-8a3f-d75c0c091bbc  400 GiB             Wed Mar  2 09:42:01 2022  trash (vzdump)
I'm not sure what the moral of the story is; probably that the error message ought to suggest some possible directions to look in. I wasn't even aware that snapshots can be separately trashed; this one was probably created by a runaway background process.
The export error message could probably describe the [possible] problem in a little more detail.
For this specific case I thank you for your help!
Updated by Peter Gervai about 2 years ago
I tried to follow up on a mail (https://www.mail-archive.com/ceph-users@lists.ceph.com/msg53551.html) which seemed to be related, since I wanted to figure out how to get rid of that snapshot, but got this:
# rbd -p triple children --snap-id 113829 vm-7010-disk-1
2022-04-08T22:56:34.896+0200 7f3a35667700 -1 librbd::object_map::RefreshRequest: failed to load object map: rbd_object_map.b149126b8b4567.000000000001bca5
2022-04-08T22:56:34.908+0200 7f3a35667700 -1 librbd::object_map::InvalidateRequest: 0x7f3a18015280 should_complete: r=0
Updated by Ilya Dryomov about 2 years ago
The clone of that snapshot might itself be in the trash bin. What is the output of "rbd -p triple children vm-7010-disk-1 --all --descendants"? What is the output of "rbd -p triple trash ls --all --long"?
Updated by Peter Gervai about 2 years ago
- rbd -p triple children vm-7010-disk-1 --all --descendants
(empty)
- rbd -p triple trash ls --all --long
ID              NAME            SOURCE  DELETED_AT                STATUS                               PARENT
025cdf331ef784  vm-2003-disk-0  USER    Fri Apr  8 12:02:44 2022  expired at Fri Apr  8 12:02:44 2022
(unrelated)
Updated by Ilya Dryomov about 2 years ago
You should be able to get rid of that snapshot with "rbd -p triple snap rm --snap-id 113829 vm-7010-disk-1".
Updated by Peter Gervai about 2 years ago
Yes, in the meantime I found another similar image to test on (I wanted to keep this one in case you needed any further info), and the by-id removal worked fine; I'm sure it will work on this one, too.
I guess rbd could be a bit more helpful here, as it is rather hard to see why the export doesn't work.
Also, I asked around on IRC and nobody seemed to know what one is supposed to do with trashed images with snapshots, since (according to the rbd doc) they cannot be removed or purged, so it's not clear what the use of trashing non-removable objects is. (I can, and indeed should, restore them, and that seems to be all.)
For me this is resolved for now, thank you; I am not sure people could find this solution without asking for help from someone who is already familiar with the signs.
Updated by Ilya Dryomov about 2 years ago
Peter Gervai wrote:
Also, I asked around on IRC and nobody seemed to know what one is supposed to do with trashed images with snapshots, since (according to the rbd doc) they cannot be removed or purged, so it's not clear what the use of trashing non-removable objects is. (I can, and indeed should, restore them, and that seems to be all.)
This is not a trashed image with snapshots; rather, this is an image with a trashed snapshot. A trashed snapshot on an otherwise "normal" image occurs when one attempts to remove a snapshot that still has clones that depend on it. In that case "rbd snap rm" moves that snapshot to the trash bin, pending either the flattening or the removal of all clones that are based off of that snapshot. The snapshot is supposed to be removed automatically when that happens, but it seems like that did not happen here for some reason.
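The lifecycle described above can be sketched as a toy state machine (plain Python, my own illustration of the described behaviour, not librbd code; all names are hypothetical):

```python
class Snapshot:
    """Toy model of a snapshot that clones may depend on."""

    def __init__(self, name):
        self.name = name
        self.namespace = "user"   # visible in plain "rbd snap ls"
        self.clones = set()       # names of dependent clone images

    def remove(self):
        """Model "rbd snap rm": trash instead of delete if clones remain."""
        if self.clones:
            # Can't delete yet: hide it in the trash namespace instead
            # (only visible via "rbd snap ls --all").
            self.namespace = "trash"
            return False
        return True  # actually deleted

    def detach_clone(self, clone):
        """Model flattening/removing a clone based off of this snapshot."""
        self.clones.discard(clone)
        # Once the last clone is gone, a trashed snapshot should be
        # auto-removed; the bug report above is about a case where that
        # apparently did not happen.
        if self.namespace == "trash" and not self.clones:
            return self.remove()
        return False


snap = Snapshot("vzdump")
snap.clones.add("some-clone")
assert snap.remove() is False and snap.namespace == "trash"
assert snap.detach_clone("some-clone") is True
```

In the reported case the image ended up stuck in the "trash" state with no surviving clones, which is why the manual "rbd snap rm --snap-id" was needed.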
For me this is resolved for now, thank you; I am not sure people could find this solution without asking for help from someone who is already familiar with the signs.
You mentioned a runaway background process earlier. Could you share some details on the automation you are using to create snapshots/clones and to remove them?
How many images in this state did you have? Was there something in common between them?
Do you recall if there were clone(s) based off of the "vzdump" snapshot -- when were they created, how/when were they removed, etc?
Updated by Ilya Dryomov almost 2 years ago
- Status changed from New to Need More Info