Bug #18982: How to get out of weird situation after rbd flatten?
Status: Closed
Description
Hope this is good for the tracker instead of the mailing list...
We have an image that was cloned from a snapshot:
rbd/foo (parent: rbd/foo-original@after-fixup)
We wanted to clean this up so I ran:
rbd flatten rbd/foo
This took a while as the volume is around 8TiB. At this point in time the volume rbd/foo did have snapshots.
I tried to unprotect the source of the clone, but that didn't work:
rbd snap unprotect rbd/foo-original@after-fixup
2017-02-17 23:14:55.009408 7fcea66087c0 -1 librbd: snap_unprotect: can't unprotect; at least 1 child(ren) in pool rbd
rbd: unprotecting snap failed: (16) Device or resource busy
After that I learned about the "deep-flatten" feature. As it could no longer be enabled on the existing image, I decided to delete the snapshots on rbd/foo instead. That didn't help either.
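For context, the cleanup sequence described above would normally look something like the following. This is a hedged reconstruction, not the reporter's exact commands; the snapshot name on rbd/foo is a placeholder, and flag behavior depends on the Ceph release (here hammer, 0.94.x):

```shell
# List the clone's own snapshots. Without the deep-flatten feature,
# these snapshots keep referencing the parent even after "rbd flatten",
# which is why the parent snapshot cannot be unprotected.
rbd snap ls rbd/foo

# Remove each snapshot on the clone ("some-snapshot" is a placeholder),
# then retry unprotecting the parent snapshot.
rbd snap rm rbd/foo@some-snapshot
rbd snap unprotect rbd/foo-original@after-fixup
```

In the reported case this sequence still failed, because the child bookkeeping had already become inconsistent.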
The weird thing now is that the metadata structures seem to have become corrupt:
# rbd children rbd/foo-original@after-fixup
rbd/foo
# rbd flatten rbd/foo
Image flatten: 0% complete...failed.
rbd: flatten error: (22) Invalid argument
2017-02-17 23:19:24.903319 7efd554117c0 -1 librbd: image has no parent
Also, maybe of interest, the source of the clone (rbd/foo-original@after-fixup) has a parent, too.
I'm now keeping duplicate data and can't delete the superfluous copy. It sounds like there should at least be some workaround ... Help. :(
Updated by Shinobu Kinjo about 7 years ago
- Assignee set to Shinobu Kinjo
Please provide the Ceph and kernel versions your cluster is running.
Updated by Christian Theune about 7 years ago
The affected Ceph version is as assigned to the ticket: 0.94.7. The kernel (on the Ceph hosts) is 4.4.27 (soon to be updated to a recent 4.9 version).
Updated by Christian Theune about 7 years ago
We managed to work around this issue by manually editing the rbd metadata objects. You can close this if you like.
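The manual edit mentioned above was presumably along these lines. This is a sketch under assumptions, not the actual commands used: the `rbd_children` object in the pool holds the parent-snapshot-to-child-image mapping, and the exact (partly binary) omap key names vary by RBD format and version:

```shell
# Inspect the clone bookkeeping object that maps parent snapshots
# to child image IDs.
rados -p rbd listomapkeys rbd_children
rados -p rbd listomapvals rbd_children

# Removing the stale key detaches the phantom child record.
# DANGEROUS: only do this after verifying the child image really is
# fully flattened. "<stale-key>" is a placeholder for the offending key.
rados -p rbd rmomapkey rbd_children <stale-key>
```

Editing these objects by hand bypasses librbd's consistency checks, so it should be treated as a last resort.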
Updated by Jason Dillaman almost 7 years ago
- Project changed from Ceph to rbd
- Category deleted (librbd)
Updated by Jason Dillaman almost 7 years ago
- Status changed from New to Duplicate
This seems to be a duplicate of issue #18117.
Updated by Christian Theune over 5 years ago
FYI, this also just happened on a cluster that I upgraded to Jewel. No need to debug at this point, as I couldn't replicate it and was in the middle of a larger effort to clean up a pool (the snapshot/clone spanned two pools: one had the protected snapshot, the other the clone). I had to delete the old pool with the snapshots anyway, so I got myself out of that situation by removing the pool.
We'll upgrade to Luminous soon; maybe this even happens there again ... Also, I'm not sure this is a duplicate of #18117, as that bug would still allow me to delete the snapshot, which is not the case here ...
Updated by Christian Theune over 1 year ago
And a quick note: upgrading to Luminous happened in between; however, we're currently upgrading to Nautilus and are now encountering a very similar situation.