
Bug #18982

How to get out of weird situation after rbd flatten?

Added by Christian Theune about 7 years ago. Updated about 1 year ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Shinobu Kinjo
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hope this is good for the tracker instead of the mailing list...

We have an image that was cloned from a snapshot:

rbd/foo (parent: rbd/foo-original@after-fixup)

We wanted to clean this up so I ran:

rbd flatten rbd/foo

This took a while as the volume is around 8 TiB. At this point the volume rbd/foo still had snapshots.
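For reference, the cleanup attempted here follows the standard RBD clone lifecycle. A minimal sketch, using the image and snapshot names from this report and assuming everything lives in the default rbd pool:

```shell
# Flatten the clone so it no longer depends on the parent snapshot.
rbd flatten rbd/foo

# After a successful flatten, the clone should no longer be listed
# as a child of the parent snapshot ...
rbd children rbd/foo-original@after-fixup

# ... and the parent snapshot should then be unprotectable and removable.
rbd snap unprotect rbd/foo-original@after-fixup
rbd snap rm rbd/foo-original@after-fixup
```

In this report, the `rbd children` step kept listing rbd/foo even after the flatten, which is why the unprotect failed.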

I tried to unprotect the source of the clone, but that didn't work:

rbd snap unprotect rbd/foo-original@after-fixup
2017-02-17 23:14:55.009408 7fcea66087c0 -1 librbd: snap_unprotect: can't unprotect; at least 1 child(ren) in pool rbd
rbd: unprotecting snap failed: (16) Device or resource busy

After that I learned about the "deep flatten" feature. As it could no longer be enabled on the existing image, I decided to delete the snapshots on rbd/foo instead. That didn't help either.
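For context: deep-flatten makes a flatten also detach the clone's own snapshots from the parent. It is an image feature that, as far as I know, must be set when the clone is created and cannot be added to an existing image, which is why it was no help here. A hedged sketch of how a clone would be created with it (feature support depends on the Ceph release; image names are the ones from this report):

```shell
# deep-flatten has to be requested at clone-creation time.
rbd clone --image-feature layering,deep-flatten \
    rbd/foo-original@after-fixup rbd/foo
```

Without deep-flatten, any snapshots taken of the clone keep referencing the parent, so the parent snapshot stays busy even after `rbd flatten`.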

The weird thing now is that the metadata structures seem to have become corrupt:

# rbd children rbd/foo-original@after-fixup
rbd/foo

# rbd flatten rbd/foo
rbd flatten rbd/foo
Image flatten: 0% complete...failed.
rbd: flatten error: (22) Invalid argument
2017-02-17 23:19:24.903319 7efd554117c0 -1 librbd: image has no parent

Also, maybe of interest, the source of the clone (rbd/foo-original@after-fixup) has a parent, too.

I'm now keeping duplicate data and can't delete the superfluous copy. It seems like there should at least be some workaround ... Help. :(

History

#1 Updated by Shinobu Kinjo about 7 years ago

  • Assignee set to Shinobu Kinjo

Please provide the Ceph and kernel versions your cluster is running.

#2 Updated by Christian Theune about 7 years ago

The affected Ceph version, as assigned to the ticket, is 0.94.7. The kernel (on the Ceph hosts) is 4.4.27, soon to be updated to a recent 4.9 version.

#3 Updated by Christian Theune almost 7 years ago

We managed to work around this issue by manually editing the rbd metadata objects. You can close this if you like.
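For anyone hitting the same thing: the comment above does not record the exact edits, but the relevant metadata can be inspected with the rados CLI. A hedged sketch of that kind of inspection (not the exact procedure used here); note that these omap structures are internal to librbd, editing them can corrupt images, and `<image-id>` is a placeholder for the clone's internal id:

```shell
# The per-pool rbd_children object maps (parent pool, image, snapshot)
# to the ids of child images; a stale entry here makes a parent
# snapshot appear to still have children.
rados -p rbd listomapkeys rbd_children
rados -p rbd listomapvals rbd_children

# A format-2 clone's parent pointer lives in its header object.
rados -p rbd listomapvals rbd_header.<image-id>
```

Back up the objects (e.g. with `rados get`) before attempting any modification.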

#4 Updated by Jason Dillaman over 6 years ago

  • Project changed from Ceph to rbd
  • Category deleted (librbd)

#5 Updated by Jason Dillaman over 6 years ago

  • Status changed from New to Duplicate

Seems like it's a duplicate of issue #18117.

#6 Updated by Christian Theune about 5 years ago

FYI, this also just happened on a cluster that I upgraded to Jewel. No need to debug at this point, as I couldn't replicate it and was in the middle of a larger effort to clean up a pool (the snapshot/clone spanned two pools: one had the protected snapshot, the other the clone). I had to delete the old pool with the snapshots anyway, so I got myself out of that situation by removing the pool.

We'll upgrade to Luminous soon; maybe this even happens there again ... Also, I'm not sure this is a duplicate of #18117, as that bug would still allow me to delete the snapshot, but that's not the case here ...

#7 Updated by Christian Theune about 1 year ago

And a quick note: upgrading to Luminous happened in between; however, we're currently upgrading to Nautilus and are now encountering a very similar situation.
