Bug #54625

closed

Issue removing subvolume with retained snapshots - Possible quincy regression?

Added by John Mulligan about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Category: -
Target version:
% Done: 100%
Source:
Tags:
Backport: quincy, pacific
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm hitting a situation with test code that occurs only on quincy at this time.
To summarize:
  • ceph fs subvolume create ...
  • ceph fs subvolume info ... # ^^^ verify that subvolume supports retention
  • ceph fs subvolume snapshot create ...
  • ceph fs subvolume rm ... # ^^^ this is expected to fail. and does!
  • ceph fs subvolume rm --retain_snapshots ... # ^^^ succeeds (subvol goes to "snapshot-retained" state)
  • ceph fs subvolume snapshot rm ...
  • ceph fs subvolume rm ...

Now I expect that last "subvolume rm" to pass as the retained snapshot has been deleted on the previous step.
However, on quincy, it fails with the error "clone in-progress -- please cancel the clone and retry"
But no clones are in progress as far as I can tell, as this procedure doesn't involve cloning at all.
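
For reference, the sequence above can be written down as a small script. This is only a minimal sketch, not the go-ceph test itself: the names `repro_steps`, `run_repro`, and `ceph_runner` are made up for illustration, and the volume, subvolume, and snapshot names are placeholders. The runner is injectable so the step logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner takes an argv list and returns the command's exit code.
Runner = Callable[[Sequence[str]], int]


def ceph_runner(argv: Sequence[str]) -> int:
    """Run one command for real and return its exit code."""
    return subprocess.run(list(argv)).returncode


def repro_steps(vol: str, subvol: str, snap: str) -> List[Tuple[List[str], bool]]:
    """Return (argv, expected_to_succeed) pairs mirroring the reported steps."""
    base = ["ceph", "fs", "subvolume"]
    return [
        (base + ["create", vol, subvol], True),
        (base + ["info", vol, subvol], True),
        (base + ["snapshot", "create", vol, subvol, snap], True),
        # Plain rm is expected to fail while a snapshot exists.
        (base + ["rm", vol, subvol], False),
        # Documented flag spelling; the report writes --retain_snapshots.
        (base + ["rm", vol, subvol, "--retain-snapshots"], True),
        (base + ["snapshot", "rm", vol, subvol, snap], True),
        # The step in question: expected to pass by the reporter.
        (base + ["rm", vol, subvol], True),
    ]


def run_repro(run: Runner, vol: str, subvol: str, snap: str) -> List[int]:
    """Run all steps; return indices of steps whose outcome was unexpected."""
    mismatches = []
    for index, (argv, should_succeed) in enumerate(repro_steps(vol, subvol, snap)):
        succeeded = run(argv) == 0
        if succeeded != should_succeed:
            mismatches.append(index)
    return mismatches
```

Calling `run_repro(ceph_runner, ...)` against a cluster showing the behavior described here would be expected to report only the final step (index 6) as a mismatch.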

I took a quick look at the code and I wonder if the function safe_to_remove_subvolume_clone simply isn't considering the "snapshot-retained" state. It checks for complete/canceled/failed and only accepts those. So my guess is that this is an unintended regression. However, if this is expected behavior, please point me in the right direction... I couldn't find anything relevant in the quincy docs branch.
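
To make the guess concrete, here is a toy model of the kind of check I mean. It is not the mgr/volumes source: `removal_allowed` and `ACCEPTED_STATES` are hypothetical names, and only the state strings come from this report.

```python
# Hypothetical model of a state check, NOT the actual Ceph code.
# The three accepted states are the ones named in this report.
ACCEPTED_STATES = frozenset({"complete", "canceled", "failed"})


def removal_allowed(state: str, accepted=ACCEPTED_STATES) -> bool:
    """Return True if a subvolume in `state` passes the removal check."""
    return state in accepted


# With only the three states accepted, "snapshot-retained" is rejected.
print(removal_allowed("snapshot-retained"))

# If the check also knew about "snapshot-retained", it would pass.
print(removal_allowed("snapshot-retained", ACCEPTED_STATES | {"snapshot-retained"}))
```

If the real check behaves like the first call, a subvolume in the "snapshot-retained" state would be reported as an in-progress clone even though no clone exists.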

FWIW, the test that I'm running is part of the go-ceph project, so this could impact the ceph-csi project when used with ceph quincy.


Related issues: 2 (0 open, 2 closed)

Copied to CephFS - Backport #55335: pacific: Issue removing subvolume with retained snapshots - Possible quincy regression? (Resolved, Kotresh Hiremath Ravishankar)
Copied to CephFS - Backport #55336: quincy: Issue removing subvolume with retained snapshots - Possible quincy regression? (Resolved, Kotresh Hiremath Ravishankar)
Actions #1

Updated by Venky Shankar about 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v18.0.0
  • Backport set to quincy, pacific
Actions #2

Updated by Kotresh Hiremath Ravishankar about 2 years ago

Hi John,

> Now I expect that last "subvolume rm" to pass as the retained snapshot has been deleted on the previous step.
> However, on quincy, it fails with the error "clone in-progress -- please cancel the clone and retry"
> But no clones are in progress as far as I can tell, as this procedure doesn't involve cloning at all.

The subvolume deletion with retain-snapshots should not hit this code path. The subvolume deletion happens when
the last retained snapshot is removed. An explicit `subvolume rm` is not required.

...
ceph fs subvolume create ...
ceph fs subvolume info ... # ^^^ verify that subvolume supports retention

ceph fs subvolume snapshot create ...
ceph fs subvolume rm ... # ^^ this is expected to fail. and does!
ceph fs subvolume rm --retain_snapshots ... # ^^^ succeeds (subvol goes to "snapshot-retained" state)
ceph fs subvolume snapshot rm ...

This step should automatically remove the subvolume if this is the last snapshot retained.

ceph fs subvolume rm ...

Could you please confirm whether that happened in your tests?

Actions #3

Updated by Kotresh Hiremath Ravishankar about 2 years ago

Design/Code Behavior:
---------------------

I looked into the code further. It is designed in such a way that we should never have to explicitly delete the subvolume (with retained snapshots).
Deleting the last snapshot takes care of removing the subvolume (with retained snapshots), and it does. The cleanup may take some time (deletion is asynchronous)
after the last snapshot is deleted if the subvolume is huge and the subvolume removal with '--retain-snapshots' and the last snapshot deletion happen in
quick succession. I think the reason for this design is that the subvolume has already been deleted (with the '--retain-snapshots' option)
and should not be deleted again.

Let's come to the issue being seen:
-----------------------------------

The error seen, "EAGAIN: clone in-progress -- please cancel the clone and retry", is possible in the following case and can be ignored.

If the subvolume is large and contains a snapshot, and the following operations are performed in quick succession:
1. Delete the subvolume with the '--retain-snapshots' option
2. Delete the snapshot
3. Delete the subvolume again ---- NOTE: THIS IS NOT REQUIRED IN THE FIRST PLACE

Difference of behavior between releases:
----------------------------------------

As you rightly pointed out, the behavior changed with the introduction of the function 'safe_to_remove_subvolume_clone'. Earlier, the deletion at
step 3 above was a no-op and returned success to the user. With this function in place, EAGAIN is returned instead. This will be fixed to
match the earlier behavior, but you don't need the deletion at step 3 above.
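
A cleanup sequence that follows this advice could look roughly like the sketch below. The helper name `cleanup_commands` and the argument values are hypothetical; the only point is that no plain `subvolume rm` follows the snapshot removals.

```python
from typing import List


def cleanup_commands(vol: str, subvol: str, snapshots: List[str]) -> List[List[str]]:
    """Build the cleanup commands: retain-rm first, then each snapshot rm.

    There is deliberately no trailing plain "rm": removing the last
    retained snapshot is what removes the subvolume.
    """
    base = ["ceph", "fs", "subvolume"]
    commands = [base + ["rm", vol, subvol, "--retain-snapshots"]]
    for snap in snapshots:
        commands.append(base + ["snapshot", "rm", vol, subvol, snap])
    return commands


# Print the commands for a placeholder volume, subvolume, and snapshot.
for argv in cleanup_commands("cephfs", "sv1", ["snap1"]):
    print(" ".join(argv))
```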

Technical details:
------------------
The subvolume trash directory is maintained within the subvolume when it is deleted with the '--retain-snapshots' option. When the last snapshot is deleted and the trash directory is not empty, it is left to the purge job to delete the subvolume, which it does once the trash is empty. If the trash directory is empty at the time of the last snapshot deletion, the snapshot deletion code takes care of deleting the subvolume.

Step 3 above was chosen to be a no-op because of this behavior.
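
The decision described above can be summarized as a small model. This is an illustrative sketch only, not the mgr/volumes implementation; the function name and the returned labels are made up.

```python
def subvolume_remover(is_last_snapshot: bool, trash_empty: bool) -> str:
    """Model which code path removes a snapshot-retained subvolume.

    Labels are illustrative:
      "none"        - other retained snapshots still exist, nothing is removed
      "snapshot-rm" - trash is empty, snapshot deletion removes the subvolume
      "purge-job"   - trash is not empty, the purge job removes it later
    """
    if not is_last_snapshot:
        return "none"
    if trash_empty:
        return "snapshot-rm"
    return "purge-job"


# Last snapshot removed while the trash still holds data:
print(subvolume_remover(is_last_snapshot=True, trash_empty=False))
```

In the "purge-job" case the subvolume can still be visible for a while after the last snapshot is removed, which is the window in which the extra delete at step 3 reaches the new check.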

Actions #4

Updated by John Mulligan about 2 years ago

Great, thank you for the assistance.
I've removed the unneeded subvolume delete action from the test and it is now passing on quincy locally.
I've submitted the changes to run on our CI which will also check older ceph versions. I assume it will work, but if not I'll let you know.

Actions #5

Updated by Kotresh Hiremath Ravishankar about 2 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 45683
Actions #6

Updated by Venky Shankar about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55335: pacific: Issue removing subvolume with retained snapshots - Possible quincy regression? added
Actions #8

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55336: quincy: Issue removing subvolume with retained snapshots - Possible quincy regression? added
Actions #9

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #10

Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (backport_processed)