Bug #58090

open

Non-existent pending clone shows up in snapshot info

Added by Sebastian Hasler over 1 year ago. Updated about 1 month ago.

Status: New
Priority: Normal
Category: fsck/damage handling
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific,quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: v17.2.5

My CephFS somehow got into a state where a snapshot lists a pending clone that does not exist. This is a problem because the pending clone prevents me from deleting the snapshot.

$ ceph fs subvolume --group_name=csi snapshot info ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        }
    ]
}

$ ceph fs clone --group_name=csi status ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec
Error ENOENT: subvolume 'csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec' does not exist
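To make the inconsistency easier to spot on other clusters, here is a minimal sketch that compares the `pending_clones` list from the `snapshot info` output with a set of subvolumes known to exist. The function name `find_orphaned_clones` and the `existing_subvolumes` parameter are my own illustration, not part of Ceph; the caller is assumed to have collected the existing subvolume names by other means. The JSON is the output shown above.

```python
import json


def find_orphaned_clones(snapshot_info_json, existing_subvolumes):
    """Return (target_group, name) pairs listed as pending clones
    that are not in existing_subvolumes.

    existing_subvolumes is a set of (group, name) tuples supplied by
    the caller (hypothetical input, gathered however is convenient).
    """
    info = json.loads(snapshot_info_json)
    orphans = []
    for clone in info.get("pending_clones", []):
        # target_group is read with .get() in case the key is absent.
        key = (clone.get("target_group"), clone["name"])
        if key not in existing_subvolumes:
            orphans.append(key)
    return orphans


# Output of the `snapshot info` command from this ticket.
SNAPSHOT_INFO = """
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi"
        }
    ]
}
"""

if __name__ == "__main__":
    # The clone status command returned ENOENT, so the set of
    # existing subvolumes does not contain the pending clone.
    for group, name in find_orphaned_clones(SNAPSHOT_INFO, set()):
        print(f"orphaned pending clone: group={group} name={name}")
```

This only reports the mismatch; it does not attempt to repair anything.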

I think the CephFS got into this state when a clone failed due to insufficient disk space. That happened some time ago on an older version of Ceph, so the underlying cause may or may not have been fixed in the meantime.

The point of this ticket is that CephFS should be able to recover from this state, but currently it does not seem to be able to.

To try to recover from this state, I had the idea of re-creating the clone with that exact name and then canceling it.

$ ceph fs subvolume --group_name=csi snapshot clone ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec --target_group_name=csi

$ ceph fs clone --group_name=csi status ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec
{
  "status": {
    "state": "in-progress",
    "source": {
      "volume": "ssd-fs",
      "subvolume": "csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e",
      "snapshot": "csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e",
      "group": "csi" 
    }
  }
}

$ ceph fs subvolume --group_name=csi snapshot info ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        },
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        }
    ]
}

$ ceph fs clone --group_name=csi cancel ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec

$ ceph fs clone --group_name=csi status ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec
{
  "status": {
    "state": "canceled",
    "source": {
      "volume": "ssd-fs",
      "subvolume": "csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e",
      "snapshot": "csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e",
      "group": "csi" 
    },
    "failure": {
      "errno": "4",
      "error_msg": "user interrupted clone operation" 
    }
  }
}

$ ceph fs subvolume --group_name=csi snapshot info ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        }
    ]
}

However, as you can see, re-creating the clone adds a duplicate entry to the `pending_clones` list, and canceling the clone removes only one of the two entries. The stale pending clone therefore remains, and I still cannot delete the snapshot.
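The duplicate entry can also be checked for mechanically. The following is a rough sketch, assuming the `snapshot info` JSON has the shape shown in this ticket; the function name `duplicate_pending_clones` is hypothetical and nothing here calls into Ceph itself.

```python
import json
from collections import Counter


def duplicate_pending_clones(snapshot_info_json):
    """Return {(target_group, name): count} for every pending clone
    that appears more than once in the snapshot info output."""
    info = json.loads(snapshot_info_json)
    counts = Counter(
        (clone.get("target_group"), clone["name"])
        for clone in info.get("pending_clones", [])
    )
    return {key: n for key, n in counts.items() if n > 1}


# Snapshot info as reported in this ticket after re-creating the clone.
AFTER_RECREATE = """
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi"
        },
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi"
        }
    ]
}
"""

if __name__ == "__main__":
    for (group, name), n in duplicate_pending_clones(AFTER_RECREATE).items():
        print(f"{name} (group {group}) is listed {n} times")
```

A non-empty result right after starting a clone would suggest that a stale entry for the same name was already present.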
