Feature #63544

open

mgr/volumes: bulk delete canceled clones

Added by Venky Shankar 6 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Category: Performance/Resource Usage
Target version:
% Done: 0%
Source:
Tags:
Backport: reef,quincy
Reviewed:
Affected Versions:
Component(FS): mgr/volumes
Labels (FS): task(medium)
Pull request ID:

Description

Creating this feature tracker to discuss the effort involved and whether it would really help in cleaning out canceled clones faster. Right now the user has to invoke the CLI once for each canceled clone, which takes a considerable amount of time (~2h for ~4k clones, from what's being reported). Maybe cleaning these up in one shot (and even canceling all pending clones in one shot) could speed things up.
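
For context, a minimal sketch of the loop a user has to drive today (assuming the per-clone CLI workflow shown in the comments below; the volume/group names and the "canceled" state string are illustrative): one 'clone status' plus one 'subvolume rm' per clone, which is why the runtime grows linearly with the number of canceled clones.

import json
import subprocess

VOL, GROUP = "cephfs", "csi"          # illustrative volume/group names

def ceph_json(*args):
    # run one ceph CLI invocation and parse its JSON output
    out = subprocess.check_output(["ceph", "-f", "json", *args])
    return json.loads(out)

for sv in ceph_json("fs", "subvolume", "ls", "--vol_name", VOL, "--group_name", GROUP):
    name = sv["name"]
    try:
        status = ceph_json("fs", "clone", "status", "--clone_name", name,
                           "--vol_name", VOL, "--group_name", GROUP)
    except subprocess.CalledProcessError:
        continue  # not a clone, or already removed
    # "canceled" is the clone state string assumed here
    if status.get("status", {}).get("state") == "canceled":
        subprocess.check_call(["ceph", "fs", "subvolume", "rm", "--sub_name", name,
                               "--vol_name", VOL, "--group_name", GROUP, "--force"])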


Files

ceph-clone-cleanup.sh (11.9 KB) - Raimund Sacherer, 11/20/2023 03:16 PM

Related issues (2 open, 0 closed)

Related to CephFS - Feature #61904: pybind/mgr/volumes: add more introspection for clones (Fix Under Review, Rishabh Dave)

Related to CephFS - Feature #61905: pybind/mgr/volumes: add more introspection for recursive unlink threads (In Progress, Rishabh Dave)

Actions #1

Updated by Kotresh Hiremath Ravishankar 6 months ago

Is the time consumption from fetching the list of cancelled clones and issuing subvolume rm? If that's the case, we can provide an API and do it internally. I am not sure how much time it would save, though.

Actions #2

Updated by Venky Shankar 6 months ago

Kotresh Hiremath Ravishankar wrote:

Is the time consumption from fetching the list of cancelled clones and issuing subvolume rm?

Just for subvolume rm on ~4080 subvolumes - that's ~1.7s per subvolume removal.

If that's the case, we can provide an API and do it internally. I am not sure how much time it would save, though.

That could possibly be driven by a separate thread. We'll have to check with the CSI folks w.r.t. using that, and also about how they would know the completion status.

Actions #3

Updated by Kotresh Hiremath Ravishankar 6 months ago

I looked further into the snippet of the script that was used. It is shown below.

For cancellation:
--------------------
sudo ceph fs subvolume ls  --vol_name cephfs --group_name csi  2>&1
sudo ceph fs clone status --clone_name csi-vol-7b664dd3-4cf4-11ee-94d9-0a580a8e0626 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone status --clone_name csi-vol-fafa7c7f-5474-4d85-b777-8a7bac8eb908 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone cancel --clone_name csi-vol-fafa7c7f-5474-4d85-b777-8a7bac8eb908 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone status --clone_name csi-vol-3897be6d-eaf0-4220-83a1-c36b39a085b2 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone cancel --clone_name csi-vol-3897be6d-eaf0-4220-83a1-c36b39a085b2 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone status --clone_name csi-vol-ff75b068-4ce3-11ee-9c69-0a580a8e080a --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone status --clone_name csi-vol-90fc0660-479c-4106-9301-5c3de635fcc0 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone cancel --clone_name csi-vol-90fc0660-479c-4106-9301-5c3de635fcc0 --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
sudo ceph fs clone status --clone_name csi-vol-c6c56d38-69bd-4905-8e73-6b992cced02d --vol_name cephfs --group_name csi  > /tmp/tmp.GlhVs6lsfh 2>&1
...
...

For clone/subvolume deletion:
-----------------------------
sudo ceph fs subvolume ls  --vol_name cephfs --group_name csi  2>&1
sudo ceph fs clone status --clone_name csi-vol-7b664dd3-4cf4-11ee-94d9-0a580a8e0626 --vol_name cephfs --group_name csi  > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs subvolume rm --sub_name csi-vol-7b664dd3-4cf4-11ee-94d9-0a580a8e0626 --vol_name cephfs --group_name csi --force > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs clone status --clone_name csi-vol-fafa7c7f-5474-4d85-b777-8a7bac8eb908 --vol_name cephfs --group_name csi  > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs subvolume rm --sub_name csi-vol-fafa7c7f-5474-4d85-b777-8a7bac8eb908 --vol_name cephfs --group_name csi --force > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs clone status --clone_name csi-vol-3897be6d-eaf0-4220-83a1-c36b39a085b2 --vol_name cephfs --group_name csi  > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs subvolume rm --sub_name csi-vol-3897be6d-eaf0-4220-83a1-c36b39a085b2 --vol_name cephfs --group_name csi --force > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs clone status --clone_name csi-vol-ff75b068-4ce3-11ee-9c69-0a580a8e080a --vol_name cephfs --group_name csi  > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs subvolume rm --sub_name csi-vol-ff75b068-4ce3-11ee-9c69-0a580a8e080a --vol_name cephfs --group_name csi --force > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs clone status --clone_name csi-vol-90fc0660-479c-4106-9301-5c3de635fcc0 --vol_name cephfs --group_name csi  > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs subvolume rm --sub_name csi-vol-90fc0660-479c-4106-9301-5c3de635fcc0 --vol_name cephfs --group_name csi --force > /tmp/tmp.HjuB6hDYKO 2>&1
sudo ceph fs clone status --clone_name csi-vol-c6c56d38-69bd-4905-8e73-6b992cced02d --vol_name cephfs --group_name csi  > /tmp/tmp.HjuB6hDYKO 2>&1
...
...

So there is one 'clone status' and one 'subvolume rm' command for each deletion of a cancelled clone. Having a single API could be helpful, potentially saving network/lock time when thousands of cancelled clones need to be deleted. I think this is worth exploring. However, we maintain the list of pending clones but not of the cancelled clones, so we might need to add support for that in order to achieve this.
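
For discussion, a very rough sketch of what such a server-side bulk API could look like. The method names used here (list_canceled_clones, remove_subvolume) are hypothetical placeholders for whatever internal calls mgr/volumes would actually use; they do not exist today.

import json

class VolumesBulkOps:
    # Hypothetical sketch only: the helpers below stand in for the plugin's
    # internal calls and the (to-be-added) index of cancelled clones.
    def __init__(self, volume_client):
        self.vc = volume_client

    def rm_canceled_clones(self, vol_name, group_name=None):
        # single mgr-side pass: no per-clone CLI invocation and no per-clone
        # network round trip from the client
        removed, errors = [], []
        for clone in self.vc.list_canceled_clones(vol_name, group_name):
            try:
                self.vc.remove_subvolume(vol_name, clone, group_name, force=True)
                removed.append(clone)
            except Exception as e:
                errors.append("{}: {}".format(clone, e))
        return 0, json.dumps({"removed": removed, "errors": errors}), ""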

Also, more time (around 4-5 hrs) is consumed for cancelling the ~4k pending clones; I'm not sure whether we should also look into that. It is tricky, though, as we don't have a way to know, via one API, which of the pending clones need to be cancelled.

Thanks,
Kotresh H R

Actions #4

Updated by Raimund Sacherer 6 months ago

Hi,

I created a script to do this, as I had to do it just now on two CU clusters. I think one of the issues is running the ceph binary (which is Python) so many times. I was thinking about rewriting my shell script as a Python script; this could shave off time if I can do everything from within Python.
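
For what it's worth, a minimal sketch of that idea, assuming the python-rados binding is available and that mon_command() reaches the mgr/volumes commands on the release in use (some setups may need mgr_command() instead). It keeps one cluster connection open instead of forking the ceph CLI once per clone, and prints a simple progress counter.

import json
import rados

def command(cluster, **kwargs):
    # send one command over the existing cluster connection;
    # mgr/volumes commands return their output as JSON text
    ret, out, errs = cluster.mon_command(json.dumps(kwargs), b'')
    if ret != 0:
        raise RuntimeError(errs)
    return json.loads(out) if out else None

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    subvols = command(cluster, prefix="fs subvolume ls",
                      vol_name="cephfs", group_name="csi")
    for i, sv in enumerate(subvols, 1):
        try:
            st = command(cluster, prefix="fs clone status", vol_name="cephfs",
                         clone_name=sv["name"], group_name="csi")
        except RuntimeError:
            continue  # not a clone, or already removed
        if st and st.get("status", {}).get("state") == "canceled":
            command(cluster, prefix="fs subvolume rm", vol_name="cephfs",
                    sub_name=sv["name"], group_name="csi", force=True)
        print("processed {}/{}".format(i, len(subvols)), end="\r")  # progress counter
finally:
    cluster.shutdown()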

I am also very cautious and check everything to make sure the script does the right thing. It also creates a log of the actions taken and writes out all the commands executed, just in case.

The first pass, run when we actually had > 4080 pending clones, canceled the pending clones and took 4-5 hours. The second pass, which only deleted the canceled clones, was faster (approx. 2 hours). I could have canceled the clones and removed the canceled clones in the same pass, but I was cautious and wanted to do it in two passes.

I'll attach the script. It has multiple functions which I find useful: collecting data to see if there are pending clones and which snapshots have them, listing the pending clones, and, of course, running the cleanup operations.

Actions #5

Updated by Raimund Sacherer 6 months ago

Ah, I forgot: the most important reason for me to write this script was to give the CU a progress counter so we all actually know where we are and how long it is still going to take. Without this we are blind, and I have had CUs cancel those long-running operations because they seemed not to be moving any further.

Actions #6

Updated by Venky Shankar 6 months ago

Kotresh Hiremath Ravishankar wrote:

I looked further into the snippet of the script that was used. It is shown below.

[...]

So there is one 'clone status' and one 'subvolume rm' command for each deletion of a cancelled clone. Having a single API could be helpful, potentially saving network/lock time when thousands of cancelled clones need to be deleted. I think this is worth exploring. However, we maintain the list of pending clones but not of the cancelled clones, so we might need to add support for that in order to achieve this.

Right now, canceling a clone removes the index pointer. One option could be to leave the index pointer as it is and teach the cloner thread to skip canceled clones; a separate thread would then sweep the indexes and clean out the canceled clones. Or, have a separate tracking directory where the canceled indexes are moved. Doing this in bulk would save a lot of context switches.
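
Something along these lines, as a sketch only; CanceledCloneSweeper, list_clone_index_entries and purge_clone are made-up names standing in for whatever the plugin would actually track:

import threading

class CanceledCloneSweeper(threading.Thread):
    """Periodically sweep the clone index and purge entries whose clone was
    canceled, so no per-clone 'subvolume rm' call is needed from the client."""

    def __init__(self, volume_client, interval=60):
        super().__init__(daemon=True)
        self.vc = volume_client          # hypothetical handle into the volumes plugin
        self.interval = interval
        self.stopping = threading.Event()

    def run(self):
        while not self.stopping.wait(self.interval):
            # hypothetical iteration over index entries left in place on cancel
            for entry in self.vc.list_clone_index_entries():
                if entry.state == "canceled":
                    self.vc.purge_clone(entry)   # remove data and drop the index entry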

Also, more time (around 4-5 hrs) is consumed for cancelling the ~4k pending clones; I'm not sure whether we should also look into that. It is tricky, though, as we don't have a way to know, via one API, which of the pending clones need to be cancelled.

Yeah. We can either support cancelling all pending clones (checkpoint the last pending clone and initiate cancel up to that clone, to avoid endlessly scanning if more clones get scheduled) or a single clone (which is what's supported right now).
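
A small sketch of the checkpoint idea; list_pending_clones and cancel_clone are hypothetical helpers:

def cancel_all_pending(volume_client, vol_name):
    # checkpoint: snapshot the pending list once, up front, so the work is
    # bounded even if new clones keep getting scheduled while we cancel
    pending = list(volume_client.list_pending_clones(vol_name))   # hypothetical
    for clone in pending:            # clones queued after this point are untouched
        volume_client.cancel_clone(vol_name, clone)               # hypothetical
    return len(pending)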

Actions #7

Updated by Venky Shankar 6 months ago

Raimund Sacherer wrote:

Ah, I forgot: the most important reason for me to write this script was to give the CU a progress counter so we all actually know where we are and how long it is still going to take. Without this we are blind, and I have had CUs cancel those long-running operations because they seemed not to be moving any further.

A bunch of the async jobs in the volumes plugin are getting hooked up with progress counters that show up nicely in `ceph status`.

Actions #8

Updated by Venky Shankar 6 months ago

  • Related to Feature #61904: pybind/mgr/volumes: add more introspection for clones added
Actions #9

Updated by Venky Shankar 6 months ago

  • Related to Feature #61905: pybind/mgr/volumes: add more introspection for recursive unlink threads added
Actions #10

Updated by Venky Shankar 6 months ago

  • Category set to Performance/Resource Usage
  • Assignee set to Neeraj Pratap Singh
  • Target version set to v19.0.0
  • Backport set to reef,quincy
  • Labels (FS) task(medium) added

The general feedback for this enhancement seems to be positive in terms of performance improvement and resource utilization.

Neeraj, let's start talking to the CSI folks and put forward this proposal to seek an ack on the changes needed from their side.

Actions #11

Updated by Raimund Sacherer 5 months ago

Thanks for looking into this!
