Bug #55217

pybind/mgr/volumes: Clone operation hangs

Added by Kotresh Hiremath Ravishankar 8 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 100%
Source:
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS): task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The clone operation hangs while testing the clone failure PR locally (teuthology via vstart_runner).
The hang is seen while running two successive tests sequentially. The operations performed in the
tests are as follows.

test1
-------------------
1. Create a subvolume
2. Write some I/O to the subvolume
3. Create a snapshot of the subvolume
4. Create a clone of the snapshot
5. Delete the snapshot from the back end (bypassing the subvolume interface) before the clone completes (the snapshot_clone_delay configuration was used to achieve this)
6. Delete the clone with force
7. Delete the subvolume

test2
------------------
1. Create a subvolume
2. Write some I/O to the subvolume
3. Create a snapshot of the subvolume
4. Create a clone of the snapshot  <--------------- THIS OPERATION HANGS -----------------

The 'test1' run succeeds, but while running 'test2' the clone operation hangs.

Analysis and Root Cause:
------------------------
In step 5 of 'test1', the snapshot is deleted from the back end without using the subvolume interface.
Because of this, the clone scheduled in step 4 fails. When the clone fails, we detach the
snapshot (the 'detach_snapshot' routine), which removes the tracker from the clone index located at
'/volumes/_index/clone/'. Since the snapshot has already been removed from the back end,
'detach_snapshot' raises before it can remove the index entry.


    def detach_snapshot(self, snapname, track_id):
        if not snapname.encode('utf-8') in self.list_snapshots():
            raise VolumeException(-errno.ENOENT, "snapshot '{0}' does not exist".format(snapname))  # <---- FAILS HERE -----
        try:
            with open_clone_index(self.fs, self.vol_spec) as index:
                index.untrack(track_id)
                self._remove_snap_clone(track_id)
        except (IndexException, MetadataMgrException) as e:
            log.warning("error delinking snapshot from clone: {0}".format(e))
            raise VolumeException(-errno.EINVAL, "error delinking snapshot from clone")
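The ordering problem above can be reduced to a toy model (class and variable names here are made up for illustration; this is not the mgr/volumes code). Because the snapshot existence check runs before `index.untrack()`, a snapshot deleted behind the manager's back leaves the clone index entry in place forever:

```python
import errno

class VolumeException(Exception):
    def __init__(self, err, msg):
        super().__init__(err, msg)

class ToyCloneIndex:
    """Stands in for the on-disk index at /volumes/_index/clone/."""
    def __init__(self):
        self.entries = {"track-1": "/volumes/_nogroup/clone0"}

    def untrack(self, track_id):
        self.entries.pop(track_id, None)

def detach_snapshot(snapshots, index, snapname, track_id):
    # The existence check runs first, so when the snapshot was removed
    # behind the manager's back it raises ENOENT ...
    if snapname not in snapshots:
        raise VolumeException(-errno.ENOENT,
                              "snapshot '{0}' does not exist".format(snapname))
    # ... and the index entry is never untracked.
    index.untrack(track_id)

index = ToyCloneIndex()
try:
    detach_snapshot(set(), index, "snap0", "track-1")  # snapshot already gone
except VolumeException:
    pass

print(index.entries)  # the stale entry survives and keeps getting picked up
```

The stale entry is what the cloner thread keeps re-fetching, which sets up the loop described next.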

So the cloner thread (from async_job.py) goes into an infinite loop trying to process this index entry. Each iteration takes 'self.async_job.lock' and, while holding it:
1. opens the volume and reads the index to get the next job
2. registers that job
It then executes the job outside the lock.

See the code below.

   def run(self):
        retries = 0
        thread_id = threading.currentThread()
        assert isinstance(thread_id, JobThread)
        thread_name = thread_id.getName()
        log.debug("thread [{0}] starting".format(thread_name))

        while retries < JobThread.MAX_RETRIES_ON_EXCEPTION:
            vol_job = None
            try:
                # fetch next job to execute
                with self.async_job.lock:
                    while True:
                        if self.should_reconfigure_num_threads():
                            log.info("thread [{0}] terminating due to reconfigure".format(thread_name))
                            self.async_job.threads.remove(self)
                            return
                        vol_job = self.async_job.get_job()
                        if vol_job:
                            break
                        self.async_job.cv.wait()
                    self.async_job.register_async_job(vol_job[0], vol_job[1], thread_id)

                # execute the job (outside lock)
                self.async_job.execute_job(vol_job[0], vol_job[1], should_cancel=lambda: thread_id.should_cancel())
...

The new clone operation takes the same lock to queue the job:

1. '_prepare_clone_subvolume' calls 'self.cloner.queue_job' (async_job.py), which
takes the lock, queues the job, and notifies the cloner threads

    def queue_job(self, volname):
        """
        queue a volume for asynchronous job execution.
        """
        log.info("queuing job for volume '{0}'".format(volname))
        with lock_timeout_log(self.lock):   # <--------------------------- SAME LOCK -------------
            if volname not in self.q:
                self.q.append(volname)
                self.jobs[volname] = []
            self.cv.notifyAll()
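For context, 'lock_timeout_log' is the mgr_util helper that produces the "locking ... with 5 timeout" lines visible in the logs below. A rough approximation of the idea (the real implementation differs in details) is a context manager that retries the lock acquisition with a timeout and logs each failed attempt, so a stuck lock holder becomes visible instead of blocking silently:

```python
import threading
from contextlib import contextmanager

# Rough sketch of a lock_timeout_log-style helper (approximation, not the
# actual mgr_util code): retry acquisition with a timeout, logging each
# failed attempt, then release the lock when the block exits.
@contextmanager
def lock_timeout_log(lock, timeout=5):
    while not lock.acquire(timeout=timeout):
        print("still waiting for {0} after {1} seconds".format(lock, timeout))
    try:
        yield
    finally:
        lock.release()

lk = threading.Lock()
q = []
with lock_timeout_log(lk, timeout=1):   # uncontended: acquires immediately
    q.append("cephfs")
print(q)
```

Note that this only logs the wait; it never gives up, so when the lock holder is hung forever the caller still hangs, which is exactly what happens to queue_job here.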

DEADLOCK EXPLAINED
-----------------------

The cloner thread from 'test1' is stuck in an infinite loop of 'lock - get job - register job - unlock - schedule clone - fail'.
When 'test1' completes, the 'fs' is destroyed along with its pools. The cloner thread, while fetching the next job within the lock,
hangs while reading the index in the 'get_oldest_clone_entry' routine, since the 'fs' is no longer present:

    def get_oldest_clone_entry(self, exclude=[]):
        min_ctime_entry = None
        exclude_tracking_ids = [v[0] for v in exclude]
        log.debug("excluded tracking ids: {0}".format(exclude_tracking_ids))  # <---- HANGS SOMEWHERE AFTER THIS ----------
        for entry in list_one_entry_at_a_time(self.fs, self.path):
            dname = entry.d_name
            dpath = os.path.join(self.path, dname)
            st = self.fs.lstat(dpath)
            if dname not in exclude_tracking_ids and stat.S_ISLNK(st.st_mode):
                if min_ctime_entry is None or st.st_ctime < min_ctime_entry[1].st_ctime:
                    min_ctime_entry = (dname, st)
        if min_ctime_entry:
            try:
                linklen = min_ctime_entry[1].st_size
                sink_path = self.fs.readlink(os.path.join(self.path, min_ctime_entry[0]), CloneIndex.PATH_MAX)
                return (min_ctime_entry[0], sink_path[:linklen])
            except cephfs.Error as e:
                raise IndexException(-e.args[0], e.args[1])
        return None

Now the new clone operation from 'test2' waits for the same lock forever, and the operation hangs.
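The whole deadlock can be reproduced as a minimal model (all names here are invented for illustration; this is not the mgr/volumes code). The cloner thread takes the shared lock and then blocks forever "reading the clone index", simulated with an event that is never set, standing in for the cephfs read that hangs once the fs is destroyed:

```python
import threading
import time

lock = threading.Lock()                  # stands in for self.async_job.lock
fs_read_returns = threading.Event()      # never set: the fs is gone

def cloner_thread():
    with lock:                           # the same lock queue_job() needs
        fs_read_returns.wait()           # hangs inside get_oldest_clone_entry()

t = threading.Thread(target=cloner_thread, daemon=True)
t.start()
time.sleep(0.2)                          # let the cloner grab the lock first

# test2's clone calls queue_job(), which can never take the lock
acquired = lock.acquire(timeout=1)
print("queue_job acquired lock:", acquired)  # False -> the new clone hangs
```

In the real code there is no timeout on the job side either, so the clone command from 'test2' blocks indefinitely.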

logs.tar.gz - mgr_logs_and_test_volumes_test_ran (163 KB) Kotresh Hiremath Ravishankar, 04/07/2022 10:26 AM


Related issues

Copied to CephFS - Backport #55352: pacific: pybind/mgr/volumes: Clone operation hangs Resolved
Copied to CephFS - Backport #55353: quincy: pybind/mgr/volumes: Clone operation hangs Resolved

History

#1 Updated by Kotresh Hiremath Ravishankar 8 months ago

  • Component(FS) mgr/volumes added
  • Labels (FS) task(medium) added

#2 Updated by Kotresh Hiremath Ravishankar 8 months ago

The tests were run and the mgr logs are attached. The code was instrumented a bit to identify the callers of the locks.

The mgr.x.log contains the logs from after 'test_clone_failure_status_failed' was run. You can verify that the
thread lock is taken and not released after the test completes. It doesn't contain logs of the hang itself, as the
second test was not run in this instance.

----------
2022-04-07T12:29:17.014+0530 7f018b12e700 0 [volumes DEBUG mgr_util] locking <unlocked _thread.lock object at 0x7f01931b04e0> with 5 timeout caller: enter <-- run <-- _bootstrap_inner
2022-04-07T12:29:17.014+0530 7f018b12e700 0 [volumes DEBUG mgr_util] locked <locked _thread.lock object at 0x7f01931b04e0> caller: enter <-- run <-- _bootstrap_inner
2022-04-07T12:29:17.014+0530 7f018b12e700 0 [volumes DEBUG volumes.fs.async_job] processing 1 volume entries
2022-04-07T12:29:17.014+0530 7f018b12e700 0 [volumes DEBUG volumes.fs.async_cloner] fetching clone entry for volume 'cephfs'
2022-04-07T12:29:17.014+0530 7f018b12e700 0 [volumes DEBUG mgr_util] [get] volume: (cephfs) connection: ([<mgr_util.CephfsConnectionPool.Connection object at 0x7f018f953f40>, <mgr_util.CephfsConnectionPool.Connection object at 0x7f018f953fd0>])
-------------------

This is the last instance of the lock being acquired successfully; after this point it is deadlocked.

#3 Updated by Kotresh Hiremath Ravishankar 8 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 45816

#4 Updated by Venky Shankar 8 months ago

  • Category set to Correctness/Safety
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v18.0.0
  • Backport set to quincy, pacific

#5 Updated by Kotresh Hiremath Ravishankar 7 months ago

  • Status changed from Fix Under Review to Pending Backport

#6 Updated by Backport Bot 7 months ago

  • Copied to Backport #55352: pacific: pybind/mgr/volumes: Clone operation hangs added

#7 Updated by Backport Bot 7 months ago

  • Copied to Backport #55353: quincy: pybind/mgr/volumes: Clone operation hangs added

#8 Updated by Backport Bot 4 months ago

  • Tags set to backport_processed

#9 Updated by Konstantin Shalygin about 2 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (backport_processed)
