Bug #62994
mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread
Description
Ran the integration test from https://github.com/ceph/ceph/pull/53535, which repeatedly blocklists the rbd_support module's RADOS client approximately every 10 seconds after the module recovers from the previous blocklisting, at http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/ . Observed 2 job failures,
- http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401648/
- http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401660/
where the rbd_support module didn't recover from blocklisting due to the following issue: the module's MirrorSnapshotScheduleHandler got stuck in its shutdown() method waiting for its run thread to terminate.
Excerpt from the mgr log at /a/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401648/remote/smithi099/log/ceph-mgr.x.log.gz in teuthology.
2023-09-23T07:18:32.518+0000 7fe0ea9fe640 0 [rbd_support ERROR root] TrashPurgeScheduleHandler: client blocklisted
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 46, in run
    refresh_delay = self.refresh_pools()
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 95, in refresh_pools
    self.load_schedules()
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 85, in load_schedules
    self.schedules.load()
  File "/usr/share/ceph/mgr/rbd_support/schedule.py", line 419, in load
    self.load_from_pool(ioctx, namespace_validator,
  File "/usr/share/ceph/mgr/rbd_support/schedule.py", line 442, in load_from_pool
    ioctx.operate_read_op(read_op, self.handler.SCHEDULE_OID)
  File "rados.pyx", line 3723, in rados.Ioctx.operate_read_op
rados.ConnectionShutdown: [errno 108] RADOS connection was shutdown (Failed to operate read op for oid rbd_trash_purge_schedule)
2023-09-23T07:18:32.518+0000 7fe0efa08640 0 [rbd_support INFO root] recovering from blocklisting
2023-09-23T07:18:32.518+0000 7fe0efa08640 0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: shutting down
2023-09-23T07:18:32.522+0000 7fe0efa08640 0 [rbd_support DEBUG root] MirrorSnapshotScheduleHandler: joining thread
After this, there are no further log messages from MirrorSnapshotScheduleHandler or TrashPurgeScheduleHandler; only ticks from PerfHandler and TaskHandler.
Updated by Ramana Raja 7 months ago
- Related to Bug #56724: [rbd_support] recover from RADOS instance blocklisting added
Updated by Ramana Raja 7 months ago
- Related to Bug #62891: [test][rbd] test recovery of rbd_support module from repeated blocklisting of its client added
Updated by Ramana Raja 6 months ago
Added debug log messages to uncover the hang in MirrorSnapshotScheduleHandler's run thread, https://github.com/ceph/ceph-ci/commits/ajarr-wip-extra-debug-62994-oct-19
Observed 14 job failures out of 20 jobs (http://pulpito.front.sepia.ceph.com/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/)
when running https://github.com/ceph/ceph/pull/53535 to test recovery of the rbd_support module on repeated client blocklisting.
- It appears that MirrorSnapshotScheduleHandler's run thread fails to acquire CreateSnapshotRequests's lock and hangs in CreateSnapshotRequests.get_ioctx().
Following is an excerpt from the ceph-mgr log at /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433375/remote/smithi119/log/ceph-mgr.x.log.gz
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10e2ca42ae37
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd::AsioEngine: 0x5625e0f1ea90 ~AsioEngine:
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd::asio::ContextWQ: 0x5625df911360 ~ContextWQ:
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd::asio::ContextWQ: 0x5625df911360 drain:
2023-10-21T04:23:00.108+0000 7ffac51c9640 20 mgr get_config key: mgr/rbd_support/x/max_concurrent_snap_create
2023-10-21T04:23:00.108+0000 7ffac51c9640 20 mgr get_config key: mgr/rbd_support/max_concurrent_snap_create
2023-10-21T04:23:00.108+0000 7ffac51c9640 10 mgr get_typed_config [x/]max_concurrent_snap_create not found
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10e2ca42ae37 acquired lock
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10e2ca42ae37 released lock
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.open_image: 2//10e2ca42ae37
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] entering CreateSnapshotRequests.get_ioctx: 2//10e2ca42ae37
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd: C_OpenComplete::finish: r=-108
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd: C_AioCompletion::finish: r=-108
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support DEBUG root] CreateSnapshotRequests.handle_open_image 2//10d9603318c4: r=-108
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support ERROR root] error when opening 2//10d9603318c4: -108
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support DEBUG root] CreateSnapshotRequests.finish: 2//10d9603318c4
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support DEBUG root] entering CreateSnapshotRequests.put_ioctx: 2//10d9603318c4
MirrorSnapshotScheduleHandler's run thread, 7ffac51c9640, enters get_ioctx() but never proceeds to acquire the lock; no more log messages are observed from 7ffac51c9640. At about the same time, the handle_open_image() callback thread, 7ffab21a3640, logs the error and also fails to acquire the lock in put_ioctx().
Source: /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433375/remote/smithi119/log/ceph-mgr.x.log.gz
http://pulpito.front.sepia.ceph.com/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433375/
- Sometimes, the MirrorSnapshotScheduleHandler's run thread doesn't proceed after trying to open an ioctx in CreateSnapshotRequests.get_ioctx(). Following is an excerpt from the ceph-mgr log at /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433370/remote/smithi146/log/ceph-mgr.x.log.gz
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10979b31b5a7 acquired lock
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10979b31b5a7 released lock
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.open_image: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] entering CreateSnapshotRequests.get_ioctx: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.get_ioctx acquired lock: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.get_ioctx opening ioctx: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: create_ioctx calling pool_get_name()
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: pool_get_name calling wait_for_osdmap()
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: wait_for_osdmap entering
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: wait_for_osdmap exiting without needing map
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: pool_get_name calling objecter->with_osdmap()
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: pool_get_name exiting with r=0
MirrorSnapshotScheduleHandler's run thread, 7f3f0812b640, calls rados.open_ioctx2() and doesn't proceed further in CreateSnapshotRequests.get_ioctx(); no more log messages are observed from 7f3f0812b640. At first glance, this seemed to point to an issue in the librados/objecter code.
Source: /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433370/remote/smithi146/log/ceph-mgr.x.log.gz
http://pulpito.front.sepia.ceph.com/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433370/
Relevant code snippets from pybind/mgr/rbd_support/mirror_snapshot_schedule.py,
class CreateSnapshotRequests:
    ...
    def open_image(self, image_spec: ImageSpec) -> None:
        pool_id, namespace, image_id = image_spec

        self.log.debug("CreateSnapshotRequests.open_image: {}/{}/{}".format(
            pool_id, namespace, image_id))

        try:
            ioctx = self.get_ioctx(image_spec)

            def cb(comp: rados.Completion, image: rbd.Image) -> None:
                self.handle_open_image(image_spec, comp, image)

            self.log.debug("calling aio_open_image CreateSnapshotRequests.open_image: {}/{}/{}".format(
                pool_id, namespace, image_id))
            rbd.RBD().aio_open_image(cb, ioctx, image_id=image_id)
            self.log.debug("exiting CreateSnapshotRequests.open_image: {}/{}/{}".format(
                pool_id, namespace, image_id))
        except Exception as e:
            self.log.error(
                "exception when opening {}/{}/{}: {}".format(
                    pool_id, namespace, image_id, e))
            self.finish(image_spec)

    ...

    def get_ioctx(self, image_spec: ImageSpec) -> rados.Ioctx:
        pool_id, namespace, image_id = image_spec
        nspec = (pool_id, namespace)

        self.log.debug("entering CreateSnapshotRequests.get_ioctx: {}/{}/{}".format(
            pool_id, namespace, image_id))
        with self.lock:
            self.log.debug("CreateSnapshotRequests.get_ioctx acquired lock: {}/{}/{}".format(
                pool_id, namespace, image_id))
            ioctx, images = self.ioctxs.get(nspec, (None, None))
            if not ioctx:
                self.log.debug("CreateSnapshotRequests.get_ioctx opening ioctx: {}/{}/{}".format(
                    pool_id, namespace, image_id))
                ioctx = self.rados.open_ioctx2(int(pool_id))
                self.log.debug("CreateSnapshotRequests.get_ioctx setting namespace: {}/{}/{}".format(
                    pool_id, namespace, image_id))
                ioctx.set_namespace(namespace)
                images = set()
                self.ioctxs[nspec] = (ioctx, images)
            assert images is not None
            images.add(image_spec)
            self.log.debug("exiting CreateSnapshotRequests.get_ioctx: {}/{}/{} and released lock".format(
                pool_id, namespace, image_id))

        return ioctx
Updated by Ilya Dryomov 6 months ago
- Status changed from New to In Progress
- Backport set to pacific,quincy,reef
Updated by Ramana Raja 6 months ago
Examined the MirrorSnapshotScheduleHandler's run thread using gdb with Python extensions (installed via `sudo dnf debuginfo-install python3`) in a vstart environment:
$ gdb attach <ceph-mgr-pid>
...
(gdb) thread apply all py-bt
# identified thread 91 as the MirrorSnapshotScheduleHandler's run thread from the python backtraces
(gdb) thread 91
[Switching to thread 91 (Thread 0x7f50b8100640 (LWP 166305))]
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5619cd688380) at futex-internal.c:57
57        return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) py-list
  47        def __del__(self) -> None:
  48            self.log.debug("{} CreateSnapshotRequests.wait_for_pending: __del__".format(repr(self)))
  49            self.wait_for_pending()
  50
  51        def wait_for_pending(self) -> None:
 >52            with self.lock:
  53                self.log.debug("CreateSnapshotRequests.wait_for_pending: acquired lock")
  54                while self.pending:
  55                    self.log.debug(
  56                        "CreateSnapshotRequests.wait_for_pending: "
  57                        "{} images".format(len(self.pending)))
(gdb) py-bt
Traceback (most recent call first):
  <built-in method __enter__ of _thread.lock object at remote 0x7f50d21c0d00>
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 52, in wait_for_pending
    with self.lock:
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 49, in __del__
    self.wait_for_pending()
  Garbage-collecting
  File "/usr/lib64/python3.10/logging/__init__.py", line 819, in filter
    for f in self.filters:
  File "/usr/lib64/python3.10/logging/__init__.py", line 964, in handle
    rv = self.filter(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1465, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 328, in get_ioctx
    self.log.debug("{} CreateSnapshotRequests.get_ioctx acquired lock: {}/{}/{}".format(
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 98, in open_image
    ioctx = self.get_ioctx(image_spec)
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 87, in add
    self.open_image(image_spec)
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 407, in run
    self.create_snapshot_requests.add(pool_id, namespace, image_id)
  File "/usr/lib64/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
Observed a recursive deadlock in the MirrorSnapshotScheduleHandler's run thread. Ilya identified the root cause. The thread acquired CreateSnapshotRequests's lock, a class variable shared across all CreateSnapshotRequests instances, in CreateSnapshotRequests.get_ioctx(). While still holding the lock, the thread triggered garbage collection of another CreateSnapshotRequests instance, invoking that instance's __del__(), which in turn called wait_for_pending(). There, the thread predictably blocked trying to acquire the same class-level lock it already held from the earlier get_ioctx() call, since the lock is not reentrant.
2023-10-26T17:54:00.002-0400 7f50b8100640 0 [rbd_support DEBUG root] CreateSnapshotRequests.open_image: 3//15032b4e30aa
...
2023-10-26T17:54:00.002-0400 7f50b8100640 0 [rbd_support DEBUG root] <rbd_support.mirror_snapshot_schedule.CreateSnapshotRequests object at 0x7f50b89f0d30> entering CreateSnapshotRequests.get_ioctx: 3//15032b4e30aa
...
2023-10-26T17:54:00.003-0400 7f50b8100640 0 [rbd_support DEBUG root] <rbd_support.mirror_snapshot_schedule.CreateSnapshotRequests object at 0x7f50b89f3370> CreateSnapshotRequests.wait_for_pending: __del__
...
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support DEBUG root] CreateSnapshotRequests.handle_open_image 3//14fdd5b36d44: r=-108
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support ERROR root] error when opening 3//14fdd5b36d44: -108
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support DEBUG root] CreateSnapshotRequests.finish: 3//14fdd5b36d44
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support DEBUG root] <rbd_support.mirror_snapshot_schedule.CreateSnapshotRequests object at 0x7f50b89f0d30> entering CreateSnapshotRequests.put_ioctx: 3//14fdd5b36d44
The above ceph-mgr log excerpt shows that in the MirrorSnapshotScheduleHandler's run thread, the get_ioctx() call and the subsequent __del__() belong to different CreateSnapshotRequests instances (note the different object addresses). Please note that extra logging was added in mirror_snapshot_schedule.py to generate these logs.
diff --git a/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py b/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py
index 13242388282..f7567a03be3 100644
--- a/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py
+++ b/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py
@@ -45,6 +45,7 @@ class CreateSnapshotRequests:
         self.ioctxs: Dict[Tuple[str, str], Tuple[rados.Ioctx, Set[ImageSpec]]] = {}
 
     def __del__(self) -> None:
+        self.log.debug("{} CreateSnapshotRequests.wait_for_pending: __del__".format(repr(self)))
         self.wait_for_pending()
 
     def wait_for_pending(self) -> None:
@@ -320,44 +321,44 @@ class CreateSnapshotRequests:
     def get_ioctx(self, image_spec: ImageSpec) -> rados.Ioctx:
         pool_id, namespace, image_id = image_spec
         nspec = (pool_id, namespace)
-        self.log.debug("entering CreateSnapshotRequests.get_ioctx: {}/{}/{}".format(
-            pool_id, namespace, image_id))
+        self.log.debug("{} entering CreateSnapshotRequests.get_ioctx: {}/{}/{}".format(
+            repr(self), pool_id, namespace, image_id))
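The failure mode can be reproduced outside ceph-mgr in a few lines of Python. The sketch below uses a hypothetical Handler class standing in for CreateSnapshotRequests (the names are illustrative, not the module's code). A class-level threading.Lock is not reentrant, so once the run thread holds it, any re-acquisition on the same thread, such as a sibling instance's __del__ firing during garbage collection, blocks forever; a timed acquire stands in for the unbounded hang.

```python
import threading

class Handler:
    # Class-level lock shared by ALL Handler instances, mirroring the
    # pre-fix CreateSnapshotRequests design. Handler is a hypothetical
    # stand-in for illustration only.
    lock = threading.Lock()

    def wait_for_pending(self):
        # What __del__ ran during garbage collection: take the shared
        # lock. threading.Lock is not reentrant, so if the current
        # thread already holds it, this blocks indefinitely.
        with Handler.lock:
            pass

h = Handler()
with Handler.lock:  # the run thread holds the lock inside get_ioctx()
    # Simulate a sibling instance's __del__ -> wait_for_pending() firing
    # on the same thread; a timed acquire stands in for the real hang.
    reacquired = Handler.lock.acquire(timeout=0.1)

print(reacquired)  # False: the same thread cannot take the lock again
```

Using threading.RLock instead would mask this particular re-acquisition, but the actual fixes below remove the cross-instance sharing or the finalizer itself, which addresses the root cause rather than the symptom.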
Updated by Ramana Raja 6 months ago
Converted CreateSnapshotRequests's lock from a class variable to an instance variable, and tested the change through teuthology. All 20 jobs of the rbd_support module recovery-on-blocklisting integration test passed.
https://github.com/ceph/ceph-ci/commits/wip-ajarr-fix-62994-oct-26-2
https://pulpito.ceph.com/rraja-2023-10-27_00:00:53-rbd:cli-wip-ajarr-fix-62994-oct-26-2-distro-default-smithi/
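The effect of this fix can be sketched as follows (an illustrative toy, not the code from PR 54251): with the lock created per instance in __init__, garbage-collecting one instance never contends on a lock held through a different instance.

```python
import threading

class Requests:
    # Sketch of the fix: the lock is an instance attribute created in
    # __init__, not a class attribute shared by every instance.
    # (Requests is a hypothetical stand-in for CreateSnapshotRequests.)
    def __init__(self):
        self.lock = threading.Lock()

old, new = Requests(), Requests()
with old.lock:
    # Collecting `old` while a thread works under `new`'s lock (or vice
    # versa) can no longer deadlock: the two locks are independent.
    got = new.lock.acquire(timeout=0.1)
    if got:
        new.lock.release()

print(got)  # True: no contention between instances
```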
Also tried another approach that gets rid of CreateSnapshotRequests's __del__() method, and tested the change. All 20 jobs of the rbd_support module recovery-on-blocklisting integration test passed again.
https://github.com/ceph/ceph-ci/commits/wip-ajarr-fix-62994-oct-25
https://pulpito.ceph.com/rraja-2023-10-26_03:38:28-rbd:cli-wip-ajarr-fix-62994-oct-25-distro-default-smithi/
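The shape of this alternative can be sketched as follows (again an illustrative toy, not the actual branch): dropping __del__ means cleanup runs only through an explicit call at a known point in the shutdown path, never from the garbage collector while the same thread happens to hold the lock deep inside unrelated code.

```python
import threading

class Requests:
    # Sketch of the alternative fix: no __del__ finalizer at all.
    # (Requests is a hypothetical stand-in for CreateSnapshotRequests.)
    def __init__(self):
        self.lock = threading.Lock()
        self.pending = set()

    def shutdown(self):
        # Deterministic replacement for the __del__ -> wait_for_pending()
        # chain: invoked explicitly by the handler's shutdown path, at a
        # point where this thread is known not to hold the lock already.
        with self.lock:
            return not self.pending

r = Requests()
clean = r.shutdown()  # explicit, predictable point in shutdown
print(clean)  # True: nothing pending, lock uncontended
```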
The two types of hangs noted in https://tracker.ceph.com/issues/62994#note-4 were not observed with either of the above fixes. Also, both hangs are explained by the recursive deadlock occurring in MirrorSnapshotScheduleHandler's run thread. See https://tracker.ceph.com/issues/62994#note-6 .
Updated by Ramana Raja 6 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 54251
Updated by Ilya Dryomov 6 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 6 months ago
- Copied to Backport #63382: pacific: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread added
Updated by Backport Bot 6 months ago
- Copied to Backport #63383: quincy: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread added
Updated by Backport Bot 6 months ago
- Copied to Backport #63384: reef: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread added
Updated by Backport Bot 2 months ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".