Bug #62994
mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread
Description
Ran the integration test from https://github.com/ceph/ceph/pull/53535, which repeatedly blocklists the rbd_support module's RADOS client approximately every 10 seconds after the module recovers from the previous blocklisting, at http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/ . Observed 2 job failures,
- http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401648/
- http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401660/
where the rbd_support module didn't recover from blocklisting due to the following issue: the module's MirrorSnapshotScheduleHandler got stuck in its shutdown() method waiting for its run thread to terminate.
Excerpt from the mgr log at /a/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401648/remote/smithi099/log/ceph-mgr.x.log.gz in teuthology.
2023-09-23T07:18:32.518+0000 7fe0ea9fe640 0 [rbd_support ERROR root] TrashPurgeScheduleHandler: client blocklisted
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 46, in run
    refresh_delay = self.refresh_pools()
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 95, in refresh_pools
    self.load_schedules()
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 85, in load_schedules
    self.schedules.load()
  File "/usr/share/ceph/mgr/rbd_support/schedule.py", line 419, in load
    self.load_from_pool(ioctx, namespace_validator,
  File "/usr/share/ceph/mgr/rbd_support/schedule.py", line 442, in load_from_pool
    ioctx.operate_read_op(read_op, self.handler.SCHEDULE_OID)
  File "rados.pyx", line 3723, in rados.Ioctx.operate_read_op
rados.ConnectionShutdown: [errno 108] RADOS connection was shutdown (Failed to operate read op for oid rbd_trash_purge_schedule)
2023-09-23T07:18:32.518+0000 7fe0efa08640 0 [rbd_support INFO root] recovering from blocklisting
2023-09-23T07:18:32.518+0000 7fe0efa08640 0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: shutting down
2023-09-23T07:18:32.522+0000 7fe0efa08640 0 [rbd_support DEBUG root] MirrorSnapshotScheduleHandler: joining thread
After this, there are no further log messages from MirrorSnapshotScheduleHandler or TrashPurgeScheduleHandler; only ticks from PerfHandler and TaskHandler.
Updated by Ramana Raja 7 months ago
- Related to Bug #56724: [rbd_support] recover from RADOS instance blocklisting added
Updated by Ramana Raja 7 months ago
- Related to Bug #62891: [test][rbd] test recovery of rbd_support module from repeated blocklisting of its client added
Updated by Ramana Raja 6 months ago
Added debug log messages to uncover the hang in MirrorSnapshotScheduleHandler's run thread, https://github.com/ceph/ceph-ci/commits/ajarr-wip-extra-debug-62994-oct-19
Observed 14 job failures out of 20 jobs (http://pulpito.front.sepia.ceph.com/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/)
when running https://github.com/ceph/ceph/pull/53535 to test recovery of the rbd_support module on repeated client blocklisting.
- It appears that MirrorSnapshotScheduleHandler's run thread fails to acquire CreateSnapshotRequests's lock and hangs in CreateSnapshotRequests.get_ioctx().
Following is an excerpt from the ceph-mgr log at /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433375/remote/smithi119/log/ceph-mgr.x.log.gz
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10e2ca42ae37
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd::AsioEngine: 0x5625e0f1ea90 ~AsioEngine:
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd::asio::ContextWQ: 0x5625df911360 ~ContextWQ:
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd::asio::ContextWQ: 0x5625df911360 drain:
2023-10-21T04:23:00.108+0000 7ffac51c9640 20 mgr get_config key: mgr/rbd_support/x/max_concurrent_snap_create
2023-10-21T04:23:00.108+0000 7ffac51c9640 20 mgr get_config key: mgr/rbd_support/max_concurrent_snap_create
2023-10-21T04:23:00.108+0000 7ffac51c9640 10 mgr get_typed_config [x/]max_concurrent_snap_create not found
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10e2ca42ae37 acquired lock
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10e2ca42ae37 released lock
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] CreateSnapshotRequests.open_image: 2//10e2ca42ae37
2023-10-21T04:23:00.108+0000 7ffac51c9640 0 [rbd_support DEBUG root] entering CreateSnapshotRequests.get_ioctx: 2//10e2ca42ae37
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd: C_OpenComplete::finish: r=-108
2023-10-21T04:23:00.108+0000 7ffab21a3640 20 librbd: C_AioCompletion::finish: r=-108
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support DEBUG root] CreateSnapshotRequests.handle_open_image 2//10d9603318c4: r=-108
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support ERROR root] error when opening 2//10d9603318c4: -108
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support DEBUG root] CreateSnapshotRequests.finish: 2//10d9603318c4
2023-10-21T04:23:00.108+0000 7ffab21a3640 0 [rbd_support DEBUG root] entering CreateSnapshotRequests.put_ioctx: 2//10d9603318c4
MirrorSnapshotScheduleHandler's run thread, 7ffac51c9640, enters get_ioctx() but never proceeds to acquire the lock; no more log messages are observed from 7ffac51c9640. At about the same time, the handle_open_image() callback thread, 7ffab21a3640, logs the error and also fails to acquire the lock in put_ioctx().
Source: /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433375/remote/smithi119/log/ceph-mgr.x.log.gz
http://pulpito.front.sepia.ceph.com/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433375/
- Sometimes, the MirrorSnapshotScheduleHandler's run thread doesn't proceed after trying to open an ioctx in CreateSnapshotRequests.get_ioctx(). Following is an excerpt from the ceph-mgr log at /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433370/remote/smithi146/log/ceph-mgr.x.log.gz
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10979b31b5a7 acquired lock
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.add: 2//10979b31b5a7 released lock
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.open_image: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] entering CreateSnapshotRequests.get_ioctx: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.get_ioctx acquired lock: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 0 [rbd_support DEBUG root] CreateSnapshotRequests.get_ioctx opening ioctx: 2//10979b31b5a7
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: create_ioctx calling pool_get_name()
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: pool_get_name calling wait_for_osdmap()
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: wait_for_osdmap entering
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: wait_for_osdmap exiting without needing map
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: pool_get_name calling objecter->with_osdmap()
2023-10-21T04:14:00.018+0000 7f3f0812b640 10 librados: pool_get_name exiting with r=0
MirrorSnapshotScheduleHandler's run thread, 7f3f0812b640, calls rados.open_ioctx2() and doesn't proceed further in CreateSnapshotRequests.get_ioctx(); no more log messages are observed from 7f3f0812b640. At first glance, this seemed to point to an issue in the librados/objecter code.
Source: /a/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433370/remote/smithi146/log/ceph-mgr.x.log.gz
http://pulpito.front.sepia.ceph.com/rraja-2023-10-20_15:15:00-rbd:cli-ajarr-wip-extra-debug-62994-oct-19-distro-default-smithi/7433370/
Relevant code snippets from pybind/mgr/rbd_support/mirror_snapshot_schedule.py,
class CreateSnapshotRequests:
    ...
    def open_image(self, image_spec: ImageSpec) -> None:
        pool_id, namespace, image_id = image_spec

        self.log.debug("CreateSnapshotRequests.open_image: {}/{}/{}".format(
            pool_id, namespace, image_id))

        try:
            ioctx = self.get_ioctx(image_spec)

            def cb(comp: rados.Completion, image: rbd.Image) -> None:
                self.handle_open_image(image_spec, comp, image)

            self.log.debug("calling aio_open_image CreateSnapshotRequests.open_image: {}/{}/{}".format(
                pool_id, namespace, image_id))
            rbd.RBD().aio_open_image(cb, ioctx, image_id=image_id)
            self.log.debug("exiting CreateSnapshotRequests.open_image: {}/{}/{}".format(
                pool_id, namespace, image_id))
        except Exception as e:
            self.log.error(
                "exception when opening {}/{}/{}: {}".format(
                    pool_id, namespace, image_id, e))
            self.finish(image_spec)

    ...

    def get_ioctx(self, image_spec: ImageSpec) -> rados.Ioctx:
        pool_id, namespace, image_id = image_spec
        nspec = (pool_id, namespace)

        self.log.debug("entering CreateSnapshotRequests.get_ioctx: {}/{}/{}".format(
            pool_id, namespace, image_id))
        with self.lock:
            self.log.debug("CreateSnapshotRequests.get_ioctx acquired lock: {}/{}/{}".format(
                pool_id, namespace, image_id))
            ioctx, images = self.ioctxs.get(nspec, (None, None))
            if not ioctx:
                self.log.debug("CreateSnapshotRequests.get_ioctx opening ioctx: {}/{}/{}".format(
                    pool_id, namespace, image_id))
                ioctx = self.rados.open_ioctx2(int(pool_id))
                self.log.debug("CreateSnapshotRequests.get_ioctx setting namespace: {}/{}/{}".format(
                    pool_id, namespace, image_id))
                ioctx.set_namespace(namespace)
                images = set()
                self.ioctxs[nspec] = (ioctx, images)
            assert images is not None
            images.add(image_spec)
            self.log.debug("exiting CreateSnapshotRequests.get_ioctx: {}/{}/{} and released lock".format(
                pool_id, namespace, image_id))

        return ioctx
Updated by Ilya Dryomov 6 months ago
- Status changed from New to In Progress
- Backport set to pacific,quincy,reef
Updated by Ramana Raja 6 months ago
Examined the MirrorSnapshotScheduleHandler's run thread using gdb with Python extensions (installed via `sudo dnf debuginfo-install python3`) in a vstart environment:
$ gdb attach <ceph-mgr-pid>
...
(gdb) thread apply all py-bt
# identified thread 91 as the MirrorSnapshotScheduleHandler's run thread from the python backtraces
(gdb) thread 91
[Switching to thread 91 (Thread 0x7f50b8100640 (LWP 166305))]
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5619cd688380) at futex-internal.c:57
57        return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
(gdb) py-list
  47        def __del__(self) -> None:
  48            self.log.debug("{} CreateSnapshotRequests.wait_for_pending: __del__".format(repr(self)))
  49            self.wait_for_pending()
  50
  51        def wait_for_pending(self) -> None:
 >52            with self.lock:
  53                self.log.debug("CreateSnapshotRequests.wait_for_pending: acquired lock")
  54                while self.pending:
  55                    self.log.debug(
  56                        "CreateSnapshotRequests.wait_for_pending: "
  57                        "{} images".format(len(self.pending)))
(gdb) py-bt
Traceback (most recent call first):
  <built-in method __enter__ of _thread.lock object at remote 0x7f50d21c0d00>
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 52, in wait_for_pending
    with self.lock:
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 49, in __del__
    self.wait_for_pending()
  Garbage-collecting
  File "/usr/lib64/python3.10/logging/__init__.py", line 819, in filter
    for f in self.filters:
  File "/usr/lib64/python3.10/logging/__init__.py", line 964, in handle
    rv = self.filter(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib64/python3.10/logging/__init__.py", line 1465, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 328, in get_ioctx
    self.log.debug("{} CreateSnapshotRequests.get_ioctx acquired lock: {}/{}/{}".format(
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 98, in open_image
    ioctx = self.get_ioctx(image_spec)
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 87, in add
    self.open_image(image_spec)
  File "/home/rraja/git/ceph/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py", line 407, in run
    self.create_snapshot_requests.add(pool_id, namespace, image_id)
  File "/usr/lib64/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
Observed a recursive deadlock in the MirrorSnapshotScheduleHandler's run thread. Ilya identified the root cause. The thread acquired CreateSnapshotRequests's lock, a class variable shared across all CreateSnapshotRequests instances, in CreateSnapshotRequests.get_ioctx(). While still holding the lock, the thread triggered garbage collection of another CreateSnapshotRequests instance, invoking that instance's __del__(), which in turn called wait_for_pending(). There, the thread predictably blocked trying to acquire the same class-level lock it already held from the earlier get_ioctx() call, since the lock is not reentrant.
2023-10-26T17:54:00.002-0400 7f50b8100640 0 [rbd_support DEBUG root] CreateSnapshotRequests.open_image: 3//15032b4e30aa
...
2023-10-26T17:54:00.002-0400 7f50b8100640 0 [rbd_support DEBUG root] <rbd_support.mirror_snapshot_schedule.CreateSnapshotRequests object at 0x7f50b89f0d30> entering CreateSnapshotRequests.get_ioctx: 3//15032b4e30aa
...
2023-10-26T17:54:00.003-0400 7f50b8100640 0 [rbd_support DEBUG root] <rbd_support.mirror_snapshot_schedule.CreateSnapshotRequests object at 0x7f50b89f3370> CreateSnapshotRequests.wait_for_pending: __del__
...
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support DEBUG root] CreateSnapshotRequests.handle_open_image 3//14fdd5b36d44: r=-108
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support ERROR root] error when opening 3//14fdd5b36d44: -108
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support DEBUG root] CreateSnapshotRequests.finish: 3//14fdd5b36d44
2023-10-26T17:54:00.006-0400 7f50a68dd640 0 [rbd_support DEBUG root] <rbd_support.mirror_snapshot_schedule.CreateSnapshotRequests object at 0x7f50b89f0d30> entering CreateSnapshotRequests.put_ioctx: 3//14fdd5b36d44
The above ceph-mgr log excerpt shows that in the MirrorSnapshotScheduleHandler's run thread, the get_ioctx() call and the subsequent __del__() belong to different CreateSnapshotRequests instances (note the different object addresses). Please note that extra logging was added in mirror_snapshot_schedule.py to generate these logs.
diff --git a/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py b/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py
index 13242388282..f7567a03be3 100644
--- a/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py
+++ b/src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py
@@ -45,6 +45,7 @@ class CreateSnapshotRequests:
         self.ioctxs: Dict[Tuple[str, str], Tuple[rados.Ioctx, Set[ImageSpec]]] = {}
 
     def __del__(self) -> None:
+        self.log.debug("{} CreateSnapshotRequests.wait_for_pending: __del__".format(repr(self)))
         self.wait_for_pending()
 
     def wait_for_pending(self) -> None:
@@ -320,44 +321,44 @@ class CreateSnapshotRequests:
     def get_ioctx(self, image_spec: ImageSpec) -> rados.Ioctx:
         pool_id, namespace, image_id = image_spec
         nspec = (pool_id, namespace)
-        self.log.debug("entering CreateSnapshotRequests.get_ioctx: {}/{}/{}".format(
-            pool_id, namespace, image_id))
+        self.log.debug("{} entering CreateSnapshotRequests.get_ioctx: {}/{}/{}".format(
+            repr(self), pool_id, namespace, image_id))
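The failure mode can be reproduced outside ceph-mgr in a few lines of Python. The sketch below uses a hypothetical Handler class standing in for CreateSnapshotRequests (the names are illustrative, not the module's code). A class-level threading.Lock is not reentrant, so once the run thread holds it, any re-acquisition on the same thread, such as a sibling instance's __del__ firing during garbage collection, blocks forever; a timed acquire stands in for the unbounded hang.

```python
import threading

class Handler:
    # Class-level lock shared by ALL Handler instances, mirroring the
    # pre-fix CreateSnapshotRequests design. Handler is a hypothetical
    # stand-in for illustration only.
    lock = threading.Lock()

    def wait_for_pending(self):
        # What __del__ ran during garbage collection: take the shared
        # lock. threading.Lock is not reentrant, so if the current
        # thread already holds it, this blocks indefinitely.
        with Handler.lock:
            pass

h = Handler()
with Handler.lock:  # the run thread holds the lock inside get_ioctx()
    # Simulate a sibling instance's __del__ -> wait_for_pending() firing
    # on the same thread; a timed acquire stands in for the real hang.
    reacquired = Handler.lock.acquire(timeout=0.1)

print(reacquired)  # False: the same thread cannot take the lock again
```

Using threading.RLock instead would mask this particular re-acquisition, but the actual fixes below remove the cross-instance sharing or the finalizer itself, which addresses the root cause rather than the symptom.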
Updated by Ramana Raja 6 months ago
Converted CreateSnapshotRequests's lock from a class variable to an instance variable, and tested the change through teuthology. All 20 jobs of the rbd_support module recovery-on-blocklisting integration test passed.
https://github.com/ceph/ceph-ci/commits/wip-ajarr-fix-62994-oct-26-2
https://pulpito.ceph.com/rraja-2023-10-27_00:00:53-rbd:cli-wip-ajarr-fix-62994-oct-26-2-distro-default-smithi/
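The effect of this fix can be sketched as follows (an illustrative toy, not the code from PR 54251): with the lock created per instance in __init__, garbage-collecting one instance never contends on a lock held through a different instance.

```python
import threading

class Requests:
    # Sketch of the fix: the lock is an instance attribute created in
    # __init__, not a class attribute shared by every instance.
    # (Requests is a hypothetical stand-in for CreateSnapshotRequests.)
    def __init__(self):
        self.lock = threading.Lock()

old, new = Requests(), Requests()
with old.lock:
    # Collecting `old` while a thread works under `new`'s lock (or vice
    # versa) can no longer deadlock: the two locks are independent.
    got = new.lock.acquire(timeout=0.1)
    if got:
        new.lock.release()

print(got)  # True: no contention between instances
```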
Also tried another approach that gets rid of CreateSnapshotRequests's __del__() method, and tested the change. All 20 jobs of the rbd_support module recovery-on-blocklisting integration test passed again.
https://github.com/ceph/ceph-ci/commits/wip-ajarr-fix-62994-oct-25
https://pulpito.ceph.com/rraja-2023-10-26_03:38:28-rbd:cli-wip-ajarr-fix-62994-oct-25-distro-default-smithi/
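The shape of this alternative can be sketched as follows (again an illustrative toy, not the actual branch): dropping __del__ means cleanup runs only through an explicit call at a known point in the shutdown path, never from the garbage collector while the same thread happens to hold the lock deep inside unrelated code.

```python
import threading

class Requests:
    # Sketch of the alternative fix: no __del__ finalizer at all.
    # (Requests is a hypothetical stand-in for CreateSnapshotRequests.)
    def __init__(self):
        self.lock = threading.Lock()
        self.pending = set()

    def shutdown(self):
        # Deterministic replacement for the __del__ -> wait_for_pending()
        # chain: invoked explicitly by the handler's shutdown path, at a
        # point where this thread is known not to hold the lock already.
        with self.lock:
            return not self.pending

r = Requests()
clean = r.shutdown()  # explicit, predictable point in shutdown
print(clean)  # True: nothing pending, lock uncontended
```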
The two types of hangs noted in https://tracker.ceph.com/issues/62994#note-4 were not observed with either of the above fixes. Also, both hangs are explained by the recursive deadlock occurring in MirrorSnapshotScheduleHandler's run thread. See https://tracker.ceph.com/issues/62994#note-6 .
Updated by Ramana Raja 6 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 54251
Updated by Ilya Dryomov 6 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 6 months ago
- Copied to Backport #63382: pacific: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread added
Updated by Backport Bot 6 months ago
- Copied to Backport #63383: quincy: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread added
Updated by Backport Bot 6 months ago
- Copied to Backport #63384: reef: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread added
Updated by Backport Bot 2 months ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".