Bug #62994: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread - rbd - Ceph

closed

Added by Ramana Raja 8 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee: Ramana Raja
Target version:
% Done:
Source: Q/A
Tags: backport_processed
Backport: pacific,quincy,reef
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 54251
Crash signature (v1):
Crash signature (v2):

Description

I ran the integration test from https://github.com/ceph/ceph/pull/53535, which repeatedly blocklists the rbd_support module's RADOS client, approximately 10 seconds after the module recovers from the previous blocklisting. The run is at http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/ . Two jobs failed:
- http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401648/
- http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401660/
In both jobs the rbd_support module did not recover from blocklisting. The module's MirrorSnapshotScheduleHandler got stuck in its shutdown() method, waiting for its run thread to terminate.
Excerpt from the mgr log at /a/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401648/remote/smithi099/log/ceph-mgr.x.log.gz in teuthology:

2023-09-23T07:18:32.518+0000 7fe0ea9fe640  0 [rbd_support ERROR root] TrashPurgeScheduleHandler: client blocklisted
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 46, in run
    refresh_delay = self.refresh_pools()
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 95, in refresh_pools
    self.load_schedules()
  File "/usr/share/ceph/mgr/rbd_support/trash_purge_schedule.py", line 85, in load_schedules
    self.schedules.load()
  File "/usr/share/ceph/mgr/rbd_support/schedule.py", line 419, in load
    self.load_from_pool(ioctx, namespace_validator,
  File "/usr/share/ceph/mgr/rbd_support/schedule.py", line 442, in load_from_pool
    ioctx.operate_read_op(read_op, self.handler.SCHEDULE_OID)
  File "rados.pyx", line 3723, in rados.Ioctx.operate_read_op
rados.ConnectionShutdown: [errno 108] RADOS connection was shutdown (Failed to operate read op for oid rbd_trash_purge_schedule)
2023-09-23T07:18:32.518+0000 7fe0efa08640  0 [rbd_support INFO root] recovering from blocklisting
2023-09-23T07:18:32.518+0000 7fe0efa08640  0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: shutting down
2023-09-23T07:18:32.522+0000 7fe0efa08640  0 [rbd_support DEBUG root] MirrorSnapshotScheduleHandler: joining thread

After this, I don't see any further log messages from MirrorSnapshotScheduleHandler or TrashPurgeScheduleHandler. I only see ticks from PerfHandler and TaskHandler.
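
The log stopping at "joining thread" is consistent with a thread join that never returns. The following is a minimal Python sketch of that general failure mode, not the rbd_support code and not a claim about the root cause here. The class names, the `never_set` event, and the `join_timeout` parameter are all hypothetical. It shows a run thread that blocks without rechecking its stop flag, and a variant that waits on the stop flag itself so that shutdown can complete.

```python
import threading


class StuckHandler:
    """Hypothetical handler for illustration; NOT the rbd_support implementation."""

    def __init__(self):
        self.stop_requested = threading.Event()
        # Assumption: stands in for whatever the run thread might be blocked on.
        self.never_set = threading.Event()
        # Daemon thread so the stuck thread does not keep the interpreter alive.
        self.thread = threading.Thread(target=self.run, daemon=True)
        self.thread.start()

    def run(self):
        while not self.stop_requested.is_set():
            # Blocks indefinitely and never rechecks stop_requested.
            self.never_set.wait()

    def shutdown(self, join_timeout):
        self.stop_requested.set()
        # A plain join() would block forever for StuckHandler;
        # a timeout lets the caller notice that the thread is still alive.
        self.thread.join(timeout=join_timeout)
        return not self.thread.is_alive()


class ResponsiveHandler(StuckHandler):
    def run(self):
        # Waiting on the stop event itself wakes the thread as soon as
        # shutdown() sets it, so the join can complete.
        while not self.stop_requested.wait(timeout=60):
            pass


stuck_ok = StuckHandler().shutdown(join_timeout=0.2)
responsive_ok = ResponsiveHandler().shutdown(join_timeout=2.0)
print(stuck_ok, responsive_ok)  # prints "False True"
```

In this sketch, a caller that joins without a timeout would hang exactly at the join, and anything sequenced after it (here, the rest of a recovery routine) would never run.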

Related issues 5 (0 open — 5 closed)

Related to rbd - Bug #56724: [rbd_support] recover from RADOS instance blocklisting (Resolved, Ramana Raja)

Related to rbd - Bug #62891: [test][rbd] test recovery of rbd_support module from repeated blocklisting of its client (Resolved, Ramana Raja)

Copied to rbd - Backport #63382: pacific: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread (Resolved, Ramana Raja)

Copied to rbd - Backport #63383: quincy: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread (Resolved, Ramana Raja)

Copied to rbd - Backport #63384: reef: mgr/rbd_support: recovery from client blocklisting halts after MirrorSnapshotScheduleHandler tries to terminate its run thread (Resolved, Ramana Raja)

Updated by Ramana Raja 8 months ago

Updated by Ramana Raja 8 months ago

Updated by Ramana Raja 8 months ago

Updated by Ramana Raja 7 months ago

Updated by Ilya Dryomov 7 months ago

Updated by Ramana Raja 7 months ago

Updated by Ramana Raja 7 months ago

Updated by Ramana Raja 7 months ago

Updated by Ilya Dryomov 7 months ago

Updated by Backport Bot 7 months ago

Updated by Backport Bot 7 months ago

Updated by Backport Bot 7 months ago

Updated by Backport Bot 7 months ago

Updated by Backport Bot 3 months ago