Bug #61607
hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting
Status: Closed
Description
There appears to be another hang which occurs when the exclusive lock acquisition races with blocklisting:
```
2023-05-29T18:51:00.008+0000 7fb11add0640 20 librbd::api::Mirror: image_snapshot_create: ictx=0x55fa61151000 [...]
2023-05-29T18:51:00.008+0000 7fb11add0640 5 librbd::Operations: 0x55fa5e73c630 snap_create: snap_name=.mirror.primary.0d8be64e-956a-42be-817c-b1945343b921.cf193596-7096-4d58-bd60-a6727703a3c5
2023-05-29T18:51:00.008+0000 7fb11add0640 20 librbd::ManagedLock: 0x55fa59570eb8 is_lock_owner: =0
2023-05-29T18:51:00.008+0000 7fb11add0640 20 librbd::Operations: send_acquire_exclusive_lock
2023-05-29T18:51:00.008+0000 7fb11add0640 10 librbd::ManagedLock: 0x55fa59570eb8 try_acquire_lock:
2023-05-29T18:51:00.008+0000 7fb11add0640 10 librbd::ManagedLock: 0x55fa59570eb8 send_acquire_lock:
2023-05-29T18:51:00.008+0000 7fb11b5d1640 10 librbd::ManagedLock: 0x55fa59570eb8 handle_pre_acquire_lock: r=0
2023-05-29T18:51:00.008+0000 7fb11b5d1640 10 librbd::managed_lock::AcquireRequest: 0x55fa5fd5ec30 send_get_locker: [...]
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::managed_lock::BreakRequest: 0x55fa5ef96b40 handle_get_watchers: lock owner is still alive
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::managed_lock::BreakRequest: 0x55fa5ef96b40 finish: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::managed_lock::AcquireRequest: 0x55fa5e81a960 handle_break_lock: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 5 librbd::managed_lock::AcquireRequest: 0x55fa5e81a960 handle_break_lock: lock owner is still alive
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ManagedLock: 0x55fa59570eb8 handle_acquire_lock: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 5 librbd::ManagedLock: 0x55fa59570eb8 handle_acquire_lock: unable to acquire exclusive lock
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ExclusiveLock: 0x55fa59570ea0 post_acquire_lock_handler: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ImageState: 0x55fa5c237780 handle_prepare_lock_complete
2023-05-29T18:51:00.113+0000 7fb11b5d1640 20 librbd::ManagedLock: 0x55fa59570eb8 is_lock_owner: =0
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ImageWatcher: 0x55fa64aa8f00 notify request lock
2023-05-29T18:51:00.113+0000 7fb11b5d1640 20 librbd::image_watcher::NotifyLockOwner: 0x55fa5c23d200 send_notify
2023-05-29T18:51:00.113+0000 7fb11b5d1640 20 librbd::watcher::Notifier: 0x55fa64aa8f80 notify: pending=1
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ManagedLock: 0x55fa59570eb8 handle_post_acquire_lock: r=-125
2023-05-29T18:51:00.113+0000 7fb11add0640 5 librbd::Watcher: 0x55fa64aa8f00 notifications_blocked: blocked=0
2023-05-29T18:51:00.113+0000 7fb11add0640 10 librbd::Watcher::C_NotifyAck 0x55fa5c716b40 C_NotifyAck: id=616091583780071, handle=94533739825408
2023-05-29T18:51:00.113+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 exclusive lock requested
2023-05-29T18:51:00.113+0000 7fb11add0640 10 librbd::Watcher::C_NotifyAck 0x55fa5c716b40 finish: r=0
```
Here librbd discovered that the exclusive lock was held by another client and sent a request to that peer to release it. The acquire state machine got "canceled" (r=-125, ECANCELED), as expected.
Then blocklisting is detected:
```
2023-05-29T18:51:00.457+0000 7fb11add0640 -1 librbd::ImageWatcher: 0x55fa64aa8f00 image watch failed: 94533739825408, (107) Transport endpoint is not connected
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 current lock owner: [0,0]
2023-05-29T18:51:00.457+0000 7fb11add0640 -1 librbd::Watcher: 0x55fa64aa8f00 handle_error: handle=94533739825408: (107) Transport endpoint is not connected
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::Watcher: 0x55fa64aa8f00 rewatch:
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::watcher::RewatchRequest: 0x55fa5fb07810 unwatch
2023-05-29T18:51:00.457+0000 7fb11b5d1640 10 librbd::watcher::RewatchRequest: 0x55fa5fb07810 handle_unwatch r=-108
2023-05-29T18:51:00.457+0000 7fb11b5d1640 -1 librbd::watcher::RewatchRequest: 0x55fa5fb07810 handle_unwatch client blocklisted
2023-05-29T18:51:00.457+0000 7fb11b5d1640 10 librbd::watcher::RewatchRequest: 0x55fa5fb07810 finish r=-108
2023-05-29T18:51:00.457+0000 7fb11b5d1640 10 librbd::Watcher: 0x55fa64aa8f00 handle_rewatch: r=-108
2023-05-29T18:51:00.457+0000 7fb11b5d1640 -1 librbd::Watcher: 0x55fa64aa8f00 handle_rewatch: client blocklisted
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::Watcher: 0x55fa64aa8f00 handle_rewatch_callback: r=-108
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 handle_rewatch_complete: r=-108
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ManagedLock: 0x55fa59570eb8 reacquire_lock:
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 image header updated
2023-05-29T18:51:00.457+0000 7fb11add0640 20 librbd::ImageState: 0x55fa5c237780 handle_update_notification: refresh_seq = 1, last_refresh = 0
```
ImageWatcher::handle_rewatch_complete() ignores all errors, calls into the state machine and even attempts to arrange for the image header to be refreshed:
```cpp
template <typename I>
void ImageWatcher<I>::handle_rewatch_complete(int r) {
  CephContext *cct = m_image_ctx.cct;
  ldout(cct, 10) << this << " " << __func__ << ": r=" << r << dendl;

  {
    std::shared_lock owner_locker{m_image_ctx.owner_lock};
    if (m_image_ctx.exclusive_lock != nullptr) {
      // update the lock cookie with the new watch handle
      m_image_ctx.exclusive_lock->reacquire_lock(nullptr);
    }
  }

  // image might have been updated while we didn't have active watch
  handle_payload(HeaderUpdatePayload(), nullptr);
}
```
The call into the state machine is recorded but gets dropped because STATE_WAITING_FOR_LOCK is a "transition" state:
```cpp
template <typename I>
void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
  [...]
  } else if (!is_state_shutdown() &&
             (m_state == STATE_LOCKED ||
              m_state == STATE_ACQUIRING ||
              m_state == STATE_POST_ACQUIRING ||
              m_state == STATE_WAITING_FOR_LOCK)) {
    // interlock the lock operation with other state ops
    ldout(m_cct, 10) << dendl;
    execute_action(ACTION_REACQUIRE_LOCK, on_reacquired);
    return;
  }
```
```cpp
template <typename I>
void ManagedLock<I>::execute_action(Action action, Context *ctx) {
  ceph_assert(ceph_mutex_is_locked(m_lock));

  append_context(action, ctx);
  if (!is_transition_state()) {
    execute_next_action();
  }
}
```
Finally, the request to release the exclusive lock returns (or at least this is when the peer's reply is processed -- messenger logs weren't enabled so it's hard to tell when the message got received):
```
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::watcher::Notifier: 0x55fa64aa8f80 handle_notify: r=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::watcher::Notifier: 0x55fa64aa8f80 handle_notify: pending=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::image_watcher::NotifyLockOwner: 0x55fa5c23d200 handle_notify: r=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::image_watcher::NotifyLockOwner: 0x55fa5c23d200 handle_notify client responded with r=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 15 librbd::ImageWatcher: 0x55fa64aa8f00 will retry in 10 seconds
```
But nothing is retried -- the attempt to schedule a retry is ignored as well, because the watch isn't registered (and can't be, due to blocklisting):
```cpp
template <typename I>
void ImageWatcher<I>::schedule_request_lock(bool use_timer, int timer_delay) {
  [...]
  if (this->is_registered(this->m_watch_lock)) {
    ldout(m_image_ctx.cct, 15) << this << " requesting exclusive lock" << dendl;
    [...]
  }
}
```
More than one thing here seems weird to me, but the most obvious is ImageWatcher::handle_rewatch_complete() ignoring all errors, including EBLOCKLISTED.
Updated by Ilya Dryomov 12 months ago
- Related to Bug #59115: [rbd-mirror] unlink_peer gets stuck during blocklist added
Updated by Ramana Raja 10 months ago
- File blocklist.sh added
Reproduced the issue consistently using the following steps:
- Set up two Ceph clusters using mstart.sh
- Created 20 1GiB RBD images; mapped the images using the kernel client; formatted the devices as ext4 and mounted them
- Ran a fio workload on the mounts continuously
`fio --name=fiotest --filename=$dir/test1 --rw=randrw --bs=4K --direct=1 --ioengine=libaio --size=800M --iodepth=2`
- Scheduled mirror snapshots on all 20 images once every 3 minutes using the rbd_support mgr module
`for i in {1..20}; do ./bin/rbd --cluster site-a mirror snapshot schedule add --pool data --image image$i 3m; done`
- Blocklisted the rbd_support module's RADOS client repeatedly (close to 10 seconds after each recovery) using the "blocklist.sh" script attached

Note: the script is based on Vasishta Shastry's blocklisting script.

Within ~2 hours of running the blocklisting script, the rbd_support mgr module failed to recover because one of the mirror_snapshot_scheduler handler's create-mirror-snapshot requests failed to error out. Observed the same issue in the ceph-mgr log: the EBLOCKLISTED error is not properly propagated in librbd's "ImageWatcher::handle_rewatch_complete()" and "ManagedLock::reacquire_lock()".
Updated by Ramana Raja 9 months ago
Ramana Raja wrote:
Reproduced the issue consistently using the following steps:
- Set up two Ceph clusters using mstart.sh
- Created 20 1GiB RBD images; mapped the images using the kernel client; formatted the devices as ext4 and mounted them
- Ran a fio workload on the mounts continuously
`fio --name=fiotest --filename=$dir/test1 --rw=randrw --bs=4K --direct=1 --ioengine=libaio --size=800M --iodepth=2`
- Scheduled mirror snapshots on all 20 images once every 3 minutes using the rbd_support mgr module
`for i in {1..20}; do ./bin/rbd --cluster site-a mirror snapshot schedule add --pool data --image image$i 3m; done`
- Blocklisted the rbd_support module's RADOS client repeatedly (close to 10 seconds after each recovery) using the "blocklist.sh" script attached

Note: the script is based on Vasishta Shastry's blocklisting script.

Within 90 minutes of running the blocklisting script, the rbd_support mgr module failed to recover because one of the mirror_snapshot_scheduler handler's create-mirror-snapshot requests failed to error out. Observed the same issue in the ceph-mgr log: the EBLOCKLISTED error is not properly propagated in librbd's "ImageWatcher::handle_rewatch_complete()" and "ManagedLock::reacquire_lock()".
You can make the following code change to ensure that every time an exclusive lock is requested, the ExclusiveLock state machine stalls in STATE_WAITING_FOR_LOCK long enough for the client to get blocklisted by the "blocklist.sh" script:
```diff
diff --git a/src/librbd/ImageWatcher.cc b/src/librbd/ImageWatcher.cc
index 4a1acd5e174..223265bfdce 100644
--- a/src/librbd/ImageWatcher.cc
+++ b/src/librbd/ImageWatcher.cc
@@ -611,7 +611,7 @@ void ImageWatcher<I>::notify_request_lock() {
   }

   ldout(m_image_ctx.cct, 10) << this << " notify request lock" << dendl;
-
+  sleep(15);
   notify_lock_owner(new RequestLockPayload(get_client_id(), false),
                     create_context_callback<
                       ImageWatcher, &ImageWatcher<I>::handle_request_lock>(this));
```
Updated by Ilya Dryomov 9 months ago
- Status changed from In Progress to Fix Under Review
- Backport set to pacific,quincy,reef
Updated by Ilya Dryomov 9 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 9 months ago
- Copied to Backport #62685: quincy: hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Backport Bot 9 months ago
- Copied to Backport #62686: pacific: hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Backport Bot 9 months ago
- Copied to Backport #62687: reef: hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Ramana Raja 8 months ago
- Related to Bug #63009: another hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Ilya Dryomov 7 months ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".