Bug #61607
hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting
Status: Closed
Description
There appears to be another hang which occurs when the exclusive lock acquisition races with blocklisting:
```
2023-05-29T18:51:00.008+0000 7fb11add0640 20 librbd::api::Mirror: image_snapshot_create: ictx=0x55fa61151000 [...]
2023-05-29T18:51:00.008+0000 7fb11add0640 5 librbd::Operations: 0x55fa5e73c630 snap_create: snap_name=.mirror.primary.0d8be64e-956a-42be-817c-b1945343b921.cf193596-7096-4d58-bd60-a6727703a3c5
2023-05-29T18:51:00.008+0000 7fb11add0640 20 librbd::ManagedLock: 0x55fa59570eb8 is_lock_owner: =0
2023-05-29T18:51:00.008+0000 7fb11add0640 20 librbd::Operations: send_acquire_exclusive_lock
2023-05-29T18:51:00.008+0000 7fb11add0640 10 librbd::ManagedLock: 0x55fa59570eb8 try_acquire_lock:
2023-05-29T18:51:00.008+0000 7fb11add0640 10 librbd::ManagedLock: 0x55fa59570eb8 send_acquire_lock:
2023-05-29T18:51:00.008+0000 7fb11b5d1640 10 librbd::ManagedLock: 0x55fa59570eb8 handle_pre_acquire_lock: r=0
2023-05-29T18:51:00.008+0000 7fb11b5d1640 10 librbd::managed_lock::AcquireRequest: 0x55fa5fd5ec30 send_get_locker: [...]
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::managed_lock::BreakRequest: 0x55fa5ef96b40 handle_get_watchers: lock owner is still alive
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::managed_lock::BreakRequest: 0x55fa5ef96b40 finish: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::managed_lock::AcquireRequest: 0x55fa5e81a960 handle_break_lock: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 5 librbd::managed_lock::AcquireRequest: 0x55fa5e81a960 handle_break_lock: lock owner is still alive
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ManagedLock: 0x55fa59570eb8 handle_acquire_lock: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 5 librbd::ManagedLock: 0x55fa59570eb8 handle_acquire_lock: unable to acquire exclusive lock
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ExclusiveLock: 0x55fa59570ea0 post_acquire_lock_handler: r=-11
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ImageState: 0x55fa5c237780 handle_prepare_lock_complete
2023-05-29T18:51:00.113+0000 7fb11b5d1640 20 librbd::ManagedLock: 0x55fa59570eb8 is_lock_owner: =0
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ImageWatcher: 0x55fa64aa8f00 notify request lock
2023-05-29T18:51:00.113+0000 7fb11b5d1640 20 librbd::image_watcher::NotifyLockOwner: 0x55fa5c23d200 send_notify
2023-05-29T18:51:00.113+0000 7fb11b5d1640 20 librbd::watcher::Notifier: 0x55fa64aa8f80 notify: pending=1
2023-05-29T18:51:00.113+0000 7fb11b5d1640 10 librbd::ManagedLock: 0x55fa59570eb8 handle_post_acquire_lock: r=-125
2023-05-29T18:51:00.113+0000 7fb11add0640 5 librbd::Watcher: 0x55fa64aa8f00 notifications_blocked: blocked=0
2023-05-29T18:51:00.113+0000 7fb11add0640 10 librbd::Watcher::C_NotifyAck 0x55fa5c716b40 C_NotifyAck: id=616091583780071, handle=94533739825408
2023-05-29T18:51:00.113+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 exclusive lock requested
2023-05-29T18:51:00.113+0000 7fb11add0640 10 librbd::Watcher::C_NotifyAck 0x55fa5c716b40 finish: r=0
```
Here librbd discovered that the exclusive lock was held by another client and sent a request to that peer to release it. The acquire state machine got "canceled" (r=-125, ECANCELED), as expected.
Then blocklisting is detected:
```
2023-05-29T18:51:00.457+0000 7fb11add0640 -1 librbd::ImageWatcher: 0x55fa64aa8f00 image watch failed: 94533739825408, (107) Transport endpoint is not connected
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 current lock owner: [0,0]
2023-05-29T18:51:00.457+0000 7fb11add0640 -1 librbd::Watcher: 0x55fa64aa8f00 handle_error: handle=94533739825408: (107) Transport endpoint is not connected
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::Watcher: 0x55fa64aa8f00 rewatch:
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::watcher::RewatchRequest: 0x55fa5fb07810 unwatch
2023-05-29T18:51:00.457+0000 7fb11b5d1640 10 librbd::watcher::RewatchRequest: 0x55fa5fb07810 handle_unwatch r=-108
2023-05-29T18:51:00.457+0000 7fb11b5d1640 -1 librbd::watcher::RewatchRequest: 0x55fa5fb07810 handle_unwatch client blocklisted
2023-05-29T18:51:00.457+0000 7fb11b5d1640 10 librbd::watcher::RewatchRequest: 0x55fa5fb07810 finish r=-108
2023-05-29T18:51:00.457+0000 7fb11b5d1640 10 librbd::Watcher: 0x55fa64aa8f00 handle_rewatch: r=-108
2023-05-29T18:51:00.457+0000 7fb11b5d1640 -1 librbd::Watcher: 0x55fa64aa8f00 handle_rewatch: client blocklisted
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::Watcher: 0x55fa64aa8f00 handle_rewatch_callback: r=-108
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 handle_rewatch_complete: r=-108
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ManagedLock: 0x55fa59570eb8 reacquire_lock:
2023-05-29T18:51:00.457+0000 7fb11add0640 10 librbd::ImageWatcher: 0x55fa64aa8f00 image header updated
2023-05-29T18:51:00.457+0000 7fb11add0640 20 librbd::ImageState: 0x55fa5c237780 handle_update_notification: refresh_seq = 1, last_refresh = 0
```
ImageWatcher::handle_rewatch_complete() ignores all errors, calls into the state machine and even attempts to arrange for the image header to be refreshed:
```cpp
template <typename I>
void ImageWatcher<I>::handle_rewatch_complete(int r) {
  CephContext *cct = m_image_ctx.cct;
  ldout(cct, 10) << this << " " << __func__ << ": r=" << r << dendl;

  {
    std::shared_lock owner_locker{m_image_ctx.owner_lock};
    if (m_image_ctx.exclusive_lock != nullptr) {
      // update the lock cookie with the new watch handle
      m_image_ctx.exclusive_lock->reacquire_lock(nullptr);
    }
  }

  // image might have been updated while we didn't have active watch
  handle_payload(HeaderUpdatePayload(), nullptr);
}
```
The call into the state machine is recorded but gets dropped because STATE_WAITING_FOR_LOCK is a "transition" state:
```cpp
template <typename I>
void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
  [...]
  } else if (!is_state_shutdown() &&
             (m_state == STATE_LOCKED ||
              m_state == STATE_ACQUIRING ||
              m_state == STATE_POST_ACQUIRING ||
              m_state == STATE_WAITING_FOR_LOCK)) {
    // interlock the lock operation with other state ops
    ldout(m_cct, 10) << dendl;
    execute_action(ACTION_REACQUIRE_LOCK, on_reacquired);
    return;
  }
```
```cpp
template <typename I>
void ManagedLock<I>::execute_action(Action action, Context *ctx) {
  ceph_assert(ceph_mutex_is_locked(m_lock));

  append_context(action, ctx);
  if (!is_transition_state()) {
    execute_next_action();
  }
}
```
Finally, the request to release the exclusive lock returns (or at least this is when the peer's reply is processed -- messenger logs weren't enabled so it's hard to tell when the message got received):
```
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::watcher::Notifier: 0x55fa64aa8f80 handle_notify: r=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::watcher::Notifier: 0x55fa64aa8f80 handle_notify: pending=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::image_watcher::NotifyLockOwner: 0x55fa5c23d200 handle_notify: r=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 20 librbd::image_watcher::NotifyLockOwner: 0x55fa5c23d200 handle_notify client responded with r=0
2023-05-29T18:51:00.611+0000 7fb11b5d1640 15 librbd::ImageWatcher: 0x55fa64aa8f00 will retry in 10 seconds
```
But nothing is retried -- the attempt to schedule a retry is ignored as well, because the watch isn't registered (and can't be, due to blocklisting):
```cpp
template <typename I>
void ImageWatcher<I>::schedule_request_lock(bool use_timer, int timer_delay) {
  [...]
  if (this->is_registered(this->m_watch_lock)) {
    ldout(m_image_ctx.cct, 15) << this << " requesting exclusive lock" << dendl;
    [...]
  }
}
```
More than one thing here seems weird to me, but the most obvious is ImageWatcher::handle_rewatch_complete() ignoring all errors, including EBLOCKLISTED.
Updated by Ilya Dryomov 12 months ago
- Related to Bug #59115: [rbd-mirror] unlink_peer gets stuck during blocklist added
Updated by Ramana Raja 10 months ago
- File blocklist.sh added
Reproduced the issue consistently using the following steps:
- Set up two Ceph clusters using mstart.sh
- Created 20 1GiB RBD images; mapped the images using the kernel client; formatted the devices as ext4 and mounted them
- Ran a fio workload on the mounts continuously
`fio --name=fiotest --filename=$dir/test1 --rw=randrw --bs=4K --direct=1 --ioengine=libaio --size=800M --iodepth=2`
- Scheduled mirror snapshots on all 20 images once every 3 minutes using the rbd_support mgr module
`for i in {1..20}; do ./bin/rbd --cluster site-a mirror snapshot schedule add --pool data --image image$i 3m; done`
- Blocklisted the rbd_support module's RADOS client repeatedly (close to 10 seconds after each recovery) using the "blocklist.sh" script attached

Note: the script is based on Vasishta Shastry's blocklisting script.

Within ~2 hours of running the blocklisting script, the rbd_support mgr module failed to recover because one of the mirror_snapshot_scheduler handler's create-mirror-snapshot requests failed to error out. Observed the same issue in the ceph-mgr log: the EBLOCKLISTED error is not properly propagated in librbd's "ImageWatcher::handle_rewatch_complete()" and "ManagedLock::reacquire_lock()".
Updated by Ramana Raja 9 months ago
Ramana Raja wrote:
Reproduced the issue consistently using the following steps:
- Set up two Ceph clusters using mstart.sh
- Created 20 1GiB RBD images; mapped the images using the kernel client; formatted the devices as ext4 and mounted them
- Ran a fio workload on the mounts continuously
`fio --name=fiotest --filename=$dir/test1 --rw=randrw --bs=4K --direct=1 --ioengine=libaio --size=800M --iodepth=2`
- Scheduled mirror snapshots on all 20 images once every 3 minutes using the rbd_support mgr module
`for i in {1..20}; do ./bin/rbd --cluster site-a mirror snapshot schedule add --pool data --image image$i 3m; done`
- Blocklisted the rbd_support module's RADOS client repeatedly (close to 10 seconds after each recovery) using the "blocklist.sh" script attached

Note: the script is based on Vasishta Shastry's blocklisting script.

Within 90 minutes of running the blocklisting script, the rbd_support mgr module failed to recover because one of the mirror_snapshot_scheduler handler's create-mirror-snapshot requests failed to error out. Observed the same issue in the ceph-mgr log: the EBLOCKLISTED error is not properly propagated in librbd's "ImageWatcher::handle_rewatch_complete()" and "ManagedLock::reacquire_lock()".
You can make the following code change to ensure that every time an exclusive lock is requested, the ExclusiveLock state machine stalls in STATE_WAITING_FOR_LOCK long enough for the client to get blocklisted by the "blocklist.sh" script:
```diff
diff --git a/src/librbd/ImageWatcher.cc b/src/librbd/ImageWatcher.cc
index 4a1acd5e174..223265bfdce 100644
--- a/src/librbd/ImageWatcher.cc
+++ b/src/librbd/ImageWatcher.cc
@@ -611,7 +611,7 @@ void ImageWatcher<I>::notify_request_lock() {
   }

   ldout(m_image_ctx.cct, 10) << this << " notify request lock" << dendl;
-
+  sleep(15);
   notify_lock_owner(new RequestLockPayload(get_client_id(), false),
                     create_context_callback<
                       ImageWatcher, &ImageWatcher<I>::handle_request_lock>(this));
```
Updated by Ilya Dryomov 9 months ago
- Status changed from In Progress to Fix Under Review
- Backport set to pacific,quincy,reef
Updated by Ilya Dryomov 9 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 9 months ago
- Copied to Backport #62685: quincy: hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Backport Bot 9 months ago
- Copied to Backport #62686: pacific: hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Backport Bot 9 months ago
- Copied to Backport #62687: reef: hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Ramana Raja 8 months ago
- Related to Bug #63009: another hang due to exclusive lock acquisition (STATE_WAITING_FOR_LOCK) racing with blocklisting added
Updated by Ilya Dryomov 7 months ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".