Bug #63009
Updated by Ramana Raja 8 months ago
Yet another hang in librbd occurs when exclusive lock acquisition races with blocklisting:
<pre>
2023-09-23T07:35:01.560+0000 7f5568565700 0 [rbd_support DEBUG root] CreateSnapshotRequests.create_snapshot for 2//110bc13f2b0a
...
2023-09-23T07:35:01.561+0000 7f5568d66700 15 librbd::mirror::snapshot::CreatePrimaryRequest: 0x55a3ba0e45b0 create_snapshot: name=.mirror.primary.6d36faee-3986-4977-8c10-d2f5f4af7805.0de9900f-144e-42b1-b803-e03a8750b134, ns=[mirror state=primary, complete=0, mirror_peer_uuids=6e5d5e76-e90c-4f2a-a6a0-9a5984e5deba, clean_since_snap_id=head]
2023-09-23T07:35:01.561+0000 7f5568d66700 5 librbd::Operations: 0x55a3bb8338c0 snap_create: snap_name=.mirror.primary.6d36faee-3986-4977-8c10-d2f5f4af7805.0de9900f-144e-42b1-b803-e03a8750b134
2023-09-23T07:35:01.561+0000 7f5568d66700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
2023-09-23T07:35:01.561+0000 7f5568d66700 20 librbd::Operations: send_acquire_exclusive_lock
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 try_acquire_lock:
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 send_acquire_lock:
...
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ExclusiveLock: 0x55a3baa34820 pre_acquire_lock_handler
...
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_pre_acquire_lock: r=0
...
2023-09-23T07:35:01.591+0000 7f5568565700 10 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: r=-11
2023-09-23T07:35:01.591+0000 7f5568565700 5 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: unable to acquire exclusive lock
</pre>
After a few failed attempts, librbd tries once more to acquire the exclusive lock:
<pre>
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: r=-11
2023-09-23T07:35:02.006+0000 7f5568d66700 5 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: unable to acquire exclusive lock
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ExclusiveLock: 0x55a3baa34820 post_acquire_lock_handler: r=-11
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ImageState: 0x55a3bada0300 handle_prepare_lock_complete
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ImageWatcher: 0x55a3ba806300 notify request lock
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 send_notify
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::watcher::Notifier: 0x55a3ba806380 notify: pending=1
2023-09-23T07:35:02.006+0000 7f5568d66700 1 -- 172.21.15.140:0/1974081988 --> [v2:172.21.15.140:6816/155684008,v1:172.21.15.140:6817/155684008] -- osd_op(unknown.0.0:1268 2.3 2:cb0b96e0:::rbd_header.110bc13f2b0a:head [notify cookie 94161715406848 in=39b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e83) v8 -- 0x55a3bb206000 con 0x55a3ba3c2000
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_post_acquire_lock: r=-125
</pre>
Blocklisting is detected.
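As a rough guide to the return codes in the log above, the following is a hypothetical reduction of the acquire path (the function name and strings are illustrative, not librbd's API): -11 is -EAGAIN (another client owns the lock, so librbd asks the owner to release it and retries), while -125 is -ECANCELED (the in-flight acquisition was canceled).

```cpp
#include <string>

// Hypothetical, simplified mapping of the return codes seen in the log;
// the names are illustrative, not librbd code.
std::string next_step_after_acquire(int r) {
  switch (r) {
  case 0:
    return "lock acquired";
  case -11:   // -EAGAIN: another client currently holds the lock
    return "notify current owner, then retry";
  case -125:  // -ECANCELED: the acquisition attempt was canceled
    return "acquisition canceled";
  default:
    return "fail request";
  }
}
```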
<pre>
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::ImageWatcher: 0x55a3ba806300 current lock owner: [0,0]
2023-09-23T07:35:02.185+0000 7f5568565700 -1 librbd::Watcher: 0x55a3ba806300 handle_error: handle=94161715361792: (107) Transport endpoint is not connected
...
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::Watcher: 0x55a3ba806300 rewatch:
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 unwatch
2023-09-23T07:35:02.185+0000 7f5568565700 1 -- 172.21.15.140:0/1974081988 --> [v2:172.21.15.140:6816/155684008,v1:172.21.15.140:6817/155684008] -- osd_op(unknown.0.0:1301 2.3 2:cb0b96e0:::rbd_header.110bc13f2b0a:head [watch unwatch cookie 94161715361792] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e84) v8 -- 0x55a3bb3a8e00 con 0x55a3ba3c2000
</pre>
See RewatchRequest::unwatch(), where the watch handle is set to zero and aio_unwatch(), the asynchronous unwatch, is issued. The watcher is now considered unregistered but not yet blocklisted.
<pre>
void RewatchRequest::unwatch() {
  ceph_assert(ceph_mutex_is_wlocked(m_watch_lock));
  if (*m_watch_handle == 0) {
    rewatch();
    return;
  }

  CephContext *cct = reinterpret_cast<CephContext *>(m_ioctx.cct());
  ldout(cct, 10) << dendl;

  uint64_t watch_handle = 0;
  std::swap(*m_watch_handle, watch_handle);

  librados::AioCompletion *aio_comp = create_rados_callback<
    RewatchRequest, &RewatchRequest::handle_unwatch>(this);
  int r = m_ioctx.aio_unwatch(watch_handle, aio_comp);
  ceph_assert(r == 0);
  aio_comp->release();
}
</pre>
Meanwhile, the request to release the exclusive lock returns.
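The window this opens can be illustrated with a minimal standalone sketch (the WatcherSketch type and method names are hypothetical, not librbd code): the watch handle is zeroed synchronously, but the blocklist error is only observed in the asynchronous completion, so in between the watcher reports neither registered nor blocklisted.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the watcher state, illustrating the gap
// opened by RewatchRequest::unwatch().
struct WatcherSketch {
  uint64_t watch_handle = 0x55e0;  // nonzero: currently watching
  bool blocklisted = false;

  bool is_registered() const { return watch_handle != 0; }
  bool is_blocklisted() const { return blocklisted; }

  // Mirrors the std::swap() in unwatch(): the handle is zeroed
  // immediately, before the asynchronous unwatch completes.
  void start_async_unwatch() {
    uint64_t handle = 0;
    std::swap(watch_handle, handle);
    // aio_unwatch(handle, ...) would be issued here
  }

  // Mirrors handle_unwatch(): only here is the blocklist error observed.
  void handle_unwatch(int r) {
    if (r == -108) {  // -ESHUTDOWN: used by Ceph to signal "blocklisted"
      blocklisted = true;
    }
  }
};
```

Any code path that consults only these two predicates during that window will conclude, incorrectly, that nothing is wrong and nothing needs retrying.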
The callback for ImageWatcher::notify_request_lock(), ImageWatcher::handle_request_lock(), is executed:
<pre>
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 handle_notify: r=0
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 handle_notify client responded with r=0
2023-09-23T07:35:02.185+0000 7f5568565700 15 librbd::ImageWatcher: 0x55a3ba806300 will retry in 10 seconds
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
</pre>
There is no attempt to reschedule the lock request, since the watcher is unregistered and not yet considered blocklisted:
<pre>
template <typename I>
void ImageWatcher<I>::schedule_request_lock(bool use_timer, int timer_delay) {
  [...]
  if (is_registered()) {
    ldout(m_image_ctx.cct, 15) << this << " requesting exclusive lock" << dendl;
    [...]
  } else if (is_blocklisted()) {
    lderr(m_image_ctx.cct) << this << " blocklisted waiting for exclusive lock"
                           << dendl;
    m_image_ctx.exclusive_lock->handle_peer_notification(0);
  }
}
</pre>
Only later is RewatchRequest::handle_unwatch() called and the watcher recognized as blocklisted.
<pre>
2023-09-23T07:35:02.190+0000 7f5568d66700 10 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 handle_unwatch r=-108
2023-09-23T07:35:02.190+0000 7f5568d66700 -1 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 handle_unwatch client blocklisted
2023-09-23T07:35:02.190+0000 7f5568d66700 10 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 finish r=-108
...
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 handle_unwatch r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 -1 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 handle_unwatch client blocklisted
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 finish r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::Watcher: 0x55a3ba806300 handle_rewatch: r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 -1 librbd::Watcher: 0x55a3ba806300 handle_rewatch: client blocklisted
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::object_map::UnlockRequest: 0x55a3bb4950b0 handle_unlock: r=-108
2023-09-23T07:35:02.193+0000 7f5568d66700 10 librbd::Watcher: 0x55a3bb52e300 handle_rewatch_callback: r=-108
2023-09-23T07:35:02.193+0000 7f5568d66700 10 librbd::ImageWatcher: 0x55a3bb52e300 handle_rewatch_complete: r=-108
</pre>
As already described in https://tracker.ceph.com/issues/61607, "ImageWatcher::handle_rewatch_complete() ignores all errors, calls into the [ExclusiveLock] state machine and even attempts to arrange for the image header to be refreshed:
<pre>
template <typename I>
void ImageWatcher<I>::handle_rewatch_complete(int r) {
  CephContext *cct = m_image_ctx.cct;
  ldout(cct, 10) << this << " " << __func__ << ": r=" << r << dendl;

  {
    std::shared_lock owner_locker{m_image_ctx.owner_lock};
    if (m_image_ctx.exclusive_lock != nullptr) {
      // update the lock cookie with the new watch handle
      m_image_ctx.exclusive_lock->reacquire_lock(nullptr);
    }
  }

  // image might have been updated while we didn't have active watch
  handle_payload(HeaderUpdatePayload(), nullptr);
}
</pre>
The call into the state machine is recorded but gets dropped because STATE_WAITING_FOR_LOCK is a "transition" state:
<pre>
template <typename I>
void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
  [...]
  } else if (!is_state_shutdown() &&
             (m_state == STATE_LOCKED ||
              m_state == STATE_ACQUIRING ||
              m_state == STATE_POST_ACQUIRING ||
              m_state == STATE_WAITING_FOR_LOCK)) {
    // interlock the lock operation with other state ops
    ldout(m_cct, 10) << dendl;
    execute_action(ACTION_REACQUIRE_LOCK, on_reacquired);
    return;
  }
</pre>
<pre>
template <typename I>
void ManagedLock<I>::execute_action(Action action, Context *ctx) {
  ceph_assert(ceph_mutex_is_locked(m_lock));

  append_context(action, ctx);
  if (!is_transition_state()) {
    execute_next_action();
  }
}
</pre>"
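The effect can be reproduced with a miniature sketch of this queue-and-kick pattern (the LockSketch type is hypothetical, illustrative only): while the current state is a transition state, execute_action() appends the action but never kicks the queue, so the queued ACTION_REACQUIRE_LOCK sits there indefinitely.

```cpp
#include <deque>

// Hypothetical miniature of an action queue guarded by a transition-state
// check, modeled on the ManagedLock code quoted above.
enum State { STATE_LOCKED, STATE_WAITING_FOR_LOCK };
enum Action { ACTION_REACQUIRE_LOCK };

struct LockSketch {
  State state = STATE_WAITING_FOR_LOCK;
  std::deque<Action> actions;
  int executed = 0;

  bool is_transition_state() const {
    return state == STATE_WAITING_FOR_LOCK;
  }

  void execute_next_action() {
    actions.pop_front();
    ++executed;
  }

  void execute_action(Action a) {
    actions.push_back(a);          // the action is recorded...
    if (!is_transition_state()) {  // ...but the queue is only kicked here
      execute_next_action();
    }
  }
};
```

In ManagedLock the queued action would normally run once the transition completes; in this hang, STATE_WAITING_FOR_LOCK never completes because the lock request itself was lost, so the queued reacquire never executes.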