Bug #63009
Updated by Ramana Raja 8 months ago
Source: mgr log of http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401649/
/a/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401649/remote/smithi140/log/ceph-mgr.x.log.gz on the teuthology machine
Yet another hang in librbd that occurs when exclusive lock acquisition races with client blocklisting:
<pre>
2023-09-23T07:35:01.560+0000 7f5568565700 0 [rbd_support DEBUG root] CreateSnapshotRequests.create_snapshot for 2//110bc13f2b0a
...
2023-09-23T07:35:01.561+0000 7f5568d66700 15 librbd::mirror::snapshot::CreatePrimaryRequest: 0x55a3ba0e45b0 create_snapshot: name=.mirror.primary.6d36faee-3986-4977-8c10-d2f5f4af7805.0de9900f-144e-42b1-b803-e03a8750b134, ns=[mirror state=primary, complete=0, mirror_peer_uuids=6e5d5e76-e90c-4f2a-a6a0-9a5984e5deba, clean_since_snap_id=head]
2023-09-23T07:35:01.561+0000 7f5568d66700 5 librbd::Operations: 0x55a3bb8338c0 snap_create: snap_name=.mirror.primary.6d36faee-3986-4977-8c10-d2f5f4af7805.0de9900f-144e-42b1-b803-e03a8750b134
2023-09-23T07:35:01.561+0000 7f5568d66700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
2023-09-23T07:35:01.561+0000 7f5568d66700 20 librbd::Operations: send_acquire_exclusive_lock
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 try_acquire_lock:
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 send_acquire_lock:
...
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ExclusiveLock: 0x55a3baa34820 pre_acquire_lock_handler
...
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_pre_acquire_lock: r=0
...
2023-09-23T07:35:01.591+0000 7f5568565700 10 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: r=-11
2023-09-23T07:35:01.591+0000 7f5568565700 5 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: unable to acquire exclusive lock
</pre>
After a few failed attempts to acquire the exclusive lock, librbd tries again and, failing once more, notifies the current lock owner to request the lock:
<pre>
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: r=-11
2023-09-23T07:35:02.006+0000 7f5568d66700 5 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: unable to acquire exclusive lock
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ExclusiveLock: 0x55a3baa34820 post_acquire_lock_handler: r=-11
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ImageState: 0x55a3bada0300 handle_prepare_lock_complete
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ImageWatcher: 0x55a3ba806300 notify request lock
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 send_notify
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::watcher::Notifier: 0x55a3ba806380 notify: pending=1
2023-09-23T07:35:02.006+0000 7f5568d66700 1 -- 172.21.15.140:0/1974081988 --> [v2:172.21.15.140:6816/155684008,v1:172.21.15.140:6817/155684008] -- osd_op(unknown.0.0:1268 2.3 2:cb0b96e0:::rbd_header.110bc13f2b0a:head [notify cookie 94161715406848 in=39b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e83) v8 -- 0x55a3bb206000 con 0x55a3ba3c2000
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_post_acquire_lock: r=-125
</pre>
Blocklisting is detected: the watch fails with -ENOTCONN and a rewatch is initiated.
<pre>
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::ImageWatcher: 0x55a3ba806300 current lock owner: [0,0]
2023-09-23T07:35:02.185+0000 7f5568565700 -1 librbd::Watcher: 0x55a3ba806300 handle_error: handle=94161715361792: (107) Transport endpoint is not connected
...
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::Watcher: 0x55a3ba806300 rewatch:
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 unwatch
2023-09-23T07:35:02.185+0000 7f5568565700 1 -- 172.21.15.140:0/1974081988 --> [v2:172.21.15.140:6816/155684008,v1:172.21.15.140:6817/155684008] -- osd_op(unknown.0.0:1301 2.3 2:cb0b96e0:::rbd_header.110bc13f2b0a:head [watch unwatch cookie 94161715361792] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e84) v8 -- 0x55a3bb3a8e00 con 0x55a3ba3c2000
</pre>
See RewatchRequest::unwatch(), where the watch handle is set to zero and aio_unwatch(), the asynchronous unwatch, is issued. The Watcher is now considered unregistered but not yet blocklisted.
<pre>
void RewatchRequest::unwatch() {
  ceph_assert(ceph_mutex_is_wlocked(m_watch_lock));
  if (*m_watch_handle == 0) {
    rewatch();
    return;
  }

  CephContext *cct = reinterpret_cast<CephContext *>(m_ioctx.cct());
  ldout(cct, 10) << dendl;

  uint64_t watch_handle = 0;
  std::swap(*m_watch_handle, watch_handle);

  librados::AioCompletion *aio_comp = create_rados_callback<
      RewatchRequest, &RewatchRequest::handle_unwatch>(this);
  int r = m_ioctx.aio_unwatch(watch_handle, aio_comp);
  ceph_assert(r == 0);
  aio_comp->release();
}
</pre>
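The window this opens can be sketched with a toy model. All names below (FakeIoCtx, FakeWatcher, etc.) are hypothetical stand-ins, not the librbd API: zeroing the handle makes the watcher look unregistered immediately, while blocklisting is only learned later, in the deferred unwatch completion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for an IoCtx whose aio_unwatch completes later,
// simulating the asynchrony of librados::IoCtx::aio_unwatch().
struct FakeIoCtx {
  std::queue<std::function<void()>> completions;

  void aio_unwatch(uint64_t /*handle*/, std::function<void()> on_finish) {
    completions.push(std::move(on_finish));  // completes later, not inline
  }

  void run_completions() {
    while (!completions.empty()) {
      completions.front()();
      completions.pop();
    }
  }
};

// Hypothetical stand-in for the Watcher state involved in the race.
struct FakeWatcher {
  uint64_t watch_handle = 12345;  // non-zero: currently registered
  bool blocklisted = false;

  bool is_registered() const { return watch_handle != 0; }
  bool is_blocklisted() const { return blocklisted; }

  // Mirrors the shape of RewatchRequest::unwatch(): the handle is zeroed
  // *before* the asynchronous unwatch completes, so is_registered() flips
  // immediately, while the -ENOTCONN that reveals the blocklist is only
  // observed in the deferred completion (handle_unwatch()).
  void start_rewatch(FakeIoCtx& ioctx) {
    uint64_t handle = 0;
    std::swap(watch_handle, handle);
    ioctx.aio_unwatch(handle, [this] { blocklisted = true; });
  }
};
```

Anything that inspects the watcher between start_rewatch() and the completion sees it as neither registered nor blocklisted, which is exactly the window the hang falls into.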
Meanwhile, the notification requesting that the lock owner release the exclusive lock returns. ImageWatcher::handle_request_lock(), the callback of ImageWatcher::notify_request_lock(), is executed:
<pre>
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 handle_notify: r=0
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 handle_notify client responded with r=0
2023-09-23T07:35:02.185+0000 7f5568565700 15 librbd::ImageWatcher: 0x55a3ba806300 will retry in 10 seconds
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
</pre>
The lock request is never rescheduled: the Watcher is unregistered and not yet considered blocklisted, so neither branch in schedule_request_lock() is taken.
<pre>
template <typename I>
void ImageWatcher<I>::schedule_request_lock(bool use_timer, int timer_delay) {
  [...]
  if (is_registered()) {
    ldout(m_image_ctx.cct, 15) << this << " requesting exclusive lock" << dendl;
    [...]
  } else if (is_blocklisted()) {
    lderr(m_image_ctx.cct) << this << " blocklisted waiting for exclusive lock"
                           << dendl;
    m_image_ctx.exclusive_lock->handle_peer_notification(0);
  }
}
</pre>
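The gap can be condensed into a sketch (hypothetical names; the returned strings stand for the real branches): in the window where the client is unregistered but the blocklist has not yet been detected, neither branch fires and the request is silently dropped.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-way watch state; in the real code these are two
// independent predicates, is_registered() and is_blocklisted().
enum class WatchState { REGISTERED, UNREGISTERED, BLOCKLISTED };

// Condensed model of ImageWatcher<I>::schedule_request_lock(): returns
// what the function would do for a given watch state.
std::string schedule_request_lock(WatchState state) {
  if (state == WatchState::REGISTERED) {
    return "retry lock request";     // timer re-arms the request
  } else if (state == WatchState::BLOCKLISTED) {
    return "notify exclusive lock";  // handle_peer_notification(0)
  }
  return "dropped";  // neither branch taken: no retry, no error -> hang
}
```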
Only later is RewatchRequest::handle_unwatch() called and the Watcher recognized as blocklisted.
<pre>
2023-09-23T07:35:02.190+0000 7f5568d66700 10 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 handle_unwatch r=-108
2023-09-23T07:35:02.190+0000 7f5568d66700 -1 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 handle_unwatch client blocklisted
2023-09-23T07:35:02.190+0000 7f5568d66700 10 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 finish r=-108
...
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 handle_unwatch r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 -1 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 handle_unwatch client blocklisted
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 finish r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::Watcher: 0x55a3ba806300 handle_rewatch: r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 -1 librbd::Watcher: 0x55a3ba806300 handle_rewatch: client blocklisted
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::object_map::UnlockRequest: 0x55a3bb4950b0 handle_unlock: r=-108
2023-09-23T07:35:02.193+0000 7f5568d66700 10 librbd::Watcher: 0x55a3bb52e300 handle_rewatch_callback: r=-108
2023-09-23T07:35:02.193+0000 7f5568d66700 10 librbd::ImageWatcher: 0x55a3bb52e300 handle_rewatch_complete: r=-108
</pre>
As already described in https://tracker.ceph.com/issues/61607,
"ImageWatcher::handle_rewatch_complete() ignores all errors, calls into the [ExclusiveLock] state machine and even attempts to arrange for the image header to be refreshed:
<pre>
template <typename I>
void ImageWatcher<I>::handle_rewatch_complete(int r) {
  CephContext *cct = m_image_ctx.cct;
  ldout(cct, 10) << this << " " << __func__ << ": r=" << r << dendl;

  {
    std::shared_lock owner_locker{m_image_ctx.owner_lock};
    if (m_image_ctx.exclusive_lock != nullptr) {
      // update the lock cookie with the new watch handle
      m_image_ctx.exclusive_lock->reacquire_lock(nullptr);
    }
  }

  // image might have been updated while we didn't have active watch
  handle_payload(HeaderUpdatePayload(), nullptr);
}
</pre>
The call into the state machine is recorded but gets dropped because STATE_WAITING_FOR_LOCK is a "transition" state:
<pre>
template <typename I>
void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
  [...]
  } else if (!is_state_shutdown() &&
             (m_state == STATE_LOCKED ||
              m_state == STATE_ACQUIRING ||
              m_state == STATE_POST_ACQUIRING ||
              m_state == STATE_WAITING_FOR_LOCK)) {
    // interlock the lock operation with other state ops
    ldout(m_cct, 10) << dendl;
    execute_action(ACTION_REACQUIRE_LOCK, on_reacquired);
    return;
  }
</pre>
<pre>
template <typename I>
void ManagedLock<I>::execute_action(Action action, Context *ctx) {
  ceph_assert(ceph_mutex_is_locked(m_lock));

  append_context(action, ctx);
  if (!is_transition_state()) {
    execute_next_action();
  }
}
</pre>"
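Assuming the semantics quoted above, a minimal model (hypothetical FakeManagedLock, not the real class) shows how the action is appended but never executed while the lock sits in a transition state such as STATE_WAITING_FOR_LOCK:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of ManagedLock's action queue: actions are always appended,
// but only executed when the current state is not a transition state.
struct FakeManagedLock {
  bool transition_state = false;  // e.g. STATE_WAITING_FOR_LOCK
  std::vector<std::string> pending;
  std::vector<std::string> executed;

  void execute_action(const std::string& action) {
    pending.push_back(action);  // append_context(action, ctx)
    if (!transition_state) {
      executed.push_back(pending.back());  // execute_next_action()
      pending.pop_back();
    }
    // In a transition state the action just sits in the queue; if nothing
    // ever completes the transition, it is effectively dropped.
  }
};
```

This is why the reacquire triggered by handle_rewatch_complete() is recorded but never runs: the lock never leaves STATE_WAITING_FOR_LOCK, so the queued action is never picked up and the snapshot create hangs.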