Bug #63009
Updated by Ramana Raja 8 months ago
Yet another hang in librbd occurs when exclusive lock acquisition races with blocklisting:
<pre>
2023-09-23T07:35:01.560+0000 7f5568565700 0 [rbd_support DEBUG root] CreateSnapshotRequests.create_snapshot for 2//110bc13f2b0a
...
2023-09-23T07:35:01.561+0000 7f5568d66700 15 librbd::mirror::snapshot::CreatePrimaryRequest: 0x55a3ba0e45b0 create_snapshot: name=.mirror.primary.6d36faee-3986-4977-8c10-d2f5f4af7805.0de9900f-144e-42b1-b803-e03a8750b134, ns=[mirror state=primary, complete=0, mirror_peer_uuids=6e5d5e76-e90c-4f2a-a6a0-9a5984e5deba, clean_since_snap_id=head]
2023-09-23T07:35:01.561+0000 7f5568d66700 5 librbd::Operations: 0x55a3bb8338c0 snap_create: snap_name=.mirror.primary.6d36faee-3986-4977-8c10-d2f5f4af7805.0de9900f-144e-42b1-b803-e03a8750b134
2023-09-23T07:35:01.561+0000 7f5568d66700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
2023-09-23T07:35:01.561+0000 7f5568d66700 20 librbd::Operations: send_acquire_exclusive_lock
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 try_acquire_lock:
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 send_acquire_lock:
...
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ExclusiveLock: 0x55a3baa34820 pre_acquire_lock_handler
...
2023-09-23T07:35:01.561+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_pre_acquire_lock: r=0
...
2023-09-23T07:35:01.591+0000 7f5568565700 10 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: r=-11
2023-09-23T07:35:01.591+0000 7f5568565700 5 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: unable to acquire exclusive lock
</pre>
After a few failed attempts, librbd tries once more to acquire the exclusive lock:
<pre>
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: r=-11
2023-09-23T07:35:02.006+0000 7f5568d66700 5 librbd::ManagedLock: 0x55a3baa34838 handle_acquire_lock: unable to acquire exclusive lock
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ExclusiveLock: 0x55a3baa34820 post_acquire_lock_handler: r=-11
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ImageState: 0x55a3bada0300 handle_prepare_lock_complete
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ImageWatcher: 0x55a3ba806300 notify request lock
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 send_notify
2023-09-23T07:35:02.006+0000 7f5568d66700 20 librbd::watcher::Notifier: 0x55a3ba806380 notify: pending=1
2023-09-23T07:35:02.006+0000 7f5568d66700 1 -- 172.21.15.140:0/1974081988 --> [v2:172.21.15.140:6816/155684008,v1:172.21.15.140:6817/155684008] -- osd_op(unknown.0.0:1268 2.3 2:cb0b96e0:::rbd_header.110bc13f2b0a:head [notify cookie 94161715406848 in=39b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e83) v8 -- 0x55a3bb206000 con 0x55a3ba3c2000
2023-09-23T07:35:02.006+0000 7f5568d66700 10 librbd::ManagedLock: 0x55a3baa34838 handle_post_acquire_lock: r=-125
</pre>
Blocklisting is detected.
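As a rough guide to the return codes in the log above, the following is a hypothetical reduction of the acquire path (the function name and strings are illustrative, not librbd's API): -11 is -EAGAIN (another client owns the lock, so librbd asks the owner to release it and retries), while -125 is -ECANCELED (the in-flight acquisition was canceled).

```cpp
#include <string>

// Hypothetical, simplified mapping of the return codes seen in the log;
// the names are illustrative, not librbd code.
std::string next_step_after_acquire(int r) {
  switch (r) {
  case 0:
    return "lock acquired";
  case -11:   // -EAGAIN: another client currently holds the lock
    return "notify current owner, then retry";
  case -125:  // -ECANCELED: the acquisition attempt was canceled
    return "acquisition canceled";
  default:
    return "fail request";
  }
}
```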
<pre>
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::ImageWatcher: 0x55a3ba806300 current lock owner: [0,0]
2023-09-23T07:35:02.185+0000 7f5568565700 -1 librbd::Watcher: 0x55a3ba806300 handle_error: handle=94161715361792: (107) Transport endpoint is not connected
...
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::Watcher: 0x55a3ba806300 rewatch:
2023-09-23T07:35:02.185+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 unwatch
2023-09-23T07:35:02.185+0000 7f5568565700 1 -- 172.21.15.140:0/1974081988 --> [v2:172.21.15.140:6816/155684008,v1:172.21.15.140:6817/155684008] -- osd_op(unknown.0.0:1301 2.3 2:cb0b96e0:::rbd_header.110bc13f2b0a:head [watch unwatch cookie 94161715361792] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e84) v8 -- 0x55a3bb3a8e00 con 0x55a3ba3c2000
</pre>
See RewatchRequest::unwatch(), where the watch handle is set to zero and aio_unwatch(), the asynchronous unwatch, is issued. The watcher is now considered unregistered but not yet blocklisted.
<pre>
void RewatchRequest::unwatch() {
  ceph_assert(ceph_mutex_is_wlocked(m_watch_lock));
  if (*m_watch_handle == 0) {
    rewatch();
    return;
  }

  CephContext *cct = reinterpret_cast<CephContext *>(m_ioctx.cct());
  ldout(cct, 10) << dendl;

  uint64_t watch_handle = 0;
  std::swap(*m_watch_handle, watch_handle);

  librados::AioCompletion *aio_comp = create_rados_callback<
    RewatchRequest, &RewatchRequest::handle_unwatch>(this);
  int r = m_ioctx.aio_unwatch(watch_handle, aio_comp);
  ceph_assert(r == 0);
  aio_comp->release();
}
</pre>
Meanwhile, the request to release the exclusive lock returns.
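The window this opens can be illustrated with a minimal standalone sketch (the WatcherSketch type and method names are hypothetical, not librbd code): the watch handle is zeroed synchronously, but the blocklist error is only observed in the asynchronous completion, so in between the watcher reports neither registered nor blocklisted.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the watcher state, illustrating the gap
// opened by RewatchRequest::unwatch().
struct WatcherSketch {
  uint64_t watch_handle = 0x55e0;  // nonzero: currently watching
  bool blocklisted = false;

  bool is_registered() const { return watch_handle != 0; }
  bool is_blocklisted() const { return blocklisted; }

  // Mirrors the std::swap() in unwatch(): the handle is zeroed
  // immediately, before the asynchronous unwatch completes.
  void start_async_unwatch() {
    uint64_t handle = 0;
    std::swap(watch_handle, handle);
    // aio_unwatch(handle, ...) would be issued here
  }

  // Mirrors handle_unwatch(): only here is the blocklist error observed.
  void handle_unwatch(int r) {
    if (r == -108) {  // -ESHUTDOWN: used by Ceph to signal "blocklisted"
      blocklisted = true;
    }
  }
};
```

Any code path that consults only these two predicates during that window will conclude, incorrectly, that nothing is wrong and nothing needs retrying.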
The callback for ImageWatcher::notify_request_lock(), ImageWatcher::handle_request_lock(), is executed:
<pre>
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 handle_notify: r=0
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::image_watcher::NotifyLockOwner: 0x55a3bada0380 handle_notify client responded with r=0
2023-09-23T07:35:02.185+0000 7f5568565700 15 librbd::ImageWatcher: 0x55a3ba806300 will retry in 10 seconds
2023-09-23T07:35:02.185+0000 7f5568565700 20 librbd::ManagedLock: 0x55a3baa34838 is_lock_owner: 0
</pre>
There is no attempt to reschedule the lock request, since the watcher is unregistered and not yet considered blocklisted:
<pre>
template <typename I>
void ImageWatcher<I>::schedule_request_lock(bool use_timer, int timer_delay) {
  [...]
  if (is_registered()) {
    ldout(m_image_ctx.cct, 15) << this << " requesting exclusive lock" << dendl;
    [...]
  } else if (is_blocklisted()) {
    lderr(m_image_ctx.cct) << this << " blocklisted waiting for exclusive lock"
                           << dendl;
    m_image_ctx.exclusive_lock->handle_peer_notification(0);
  }
}
</pre>
Only later is RewatchRequest::handle_unwatch() called and the watcher recognized as blocklisted.
<pre>
2023-09-23T07:35:02.190+0000 7f5568d66700 10 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 handle_unwatch r=-108
2023-09-23T07:35:02.190+0000 7f5568d66700 -1 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 handle_unwatch client blocklisted
2023-09-23T07:35:02.190+0000 7f5568d66700 10 librbd::watcher::RewatchRequest: 0x55a3ba2204b0 finish r=-108
...
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 handle_unwatch r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 -1 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 handle_unwatch client blocklisted
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::watcher::RewatchRequest: 0x55a3ba1d1680 finish r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::Watcher: 0x55a3ba806300 handle_rewatch: r=-108
2023-09-23T07:35:02.193+0000 7f5568565700 -1 librbd::Watcher: 0x55a3ba806300 handle_rewatch: client blocklisted
2023-09-23T07:35:02.193+0000 7f5568565700 10 librbd::object_map::UnlockRequest: 0x55a3bb4950b0 handle_unlock: r=-108
2023-09-23T07:35:02.193+0000 7f5568d66700 10 librbd::Watcher: 0x55a3bb52e300 handle_rewatch_callback: r=-108
2023-09-23T07:35:02.193+0000 7f5568d66700 10 librbd::ImageWatcher: 0x55a3bb52e300 handle_rewatch_complete: r=-108
</pre>
As already described in https://tracker.ceph.com/issues/61607, "ImageWatcher::handle_rewatch_complete() ignores all errors, calls into the [ExclusiveLock] state machine and even attempts to arrange for the image header to be refreshed:
<pre>
template <typename I>
void ImageWatcher<I>::handle_rewatch_complete(int r) {
  CephContext *cct = m_image_ctx.cct;
  ldout(cct, 10) << this << " " << __func__ << ": r=" << r << dendl;

  {
    std::shared_lock owner_locker{m_image_ctx.owner_lock};
    if (m_image_ctx.exclusive_lock != nullptr) {
      // update the lock cookie with the new watch handle
      m_image_ctx.exclusive_lock->reacquire_lock(nullptr);
    }
  }

  // image might have been updated while we didn't have active watch
  handle_payload(HeaderUpdatePayload(), nullptr);
}
</pre>
The call into the state machine is recorded but gets dropped because STATE_WAITING_FOR_LOCK is a "transition" state:
<pre>
template <typename I>
void ManagedLock<I>::reacquire_lock(Context *on_reacquired) {
  [...]
  } else if (!is_state_shutdown() &&
             (m_state == STATE_LOCKED ||
              m_state == STATE_ACQUIRING ||
              m_state == STATE_POST_ACQUIRING ||
              m_state == STATE_WAITING_FOR_LOCK)) {
    // interlock the lock operation with other state ops
    ldout(m_cct, 10) << dendl;
    execute_action(ACTION_REACQUIRE_LOCK, on_reacquired);
    return;
  }
</pre>
<pre>
template <typename I>
void ManagedLock<I>::execute_action(Action action, Context *ctx) {
  ceph_assert(ceph_mutex_is_locked(m_lock));

  append_context(action, ctx);
  if (!is_transition_state()) {
    execute_next_action();
  }
}
</pre>"
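The effect can be reproduced with a miniature sketch of this queue-and-kick pattern (the LockSketch type is hypothetical, illustrative only): while the current state is a transition state, execute_action() appends the action but never kicks the queue, so the queued ACTION_REACQUIRE_LOCK sits there indefinitely.

```cpp
#include <deque>

// Hypothetical miniature of an action queue guarded by a transition-state
// check, modeled on the ManagedLock code quoted above.
enum State { STATE_LOCKED, STATE_WAITING_FOR_LOCK };
enum Action { ACTION_REACQUIRE_LOCK };

struct LockSketch {
  State state = STATE_WAITING_FOR_LOCK;
  std::deque<Action> actions;
  int executed = 0;

  bool is_transition_state() const {
    return state == STATE_WAITING_FOR_LOCK;
  }

  void execute_next_action() {
    actions.pop_front();
    ++executed;
  }

  void execute_action(Action a) {
    actions.push_back(a);          // the action is recorded...
    if (!is_transition_state()) {  // ...but the queue is only kicked here
      execute_next_action();
    }
  }
};
```

In ManagedLock the queued action would normally run once the transition completes; in this hang, STATE_WAITING_FOR_LOCK never completes because the lock request itself was lost, so the queued reacquire never executes.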