Project

General

Profile

Actions

Bug #59685

open

[RBD Mirror] `Mirror image status` reports "up+stopping_replay" state upon a force promote on secondary, after a disaster of primary

Added by Prasanna Kumar Kalever 12 months ago. Updated 12 months ago.

Status:
Fix Under Review
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After a disaster with primary, force promote a secondary image and check the mirror image state (after sometime) the status is reported as "up+stopping_replay".

Actual results:
Image status after force promote:
state":"up+stopping_replay"

Expected results:
Image status after force promote:
state":"up+stopped"

Reproducer: 100%

Actions #1

Updated by Prasanna Kumar Kalever 12 months ago

✨ sudo ../src/mstop.sh clustera                                                                  
✨ ps -aux | grep rbd-mirror                                                                                                                                                                                             
✨ sudo kill -9 22786  --> rbd-mirror of site-a
✨ sudo ./bin/rbd --cluster=site-b mirror image promote pool1/img --debug-rbd=0 --force
Image promoted to primary

✨ sudo ./bin/rbd --cluster=site-b mirror image status pool1/img --debug-rbd=0
img:
  global_id:   d85f530b-ac7a-465c-ae0a-90b6fde8abaf
  state:       up+stopping_replay
  description: local image linked to unknown peer
  service:     admin on localhost.localdomain
  last_update: 2023-05-09 15:22:17
  peer_sites:
    name: 0a287f34-04b6-418e-897e-c91be9e0a7e9
    state: down+starting_replay
    description: starting replay
    last_update: 2023-05-09 12:19:17
  snapshots:
    9 .mirror.primary.d85f530b-ac7a-465c-ae0a-90b6fde8abaf.4251976f-b9e9-46bc-91c8-bb9f2f9954e4 (peer_uuids:[2505e916-4a47-4a42-950e-1c9c6d888214])

Actions #2

Updated by Prasanna Kumar Kalever 12 months ago

Here is my Analysis:

template <typename I>                                                        
void ImageReplayer<I>::on_stop_journal_replay(int r, const std::string &desc)
{                                                                            
  dout(10) << dendl;                                                         

  {                                                                          
  ┆ std::lock_guard locker{m_lock};                                          
  ┆ if (m_state != STATE_REPLAYING) {                                        
  ┆ ┆ // might be invoked multiple times while stopping                      
  ┆ ┆ return;                                                                
  ┆ }                                                                        

  ┆ m_stop_requested = true;                                                 
  ┆ m_state = STATE_STOPPING;                                                
  }                                                                          

  cancel_update_mirror_image_replay_status();                                
  set_state_description(r, desc);                                            
  update_mirror_image_status(true, boost::none);                             
  shut_down(0);                                                              
}                                                                            

shut_down() is invoked above

template <typename I>                    
void ImageReplayer<I>::shut_down(int r) {
  dout(10) << "r=" << r << dendl;
  [...]
  // chain the shut down sequence (reverse order)     
  Context *ctx = new LambdaContext(                   ----> Doesn't get invoked                   
  ┆ [this, r](int _r) {                               
  ┆ ┆ update_mirror_image_status(true, STATE_STOPPED);
  ┆ ┆ handle_shut_down(r);                            
  ┆ });

  // destruct the state builder                 
  if (m_state_builder != nullptr) {             
  ┆ ctx = new LambdaContext([this, ctx](int r) {
  ┆ ┆ m_state_builder->close(ctx);                   ----> This will be eventually blocked and will not call ctx->complete(0)
  ┆ });                                         
  }                                             

  [...]
 }
 
The LambdaContext in the above shut_down() is not executed, so it won't be able to execute `update_mirror_image_status(true, STATE_STOPPED);` This is because,
 
 m_state_builder->close(ctx);    [src/tools/rbd_mirror/ImageReplayer.cc::shut_down()]
   --> void StateBuilder<I>::close(Context* on_finish) {} [src/tools/rbd_mirror/image_replayer/snapshot/StateBuilder.cc]
       --> void StateBuilder<I>::close_local_image(Context* on_finish) {} [../src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
           --> void StateBuilder<I>::close_remote_image(Context* on_finish) {} [../src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
               --> void CloseImageRequest<I>::send() {} [../src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
                   --> void CloseImageRequest<I>::close_image() {}  [../src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
             
template <typename I>                                                      
void CloseImageRequest<I>::close_image() {                                 
  dout(20) << dendl;                                                       

  Context *ctx = create_context_callback<                                  
  ┆ CloseImageRequest<I>, &CloseImageRequest<I>::handle_close_image>(this);
  (*m_image_ctx)->state->close(ctx);   ------> This is supposed to call  void ImageState<I>::close(Context *on_finish){}, but that is not happening currently
}                                                                          

As the above is async calls, the execution of the process doesn't block but, close_remote_image() doesn't continue, which means the "on_finish->complete(0)" doesn't run as a result (if you trace back) the LambdaContext code hunch in the shut_down() is not invoked and update_mirror_image_status(true, STATE_STOPPED); is not called.

The remote cluster is no more available because of the disaster, this is why remote image close is not continuing.

Actions #3

Updated by Ilya Dryomov 12 months ago

Hi Prasanna,

Where exactly in ImageState::close() does it hang?

Actions #4

Updated by Prasanna Kumar Kalever 12 months ago

(*m_image_ctx)->state->close(ctx); ------> This is supposed to call void ImageState<I>::close(Context *on_finish){}, but that is not happening currently

I had rerun the command `sudo ./bin/rbd --cluster=site-b mirror image promote pool1/img --force --debug-rbd=0` without `--debug-rbd=0`, it produced the desired logs where I can see the above `void ImageState<I>::close(Context *on_finish){}` is continuing (running) ...

Client 1:

2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb258f7c0 handle_close_imagePRASANNAXXX: CloseImageRequest<I>::handle_close_image(): r=0
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb258f7c0 handle_close_imagePRASANNAXXX: CloseImageRequest<I>::handle_close_image(): calling complete r=0
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 handle_close_local_image: r=0
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 close_remote_image: PRASANNAXXX: close_remote_image
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 close_remote_image: PRASANNAXXX: remote_image_ctx != nullptr
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb216fa70 sendPRASANNAXXX: In CloseImageRequest<I>::send()
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb216fa70 close_imagePRASANNAXXX: In CloseImageRequest<I>::close_image()
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb216fa70 close_imagePRASANNAXXX: In CloseImageRequest<I>::close_image() called (*m_image_ctx)->state->close(ctx)
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 close_remote_image: PRASANNAXXX: done requesting image_replayer::CloseImageRequest<I>::send for remote
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 handle_close_local_image: PRASANNAXXX: handle_close_local_image called complete, r=0
2023-05-09T19:53:32.934+0530 7f590b80d640 10 rbd::mirror::MirrorStatusUpdater 0x563cad3a91e0 handle_timer_task:
2023-05-09T19:53:32.934+0530 7f590b80d640 10 rbd::mirror::MirrorStatusUpdater 0x563cad3a91e0 schedule_timer_task:
2023-05-09T19:53:32.934+0530 7f590b80d640 10 rbd::mirror::MirrorStatusUpdater 0x563cad3a91e0 queue_update_task:

Client 2:

2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close()
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 taking Action
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 taking Action done
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 execute_action_unlock
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 0x55a64d6c6e80 append_context PRASANNAXXX: pushing to action_contexts->second.push_back(context)
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::ImageState: 0x55a64d6c6e80 0x55a64d6c6e80 send_close_unlock
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::ConfigWatcher: shut_down:
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_block_image_watcher
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::ImageWatcher: 0x7f37ec008c00 block_notifies
2023-05-09T19:53:31.389+0530 7f3826860580  5 librbd::Watcher: 0x7f37ec008c00 block_notifies: blocked_count=1
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_block_image_watcher: r=0
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_shut_down_update_watchers
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 shut_down_update_watchers
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c7100 ImageUpdateWatchers::shut_down
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c7100 ImageUpdateWatchers::shut_down: completing shut down
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 execute_action_unlock done
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() done
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_shut_down_update_watchers: r=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_flush
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37ec00a890 start_op
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6c8f90 send: dispatch_spec=0x7f37ec018a70
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::QueueImageDispatch: 0x55a64d6c90e0 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6c8f90 send: dispatch_spec=0x7f37ec018a70
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::QosImageDispatch: 0x55a64d6c9290 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6c8f90 send: dispatch_spec=0x7f37ec018a70
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::RefreshImageDispatch: 0x55a64d6c9230 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::WriteBlockImageDispatch: 0x55a64d46d1d0 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::ImageDispatch: 0x55a64d6c90c0 flush:
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::ImageRequest: 0x7f3812ff7590 send: aio_flush: ictx=0x55a64d677090, completion=0x7f37ec00a780
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AioCompletion: 0x7f37ec00a780 set_request_count: pending=1
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6ca580 send: dispatch_spec=0x7f37ec016670
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::cache::WriteAroundObjectDispatch: 0x7f37e8005720 flush:
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AioCompletion: 0x7f37ec00a780 complete_request: cb=1, pending=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AioCompletion: 0x7f37ec00a780 finalize: r=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush: r=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_unregister_image_watcher
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::ImageWatcher: 0x7f37ec008c00 unregistering image watcher
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::Watcher: 0x7f37ec008c00 unregister_watch:
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37ec00a890 finish_op
2023-05-09T19:53:31.394+0530 7f37fdffb640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_unregister_image_watcher: r=0
2023-05-09T19:53:31.394+0530 7f37fdffb640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_flush_readahead
2023-05-09T19:53:31.394+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush_readahead: r=0
2023-05-09T19:53:31.394+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_shut_down_image_dispatcher
2023-05-09T19:53:31.394+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37e0001f00 start_op
2023-05-09T19:53:31.394+0530 7f3812ffd640  5 librbd::io::Dispatcher: 0x55a64d6c8f90 shut_down:
2023-05-09T19:53:31.394+0530 7f3812ffd640 20 librbd::io::FlushTracker: 0x55a64d46b5a0 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 20 librbd::io::FlushTracker: 0x55a64d46ba10 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37e0001f00 finish_op
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_shut_down_image_dispatcher: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_shut_down_object_dispatcher
2023-05-09T19:53:31.395+0530 7f3812ffd640  5 librbd::io::Dispatcher: 0x55a64d6ca580 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640  5 librbd::io::ObjectDispatch: 0x55a64d6ca310 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640  5 librbd::cache::WriteAroundObjectDispatch: 0x7f37e8005720 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_shut_down_object_dispatcher: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_flush_op_work_queue
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush_op_work_queue: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush_image_watcher: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::ImageState: 0x55a64d6c6e80 0x55a64d6c6e80 handle_close: r=0
2023-05-09T19:53:31.395+0530 7f37fdffb640 10 librbd::ImageCtx: 0x55a64d677090 ~ImageCtx
2023-05-09T19:53:31.395+0530 7f37fdffb640 20 librbd::AsioEngine: 0x55a64d6c4e80 ~AsioEngine:
2023-05-09T19:53:31.395+0530 7f37fdffb640 20 librbd::asio::ContextWQ: 0x55a64d6c4ec0 ~ContextWQ:

The problem is "CloseImageRequest<I>::handle_close_image(int r)" is never invoked on `client 1` for remote image.

Actions #5

Updated by Ilya Dryomov 12 months ago

What do you mean by "client 1" and "client 2" and "remote image" here? Force promote is a local operation (to the cluster the image is being promoted on) -- it's supposed to operate only on the image that is being promoted.

Actions #6

Updated by Prasanna Kumar Kalever 12 months ago

As commented earlier:

template <typename I>                                                        
void ImageReplayer<I>::on_stop_journal_replay(int r, const std::string &desc)
{                                                                            
  dout(10) << dendl;                                                         

  {                                                                          
  ┆ std::lock_guard locker{m_lock};                                          
  ┆ if (m_state != STATE_REPLAYING) {                                        
  ┆ ┆ // might be invoked multiple times while stopping                      
  ┆ ┆ return;                                                                
  ┆ }                                                                        

  ┆ m_stop_requested = true;                                                 
  ┆ m_state = STATE_STOPPING;                                                
  }                                                                          

  cancel_update_mirror_image_replay_status();                                
  set_state_description(r, desc);                                            
  update_mirror_image_status(true, boost::none);                             
  shut_down(0);                                                              
}                                                            

shut_down() is invoked above

template <typename I>                    
void ImageReplayer<I>::shut_down(int r) {
  dout(10) << "r=" << r << dendl;
  [...]
  // chain the shut down sequence (reverse order)     
  Context *ctx = new LambdaContext(                   ----> Doesn't get invoked                   
  ┆ [this, r](int _r) {                               
  ┆ ┆ update_mirror_image_status(true, STATE_STOPPED);
  ┆ ┆ handle_shut_down(r);                            
  ┆ });

  // destruct the state builder                 
  if (m_state_builder != nullptr) {             
  ┆ ctx = new LambdaContext([this, ctx](int r) {
  ┆ ┆ m_state_builder->close(ctx);                   ----> This will be eventually blocked and will not call ctx->complete(0)
  ┆ });                                         
  }                                             

  [...]
 }
 

Here is the list of calls and actions that happen which leads to miss of calling `update_mirror_image_status(true, STATE_STOPPED);` above:

-> m_state_builder->close(ctx);    [src/tools/rbd_mirror/ImageReplayer.cc::shut_down()]
-> void StateBuilder<I>::close(Context* on_finish) [src/tools/rbd_mirror/image_replayer/snapshot/StateBuilder.cc]
 -> void StateBuilder<I>::close_local_image(Context* on_finish) [src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
   -> void CloseImageRequest<I>::close_image() [src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc ]
     -> void ImageState<I>::close(Context *on_finish) [src/librbd/ImageState.cc]
       -> void ImageState<I>::execute_action_unlock(const Action &action
                                                    Context *on_finish)  [src/librbd/ImageState.cc]
         -> void ImageState<I>::append_context(const Action &action, Context *context) [src/librbd/ImageState.cc]
           -> void ImageState<I>::execute_next_action_unlock() [src/librbd/ImageState.cc]
             -> void ImageState<I>::send_close_unlock() [src/librbd/ImageState.cc]
               -> void CloseRequest<I>::send() [src/librbd/image/CloseRequest.cc]
                 -> void CloseRequest<I>::send_block_image_watcher() [src/librbd/image/CloseRequest.cc]
                 -> void CloseRequest<I>::handle_block_image_watcher(int r) [src/librbd/image/CloseRequest.cc]
                 -> void CloseRequest<I>::send_shut_down_update_watchers()
                 -> void CloseRequest<I>::handle_shut_down_update_watchers(int r)
                 -> void CloseRequest<I>::send_flush()
                 -> void CloseRequest<I>::handle_flush(int r)
                 -> void CloseRequest<I>::send_shut_down_exclusive_lock()
                 -> void CloseRequest<I>::send_unregister_image_watcher()
                   -> void ImageWatcher<I>::unregister_watch(Context *on_finish) [src/librbd/ImageWatcher.cc]
                     -> void Watcher::unregister_watch(Context *on_finish) [src/librbd/Watcher.cc]
                 -> void CloseRequest<I>::handle_unregister_image_watcher(int r) [src/librbd/image/CloseRequest.cc]
                 -> void CloseRequest<I>::send_flush_readahead()
                 -> void CloseRequest<I>::handle_flush_readahead(int r)
                 -> void CloseRequest<I>::send_shut_down_image_dispatcher()
                 -> void CloseRequest<I>::handle_shut_down_image_dispatcher(int r)
                 -> void CloseRequest<I>::send_shut_down_object_dispatcher()
                 -> void CloseRequest<I>::handle_shut_down_object_dispatcher(int r)
                 -> void CloseRequest<I>::send_flush_op_work_queue()
                 -> void CloseRequest<I>::handle_flush_op_work_queue(int r)
                 -> void CloseRequest<I>::send_close_parent()
                 -> void CloseRequest<I>::handle_close_parent(int r)
                 -> void CloseRequest<I>::send_flush_image_watcher()
                 -> void CloseRequest<I>::handle_flush_image_watcher(int r)
                 -> void CloseRequest<I>::finish()
               ->void ImageState<I>::handle_close(int r) [src/librbd/ImageState.cc]
  -> void CloseImageRequest<I>::handle_close_image(int r) [src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
-> void StateBuilder<I>::handle_close_local_image(int r, Context* on_finish) [src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
  -> void StateBuilder<I>::close(Context* on_finish) [src/tools/rbd_mirror/image_replayer/snapshot/StateBuilder.cc]
    -> void StateBuilder<I>::close_remote_image(Context* on_finish) [src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
      -> void CloseImageRequest<I>::close_image() [src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
        -> void ImageState<I>::close(Context *on_finish) [src/librbd/ImageState.cc]
          -> void ImageState<I>::execute_action_unlock(const Action &action
                                                       Context *on_finish)  [src/librbd/ImageState.cc]
            -> void ImageState<I>::append_context(const Action &action, Context *context) [src/librbd/ImageState.cc]
              -> void ImageState<I>::execute_next_action_unlock() [src/librbd/ImageState.cc]
                -> void ImageState<I>::send_close_unlock() [src/librbd/ImageState.cc]
                  -> void CloseRequest<I>::send() [src/librbd/image/CloseRequest.cc]
                            -> void CloseRequest<I>::send_block_image_watcher() [src/librbd/image/CloseRequest.cc]
                            -> void CloseRequest<I>::handle_block_image_watcher(int r) [src/librbd/image/CloseRequest.cc]
                            -> void CloseRequest<I>::send_shut_down_update_watchers()
                            -> void CloseRequest<I>::handle_shut_down_update_watchers(int r)
                            -> void CloseRequest<I>::send_flush()
                            -> void CloseRequest<I>::handle_flush(int r)
                            -> void CloseRequest<I>::send_shut_down_exclusive_lock()
                            -> void CloseRequest<I>::send_unregister_image_watcher()
                              -> void ImageWatcher<I>::unregister_watch(Context *on_finish) [src/librbd/ImageWatcher.cc]
                              -> void Watcher::unregister_watch(Context *on_finish) [src/librbd/Watcher.cc] -> This doesn't return

For remote image the control just stuck here in `m_ioctx.aio_unwatch` and doesn't invoke the callback:

                              
void Watcher::unregister_watch(Context *on_finish) {
  ldout(m_cct, 10) << dendl;

  {
  ┆ std::unique_lock watch_locker{m_watch_lock};
  ┆ if (m_watch_state != WATCH_STATE_IDLE) {
  ┆ ┆ ldout(m_cct, 10) << "delaying unregister until register completed" 
  ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆<< dendl;

  ┆ ┆ ceph_assert(m_unregister_watch_ctx == nullptr);
  ┆ ┆ m_unregister_watch_ctx = new LambdaContext([this, on_finish](int r) {
  ┆ ┆ ┆ ┆ unregister_watch(on_finish);
  ┆ ┆ ┆ });
  ┆ ┆ return;
  ┆ } else if (is_registered(m_watch_lock)) {
  ┆ ┆ librados::AioCompletion *aio_comp = create_rados_callback(
  ┆ ┆ ┆ new C_UnwatchAndFlush(m_ioctx, on_finish));
      int r = m_ioctx.aio_unwatch(m_watch_handle, aio_comp);   
  ┆ ┆ ceph_assert(r == 0);
  ┆ ┆ aio_comp->release();

  ┆ ┆ m_watch_handle = 0;
  ┆ ┆ m_watch_blocklisted = false;
  ┆ ┆ return;
  ┆ }
  }

  on_finish->complete(0);
}

It looks like `m_ioctx.aio_unwatch()` in `void Watcher::unregister_watch()` is stuck and doesn't callback.

Actions #7

Updated by Prasanna Kumar Kalever 12 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 51540
Actions

Also available in: Atom PDF