Bug #59685
open[RBD Mirror] `Mirror image status` reports "up+stopping_replay" state upon a force promote on secondary, after a disaster of primary
Added by Prasanna Kumar Kalever about 1 year ago. Updated about 1 year ago.
0%
Description
After a disaster on the primary, force promote a secondary image and check the mirror image status after some time: the state is reported as "up+stopping_replay".
Actual results:
Image status after force promote:
"state":"up+stopping_replay"
Expected results:
Image status after force promote:
"state":"up+stopped"
Reproducer: 100%
Updated by Prasanna Kumar Kalever about 1 year ago
$ sudo ../src/mstop.sh clustera
$ ps -aux | grep rbd-mirror
$ sudo kill -9 22786    --> rbd-mirror of site-a
$ sudo ./bin/rbd --cluster=site-b mirror image promote pool1/img --debug-rbd=0 --force
Image promoted to primary
$ sudo ./bin/rbd --cluster=site-b mirror image status pool1/img --debug-rbd=0
img:
  global_id:   d85f530b-ac7a-465c-ae0a-90b6fde8abaf
  state:       up+stopping_replay
  description: local image linked to unknown peer
  service:     admin on localhost.localdomain
  last_update: 2023-05-09 15:22:17
  peer_sites:
    name: 0a287f34-04b6-418e-897e-c91be9e0a7e9
    state: down+starting_replay
    description: starting replay
    last_update: 2023-05-09 12:19:17
  snapshots:
    9 .mirror.primary.d85f530b-ac7a-465c-ae0a-90b6fde8abaf.4251976f-b9e9-46bc-91c8-bb9f2f9954e4 (peer_uuids:[2505e916-4a47-4a42-950e-1c9c6d888214])
Updated by Prasanna Kumar Kalever about 1 year ago
Here is my analysis:
template <typename I>
void ImageReplayer<I>::on_stop_journal_replay(int r, const std::string &desc)
{
  dout(10) << dendl;

  {
    std::lock_guard locker{m_lock};
    if (m_state != STATE_REPLAYING) {
      // might be invoked multiple times while stopping
      return;
    }

    m_stop_requested = true;
    m_state = STATE_STOPPING;
  }

  cancel_update_mirror_image_replay_status();
  set_state_description(r, desc);
  update_mirror_image_status(true, boost::none);
  shut_down(0);
}
shut_down() is invoked above
template <typename I>
void ImageReplayer<I>::shut_down(int r) {
  dout(10) << "r=" << r << dendl;
  [...]
  // chain the shut down sequence (reverse order)
  Context *ctx = new LambdaContext(        ----> Doesn't get invoked
    [this, r](int _r) {
      update_mirror_image_status(true, STATE_STOPPED);
      handle_shut_down(r);
    });

  // destruct the state builder
  if (m_state_builder != nullptr) {
    ctx = new LambdaContext([this, ctx](int r) {
      m_state_builder->close(ctx);        ----> This will be eventually blocked and will not call ctx->complete(0)
    });
  }
  [...]
}

The LambdaContext in the above shut_down() is not executed, so `update_mirror_image_status(true, STATE_STOPPED);` never runs. This is because of the following call chain:
m_state_builder->close(ctx); [src/tools/rbd_mirror/ImageReplayer.cc::shut_down()]
--> void StateBuilder<I>::close(Context* on_finish) {} [src/tools/rbd_mirror/image_replayer/snapshot/StateBuilder.cc]
--> void StateBuilder<I>::close_local_image(Context* on_finish) {} [../src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
--> void StateBuilder<I>::close_remote_image(Context* on_finish) {} [../src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
--> void CloseImageRequest<I>::send() {} [../src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
--> void CloseImageRequest<I>::close_image() {} [../src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
template <typename I>
void CloseImageRequest<I>::close_image() {
  dout(20) << dendl;

  Context *ctx = create_context_callback<
    CloseImageRequest<I>, &CloseImageRequest<I>::handle_close_image>(this);
  (*m_image_ctx)->state->close(ctx);   ------> This is supposed to call void ImageState<I>::close(Context *on_finish){}, but that is not happening currently
}
Because the calls above are asynchronous, the process itself doesn't block, but close_remote_image() never completes, which means "on_finish->complete(0)" doesn't run. As a result (if you trace back), the LambdaContext code hunk in shut_down() is not invoked and update_mirror_image_status(true, STATE_STOPPED); is not called.
The remote cluster is no longer available because of the disaster, which is why the remote image close does not complete.
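To make the failure mode concrete, here is a minimal, self-contained sketch of a chained shut-down sequence in which one asynchronous step never fires its completion. It is not Ceph code: `FakeReplayer`, `Callback`, `close_image()` and `state_after_shut_down()` are hypothetical names invented for illustration, and the "unreachable remote" is simply modelled by parking the completion forever.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Context: a one-shot completion callback.
using Callback = std::function<void(int)>;

// Simplified model of the replayer shut-down chain; all names are illustrative.
struct FakeReplayer {
  std::string state = "stopping_replay";
  bool remote_reachable = true;
  std::vector<Callback> parked;  // completions that are never delivered

  // Models an asynchronous image close. For an unreachable remote the
  // completion is parked and never fired; the call itself returns at once.
  void close_image(bool remote, Callback on_finish) {
    if (remote && !remote_reachable) {
      parked.push_back(std::move(on_finish));
      return;
    }
    on_finish(0);
  }

  // Chain built in reverse order: the final step only runs if every
  // earlier step completes its callback.
  void shut_down() {
    Callback final_step = [this](int) { state = "stopped"; };
    Callback close_remote = [this, final_step](int) {
      close_image(true, final_step);
    };
    close_image(false, close_remote);
  }
};

// Runs the modelled shut-down and returns the state that would be reported.
std::string state_after_shut_down(bool remote_reachable) {
  FakeReplayer replayer;
  replayer.remote_reachable = remote_reachable;
  replayer.shut_down();
  // A completion is parked exactly when the remote is unreachable.
  assert(replayer.parked.empty() == remote_reachable);
  return replayer.state;
}
```

In this model `state_after_shut_down(true)` yields "stopped", while `state_after_shut_down(false)` stays at "stopping_replay" even though no call ever blocks, which mirrors the symptom described above.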
Updated by Ilya Dryomov about 1 year ago
Hi Prasanna,
Where exactly in ImageState::close() does it hang?
Updated by Prasanna Kumar Kalever about 1 year ago
(*m_image_ctx)->state->close(ctx); ------> This is supposed to call void ImageState<I>::close(Context *on_finish){}, but that is not happening currently
I reran the command `sudo ./bin/rbd --cluster=site-b mirror image promote pool1/img --force --debug-rbd=0` without `--debug-rbd=0`, and it produced the desired logs, in which I can see that the above `void ImageState<I>::close(Context *on_finish){}` does continue (run) ...
Client 1:
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb258f7c0 handle_close_imagePRASANNAXXX: CloseImageRequest<I>::handle_close_image(): r=0
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb258f7c0 handle_close_imagePRASANNAXXX: CloseImageRequest<I>::handle_close_image(): calling complete r=0
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 handle_close_local_image: r=0
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 close_remote_image: PRASANNAXXX: close_remote_image
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 close_remote_image: PRASANNAXXX: remote_image_ctx != nullptr
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb216fa70 sendPRASANNAXXX: In CloseImageRequest<I>::send()
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb216fa70 close_imagePRASANNAXXX: In CloseImageRequest<I>::close_image()
2023-05-09T19:53:29.677+0530 7f58fb7ed640 20 rbd::mirror::image_replayer::CloseImageRequest: 0x563cb216fa70 close_imagePRASANNAXXX: In CloseImageRequest<I>::close_image() called (*m_image_ctx)->state->close(ctx)
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 close_remote_image: PRASANNAXXX: done requesting image_replayer::CloseImageRequest<I>::send for remote
2023-05-09T19:53:29.677+0530 7f58fb7ed640 10 rbd::mirror::image_replayer::StateBuilder: 0x563cb23f2000 handle_close_local_image: PRASANNAXXX: handle_close_local_image called complete, r=0
2023-05-09T19:53:32.934+0530 7f590b80d640 10 rbd::mirror::MirrorStatusUpdater 0x563cad3a91e0 handle_timer_task:
2023-05-09T19:53:32.934+0530 7f590b80d640 10 rbd::mirror::MirrorStatusUpdater 0x563cad3a91e0 schedule_timer_task:
2023-05-09T19:53:32.934+0530 7f590b80d640 10 rbd::mirror::MirrorStatusUpdater 0x563cad3a91e0 queue_update_task:
Client 2:
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close()
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 taking Action
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 taking Action done
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 execute_action_unlock
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 0x55a64d6c6e80 append_context PRASANNAXXX: pushing to action_contexts->second.push_back(context)
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::ImageState: 0x55a64d6c6e80 0x55a64d6c6e80 send_close_unlock
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::ConfigWatcher: shut_down:
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_block_image_watcher
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::ImageWatcher: 0x7f37ec008c00 block_notifies
2023-05-09T19:53:31.389+0530 7f3826860580 5 librbd::Watcher: 0x7f37ec008c00 block_notifies: blocked_count=1
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_block_image_watcher: r=0
2023-05-09T19:53:31.389+0530 7f3826860580 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_shut_down_update_watchers
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 shut_down_update_watchers
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c7100 ImageUpdateWatchers::shut_down
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c7100 ImageUpdateWatchers::shut_down: completing shut down
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() 2 execute_action_unlock done
2023-05-09T19:53:31.389+0530 7f3826860580 20 librbd::ImageState: 0x55a64d6c6e80 closePRASANNAXXX: ImageState<I>::close() done
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_shut_down_update_watchers: r=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_flush
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37ec00a890 start_op
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6c8f90 send: dispatch_spec=0x7f37ec018a70
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::QueueImageDispatch: 0x55a64d6c90e0 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6c8f90 send: dispatch_spec=0x7f37ec018a70
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::QosImageDispatch: 0x55a64d6c9290 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6c8f90 send: dispatch_spec=0x7f37ec018a70
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::RefreshImageDispatch: 0x55a64d6c9230 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::WriteBlockImageDispatch: 0x55a64d46d1d0 flush: tid=4
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::ImageDispatch: 0x55a64d6c90c0 flush:
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::ImageRequest: 0x7f3812ff7590 send: aio_flush: ictx=0x55a64d677090, completion=0x7f37ec00a780
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AioCompletion: 0x7f37ec00a780 set_request_count: pending=1
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::Dispatcher: 0x55a64d6ca580 send: dispatch_spec=0x7f37ec016670
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::cache::WriteAroundObjectDispatch: 0x7f37e8005720 flush:
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AioCompletion: 0x7f37ec00a780 complete_request: cb=1, pending=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AioCompletion: 0x7f37ec00a780 finalize: r=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush: r=0
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_unregister_image_watcher
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::ImageWatcher: 0x7f37ec008c00 unregistering image watcher
2023-05-09T19:53:31.389+0530 7f3812ffd640 10 librbd::Watcher: 0x7f37ec008c00 unregister_watch:
2023-05-09T19:53:31.389+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37ec00a890 finish_op
2023-05-09T19:53:31.394+0530 7f37fdffb640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_unregister_image_watcher: r=0
2023-05-09T19:53:31.394+0530 7f37fdffb640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_flush_readahead
2023-05-09T19:53:31.394+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush_readahead: r=0
2023-05-09T19:53:31.394+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_shut_down_image_dispatcher
2023-05-09T19:53:31.394+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37e0001f00 start_op
2023-05-09T19:53:31.394+0530 7f3812ffd640 5 librbd::io::Dispatcher: 0x55a64d6c8f90 shut_down:
2023-05-09T19:53:31.394+0530 7f3812ffd640 20 librbd::io::FlushTracker: 0x55a64d46b5a0 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 20 librbd::io::FlushTracker: 0x55a64d46ba10 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 20 librbd::io::AsyncOperation: 0x7f37e0001f00 finish_op
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_shut_down_image_dispatcher: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_shut_down_object_dispatcher
2023-05-09T19:53:31.395+0530 7f3812ffd640 5 librbd::io::Dispatcher: 0x55a64d6ca580 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 5 librbd::io::ObjectDispatch: 0x55a64d6ca310 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 5 librbd::cache::WriteAroundObjectDispatch: 0x7f37e8005720 shut_down:
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_shut_down_object_dispatcher: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 send_flush_op_work_queue
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush_op_work_queue: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::image::CloseRequest: 0x55a64d6d2dc0 handle_flush_image_watcher: r=0
2023-05-09T19:53:31.395+0530 7f3812ffd640 10 librbd::ImageState: 0x55a64d6c6e80 0x55a64d6c6e80 handle_close: r=0
2023-05-09T19:53:31.395+0530 7f37fdffb640 10 librbd::ImageCtx: 0x55a64d677090 ~ImageCtx
2023-05-09T19:53:31.395+0530 7f37fdffb640 20 librbd::AsioEngine: 0x55a64d6c4e80 ~AsioEngine:
2023-05-09T19:53:31.395+0530 7f37fdffb640 20 librbd::asio::ContextWQ: 0x55a64d6c4ec0 ~ContextWQ:
The problem is that "CloseImageRequest<I>::handle_close_image(int r)" is never invoked on `client 1` for the remote image.
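One way to spot this kind of missing completion in a large log is to pair each CloseImageRequest instance that logged `send` with a later `handle_close_image` for the same address. The helper below is a rough sketch of that idea, not part of Ceph: the function name `find_unfinished_close_requests` is made up, and the parsing assumes the line layout seen in the excerpt above (the `CloseImageRequest:` tag, then the object address, then the method name, possibly with debug text fused onto it).

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the addresses of CloseImageRequest instances that logged send()
// but never logged handle_close_image(). Hypothetical helper; it assumes
// each relevant line contains "CloseImageRequest: <address> <method>...".
std::vector<std::string> find_unfinished_close_requests(
    const std::vector<std::string>& lines) {
  const std::string tag = "CloseImageRequest:";
  std::set<std::string> sent;
  std::set<std::string> handled;

  for (const auto& line : lines) {
    const auto pos = line.find(tag);
    if (pos == std::string::npos) {
      continue;  // not a CloseImageRequest line
    }
    std::istringstream rest(line.substr(pos + tag.size()));
    std::string address;
    std::string method;
    if (!(rest >> address >> method)) {
      continue;  // malformed line
    }
    // Prefix match, because extra debug text may be fused onto the method.
    if (method.rfind("send", 0) == 0) {
      sent.insert(address);
    } else if (method.rfind("handle_close_image", 0) == 0) {
      handled.insert(address);
    }
  }

  std::vector<std::string> unfinished;
  for (const auto& address : sent) {
    if (handled.count(address) == 0) {
      unfinished.push_back(address);
    }
  }
  return unfinished;
}
```

Fed the `Client 1` lines above, this should report only the request at 0x563cb216fa70, the one issued for the remote image.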
Updated by Ilya Dryomov about 1 year ago
What do you mean by "client 1" and "client 2" and "remote image" here? Force promote is a local operation (to the cluster the image is being promoted on) -- it's supposed to operate only on the image that is being promoted.
Updated by Prasanna Kumar Kalever about 1 year ago
As commented earlier:
template <typename I>
void ImageReplayer<I>::on_stop_journal_replay(int r, const std::string &desc)
{
  dout(10) << dendl;

  {
    std::lock_guard locker{m_lock};
    if (m_state != STATE_REPLAYING) {
      // might be invoked multiple times while stopping
      return;
    }

    m_stop_requested = true;
    m_state = STATE_STOPPING;
  }

  cancel_update_mirror_image_replay_status();
  set_state_description(r, desc);
  update_mirror_image_status(true, boost::none);
  shut_down(0);
}
shut_down() is invoked above
template <typename I>
void ImageReplayer<I>::shut_down(int r) {
  dout(10) << "r=" << r << dendl;
  [...]
  // chain the shut down sequence (reverse order)
  Context *ctx = new LambdaContext(        ----> Doesn't get invoked
    [this, r](int _r) {
      update_mirror_image_status(true, STATE_STOPPED);
      handle_shut_down(r);
    });

  // destruct the state builder
  if (m_state_builder != nullptr) {
    ctx = new LambdaContext([this, ctx](int r) {
      m_state_builder->close(ctx);        ----> This will be eventually blocked and will not call ctx->complete(0)
    });
  }
  [...]
}
Here is the list of calls and actions that leads to `update_mirror_image_status(true, STATE_STOPPED);` above never being called:
-> m_state_builder->close(ctx); [src/tools/rbd_mirror/ImageReplayer.cc::shut_down()]
-> void StateBuilder<I>::close(Context* on_finish) [src/tools/rbd_mirror/image_replayer/snapshot/StateBuilder.cc]
-> void StateBuilder<I>::close_local_image(Context* on_finish) [src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
-> void CloseImageRequest<I>::close_image() [src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
-> void ImageState<I>::close(Context *on_finish) [src/librbd/ImageState.cc]
-> void ImageState<I>::execute_action_unlock(const Action &action, Context *on_finish) [src/librbd/ImageState.cc]
-> void ImageState<I>::append_context(const Action &action, Context *context) [src/librbd/ImageState.cc]
-> void ImageState<I>::execute_next_action_unlock() [src/librbd/ImageState.cc]
-> void ImageState<I>::send_close_unlock() [src/librbd/ImageState.cc]
-> void CloseRequest<I>::send() [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::send_block_image_watcher() [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::handle_block_image_watcher(int r) [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::send_shut_down_update_watchers()
-> void CloseRequest<I>::handle_shut_down_update_watchers(int r)
-> void CloseRequest<I>::send_flush()
-> void CloseRequest<I>::handle_flush(int r)
-> void CloseRequest<I>::send_shut_down_exclusive_lock()
-> void CloseRequest<I>::send_unregister_image_watcher()
-> void ImageWatcher<I>::unregister_watch(Context *on_finish) [src/librbd/ImageWatcher.cc]
-> void Watcher::unregister_watch(Context *on_finish) [src/librbd/Watcher.cc]
-> void CloseRequest<I>::handle_unregister_image_watcher(int r) [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::send_flush_readahead()
-> void CloseRequest<I>::handle_flush_readahead(int r)
-> void CloseRequest<I>::send_shut_down_image_dispatcher()
-> void CloseRequest<I>::handle_shut_down_image_dispatcher(int r)
-> void CloseRequest<I>::send_shut_down_object_dispatcher()
-> void CloseRequest<I>::handle_shut_down_object_dispatcher(int r)
-> void CloseRequest<I>::send_flush_op_work_queue()
-> void CloseRequest<I>::handle_flush_op_work_queue(int r)
-> void CloseRequest<I>::send_close_parent()
-> void CloseRequest<I>::handle_close_parent(int r)
-> void CloseRequest<I>::send_flush_image_watcher()
-> void CloseRequest<I>::handle_flush_image_watcher(int r)
-> void CloseRequest<I>::finish()
-> void ImageState<I>::handle_close(int r) [src/librbd/ImageState.cc]
-> void CloseImageRequest<I>::handle_close_image(int r) [src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
-> void StateBuilder<I>::handle_close_local_image(int r, Context* on_finish) [src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
-> void StateBuilder<I>::close(Context* on_finish) [src/tools/rbd_mirror/image_replayer/snapshot/StateBuilder.cc]
-> void StateBuilder<I>::close_remote_image(Context* on_finish) [src/tools/rbd_mirror/image_replayer/StateBuilder.cc]
-> void CloseImageRequest<I>::close_image() [src/tools/rbd_mirror/image_replayer/CloseImageRequest.cc]
-> void ImageState<I>::close(Context *on_finish) [src/librbd/ImageState.cc]
-> void ImageState<I>::execute_action_unlock(const Action &action, Context *on_finish) [src/librbd/ImageState.cc]
-> void ImageState<I>::append_context(const Action &action, Context *context) [src/librbd/ImageState.cc]
-> void ImageState<I>::execute_next_action_unlock() [src/librbd/ImageState.cc]
-> void ImageState<I>::send_close_unlock() [src/librbd/ImageState.cc]
-> void CloseRequest<I>::send() [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::send_block_image_watcher() [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::handle_block_image_watcher(int r) [src/librbd/image/CloseRequest.cc]
-> void CloseRequest<I>::send_shut_down_update_watchers()
-> void CloseRequest<I>::handle_shut_down_update_watchers(int r)
-> void CloseRequest<I>::send_flush()
-> void CloseRequest<I>::handle_flush(int r)
-> void CloseRequest<I>::send_shut_down_exclusive_lock()
-> void CloseRequest<I>::send_unregister_image_watcher()
-> void ImageWatcher<I>::unregister_watch(Context *on_finish) [src/librbd/ImageWatcher.cc]
-> void Watcher::unregister_watch(Context *on_finish) [src/librbd/Watcher.cc]
-> This doesn't return
For the remote image, control gets stuck here in `m_ioctx.aio_unwatch` and the callback is never invoked:
void Watcher::unregister_watch(Context *on_finish) {
  ldout(m_cct, 10) << dendl;

  {
    std::unique_lock watch_locker{m_watch_lock};
    if (m_watch_state != WATCH_STATE_IDLE) {
      ldout(m_cct, 10) << "delaying unregister until register completed"
                       << dendl;

      ceph_assert(m_unregister_watch_ctx == nullptr);
      m_unregister_watch_ctx = new LambdaContext([this, on_finish](int r) {
          unregister_watch(on_finish);
        });
      return;
    } else if (is_registered(m_watch_lock)) {
      librados::AioCompletion *aio_comp = create_rados_callback(
        new C_UnwatchAndFlush(m_ioctx, on_finish));
      int r = m_ioctx.aio_unwatch(m_watch_handle, aio_comp);
      ceph_assert(r == 0);
      aio_comp->release();

      m_watch_handle = 0;
      m_watch_blocklisted = false;
      return;
    }
  }

  on_finish->complete(0);
}
It looks like `m_ioctx.aio_unwatch()` in `void Watcher::unregister_watch()` gets stuck and never invokes its completion callback.
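The three paths through unregister_watch() can be modelled in a few lines to show that only the asynchronous unwatch path hands `on_finish` to something that may never call it. This is a toy model under stated assumptions, not the librbd implementation: `FakeWatcher`, `fake_aio_unwatch()` and `unregister_completes()` are hypothetical names, and an unreachable cluster is modelled as a completion that is stored and never fired, matching the behaviour reported above.

```cpp
#include <functional>
#include <utility>

using Callback = std::function<void(int)>;

enum class WatchState { Idle, Registering };

// Toy model of the watcher; every name here is illustrative.
struct FakeWatcher {
  WatchState watch_state = WatchState::Idle;
  bool registered = true;
  bool cluster_reachable = true;
  Callback deferred_unregister;  // plays the role of m_unregister_watch_ctx
  Callback in_flight;            // completion owned by the pending unwatch

  // Assumption: with an unreachable cluster the unwatch never completes.
  void fake_aio_unwatch(Callback on_finish) {
    if (cluster_reachable) {
      on_finish(0);
    } else {
      in_flight = std::move(on_finish);
    }
  }

  void unregister_watch(Callback on_finish) {
    if (watch_state != WatchState::Idle) {
      // Path 1: a register is still in progress, so retry later.
      deferred_unregister = [this, on_finish](int) {
        unregister_watch(on_finish);
      };
      return;
    }
    if (registered) {
      // Path 2: completion now depends entirely on the async unwatch.
      registered = false;
      fake_aio_unwatch(std::move(on_finish));
      return;
    }
    // Path 3: nothing to unwatch, complete immediately.
    on_finish(0);
  }
};

// Reports whether on_finish fired for the given starting conditions.
bool unregister_completes(bool registered, bool cluster_reachable) {
  bool completed = false;
  FakeWatcher watcher;
  watcher.registered = registered;
  watcher.cluster_reachable = cluster_reachable;
  watcher.unregister_watch([&completed](int) { completed = true; });
  return completed;
}
```

In this model the only combination for which `unregister_completes()` returns false with an idle watcher is a registered watch on an unreachable cluster, which is the situation the remote image is in after the disaster.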
Updated by Prasanna Kumar Kalever about 1 year ago
- Status changed from New to Fix Under Review
- Pull request ID set to 51540