Bug #61672
rbd-mirror: non-primary images not deleted when the primary images are deleted
Status: closed
Description
Racing calls to InstanceReplayer::release_image() and ImageReplayer::handle_bootstrap() in the non-primary rbd-mirror daemon may prevent the non-primary image from being deleted when the primary image is deleted.
The InstanceReplayer determines that the remote image has been deleted and restarts the ImageReplayer. The restart calls bootstrap() which determines that the peer image has been deleted.
ImageReplayer::handle_bootstrap() is called with r=-ENOLINK, which sets m_delete_requested to true and calls shut_down(). handle_shut_down() then sees that m_delete_requested is true and schedules an image delete.
ImageReplayer::stop()
  -> on_stop_journal_replay()
  -> m_stop_requested = true; m_state = STATE_STOPPING;
  -> shut_down(0)
  -> handle_shut_down()
  -> stop complete.
ImageReplayer::start()   <- called by the restart
  -> bootstrap()
  -> handle_bootstrap(r=-67)   // #define ENOLINK 67 /* Link has been severed */
     -> m_delete_requested = true
     -> shut_down()
     -> handle_shut_down()
        -> if m_delete_requested == true
             schedules deletion
template <typename I>
void ImageReplayer<I>::handle_bootstrap(int r) {
  dout(10) << "r=" << r << dendl;
  {
    std::lock_guard locker{m_lock};
    m_bootstrap_request->put();
    m_bootstrap_request = nullptr;
  }
  if (on_start_interrupted()) {
    // <-- The call returns here when the image is not deleted,
    //     because m_stop_requested is true.
    return;
  } else if (r == -ENOMSG) {
    dout(5) << "local image is primary" << dendl;
    on_start_fail(0, "local image is primary");
    return;
  }
  ...
  } else if (r == -ENOLINK) {
    m_delete_requested = true;
    // <-- The call returns here when the image is deleted.
    on_start_fail(0, "remote image no longer exists");
    return;
  }
In the case where the image is not deleted, handle_bootstrap() determines that the start has been interrupted and returns without processing the -ENOLINK code path and without setting m_delete_requested to true. The image is thus neither moved to trash nor deleted.
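The ordering problem can be illustrated with a small stand-alone model. This is only a sketch: ToyReplayer and run() are hypothetical names invented for illustration, not Ceph code, and the model assumes that on_start_interrupted() simply reports whether a stop was requested. It leaves out locking, callbacks and the real state machine, and only shows that the early return hides the -ENOLINK result.

```cpp
#include <cerrno>

// Toy stand-in for ImageReplayer (hypothetical, NOT the Ceph class).
struct ToyReplayer {
  bool stop_requested = false;    // models m_stop_requested
  bool delete_requested = false;  // models m_delete_requested
  bool delete_scheduled = false;  // set when shut down would schedule a delete

  // Assumption: a pending stop request counts as an interrupted start.
  bool on_start_interrupted() const { return stop_requested; }

  // Models handle_shut_down(): deletion is scheduled only if it was requested.
  void shut_down() {
    if (delete_requested) {
      delete_scheduled = true;
    }
  }

  // Models the control flow of handle_bootstrap() shown above.
  void handle_bootstrap(int r) {
    if (on_start_interrupted()) {
      return;  // early return: the -ENOLINK result is never examined
    }
    if (r == -ENOLINK) {
      delete_requested = true;
      shut_down();
    }
  }
};

// Returns true if the local image would be scheduled for deletion.
// stop_pending models a stop request that is still set when the
// bootstrap result arrives.
bool run(bool stop_pending) {
  ToyReplayer replayer;
  replayer.stop_requested = stop_pending;
  replayer.handle_bootstrap(-ENOLINK);
  return replayer.delete_scheduled;
}
```

In this model run(false) returns true (the delete is scheduled), while run(true) returns false: the bootstrap result is dropped and nothing later requests the delete.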
Not easily reproducible.