Bug #61672

closed

rbd-mirror: non-primary images not deleted when the primary images are deleted

Added by Nithya Balachandran 11 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: pacific,quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Racing calls to InstanceReplayer->release_image() and ImageReplayer->handle_bootstrap() in the non-primary rbd-mirror daemon may prevent the non-primary image from being deleted when the primary image is deleted.

In the expected sequence, the InstanceReplayer determines that the remote image has been deleted and restarts the ImageReplayer. The restart calls bootstrap(), which determines that the peer image has been deleted.
ImageReplayer::handle_bootstrap() is then called with r=-ENOLINK, sets m_delete_requested to true and calls shut_down(). handle_shut_down() sees that m_delete_requested is true and schedules an image delete.

ImageReplayer::stop()
  -> on_stop_journal_replay()
  -> m_stop_requested = true; m_state = STATE_STOPPING;
  -> shut_down(0)
  -> handle_shut_down()
  -> stop complete.
ImageReplayer::start()   <- restarts
  -> bootstrap()
  -> handle_bootstrap(r=-67)   // #define ENOLINK 67 /* Link has been severed */
  -> m_delete_requested = true
  -> shut_down()
  -> handle_shut_down()
  -> if m_delete_requested == true
       schedules deletion

template <typename I>
void ImageReplayer<I>::handle_bootstrap(int r) {
  dout(10) << "r=" << r << dendl;
  {
    std::lock_guard locker{m_lock};
    m_bootstrap_request->put();
    m_bootstrap_request = nullptr;
  }

  if (on_start_interrupted()) {
    // The call returns here when the image is not deleted,
    // because m_stop_requested is true.
    return;
  } else if (r == -ENOMSG) {
    dout(5) << "local image is primary" << dendl;
    on_start_fail(0, "local image is primary");
    return;
  }
  ...
  } else if (r == -ENOLINK) {
    m_delete_requested = true;
    on_start_fail(0, "remote image no longer exists");
    // The call returns here when the image is deleted.
    return;
  }

In the case where the image is not deleted, handle_bootstrap() determines that the start has been interrupted and returns early, without reaching the -ENOLINK code path and without setting m_delete_requested to true. The image is thus neither moved to trash nor deleted.
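The effect of the branch order can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: ReplayerModel and deletion_scheduled() are hypothetical names, not Ceph code, and the model assumes that on_start_interrupted() reduces to checking a stop-requested flag, as the annotation in the excerpt above suggests.

```cpp
#include <cerrno>

// Simplified, hypothetical model of the two flags involved; not the Ceph class.
struct ReplayerModel {
  bool stop_requested = false;    // stands in for m_stop_requested
  bool delete_requested = false;  // stands in for m_delete_requested

  // Mirrors the branch order of the excerpt: the interruption check
  // runs before the return code is examined.
  void handle_bootstrap(int r) {
    if (stop_requested) {
      return;  // -ENOLINK is never inspected on this path
    }
    if (r == -ENOLINK) {
      delete_requested = true;
    }
  }
};

// Returns true if the model would go on to schedule a deletion,
// i.e. if the delete flag is set after handle_bootstrap(r).
bool deletion_scheduled(bool stop_requested, int r) {
  ReplayerModel m;
  m.stop_requested = stop_requested;
  m.handle_bootstrap(r);
  return m.delete_requested;
}
```

In this model, deletion_scheduled(false, -ENOLINK) yields true, while deletion_scheduled(true, -ENOLINK) yields false: if the stop flag is still set when the bootstrap result arrives, the -ENOLINK result is dropped and no deletion is scheduled.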

Not easily reproducible.


Related issues 3 (0 open, 3 closed)

Copied to rbd - Backport #62111: pacific: rbd-mirror: non-primary images not deleted when the primary images are deleted (Resolved, Nithya Balachandran)
Copied to rbd - Backport #62112: quincy: rbd-mirror: non-primary images not deleted when the primary images are deleted (Resolved, Nithya Balachandran)
Copied to rbd - Backport #62113: reef: rbd-mirror: non-primary images not deleted when the primary images are deleted (Resolved, Nithya Balachandran)