Bug #54539
open
[rbd-mirror] cope with partially created images when bootstrapping
0%
Description
[...] decided to delete the mirror pod on c2 [...]
$ rbd mirror pool status -p storagecluster-cephblockpool
health: WARNING
daemon health: OK
image health: WARNING
images: 201 total
    4 unknown
    197 replaying
I think this was the root cause of the "unknown" images in the "rbd mirror pool status" output. I don't know what rbd-mirror was doing at the time you deleted the mirror pod, but when I poked at the clusters, there were 4 partially created images on the secondary cluster. They were present in the rbd_directory object (which maps image names to image IDs and vice versa), and two of them had their rbd_id objects but nothing else. In particular, their rbd_header objects were missing:
$ rbd info -p storagecluster-cephblockpool csi-vol-80c73fa7-a110-11ec-b7a0-0a580a890029
rbd: error opening image csi-vol-80c73fa7-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
$ rbd info -p storagecluster-cephblockpool csi-vol-893dd7e2-a110-11ec-b7a0-0a580a890029
2022-03-11T09:49:34.731+0000 7fcf237fe700 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
rbd: error opening image csi-vol-893dd7e2-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
$ rbd info -p storagecluster-cephblockpool csi-vol-8e7c8421-a110-11ec-b7a0-0a580a890029
rbd: error opening image csi-vol-8e7c8421-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
$ rbd info -p storagecluster-cephblockpool csi-vol-99b12fcb-a110-11ec-b7a0-0a580a890029
2022-03-11T09:50:11.958+0000 7efe39d73700 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
rbd: error opening image csi-vol-99b12fcb-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
The most obvious explanation is that rbd-mirror was in the process of creating these images when the mirror pod got axed. This led to split-brain: after being respawned, rbd-mirror just sat there trying to re-create the respective secondary images over these leftovers:
debug 2022-03-11T10:19:26.899+0000 7fe99c521700 -1 rbd::mirror::image_replayer::CreateImageRequest: 0x558362ca9860 handle_create_image: failed to create local image: (17) File exists
debug 2022-03-11T10:19:26.899+0000 7fe99c521700 -1 rbd::mirror::image_replayer::snapshot::CreateLocalImageRequest: 0x558369541140 handle_create_local_image: failed to create local image: (17) File exists
debug 2022-03-11T10:19:26.899+0000 7fe99c521700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x558359efd6c0 handle_create_local_image: failed to create local image: (17) File exists
debug 2022-03-11T10:19:26.899+0000 7fe9a252d700 -1 rbd::mirror::ImageReplayer: 0x5583603e0000 [1/aeeee911-8481-45e8-8359-5557d5294c29] operator(): start failed: (17) File exists
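To see which replayers are stuck like this, the "start failed" lines can be tallied per global image ID. What follows is a minimal Python sketch, not part of rbd-mirror. The regular expression is an assumption based only on the ImageReplayer line format in the excerpt above, and count_start_failures is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line format, taken from the single ImageReplayer line in the excerpt:
#   rbd::mirror::ImageReplayer: <ptr> [<number>/<global image id>] operator(): start failed: (<errno>) <text>
START_FAILED = re.compile(
    r"rbd::mirror::ImageReplayer: \S+ \[\d+/(?P<global_id>[0-9a-f-]+)\] "
    r"operator\(\): start failed: \((?P<errno>\d+)\)"
)


def count_start_failures(lines):
    """Count 'start failed' log lines per (global image ID, errno)."""
    counts = Counter()
    for line in lines:
        match = START_FAILED.search(line)
        if match:
            counts[(match["global_id"], int(match["errno"]))] += 1
    return counts
```

A global image ID that keeps accumulating errno 17 (File exists) failures would be a candidate for the leftover-image situation described here, though other causes of EEXIST are not ruled out by this check.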
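The partially created state described in this report (an rbd_directory entry without a matching rbd_id or rbd_header object) can be expressed as a simple check. This is an illustrative Python sketch that works on plain data rather than a live cluster. It assumes the usual object naming of rbd_id.<image name> and rbd_header.<image id>; how the directory mapping and the object listing are obtained is left out. find_partial_images is a hypothetical name, and the image IDs in the usage example are made up.

```python
from typing import Dict, Iterable, List, Tuple


def find_partial_images(
    directory: Dict[str, str], objects: Iterable[str]
) -> List[Tuple[str, str, List[str]]]:
    """Return (name, image_id, missing_objects) for every directory entry
    whose rbd_id.<name> or rbd_header.<image_id> object is absent."""
    present = set(objects)
    partial = []
    for name, image_id in sorted(directory.items()):
        expected = [f"rbd_id.{name}", f"rbd_header.{image_id}"]
        missing = [oid for oid in expected if oid not in present]
        if missing:
            partial.append((name, image_id, missing))
    return partial


# Usage with two image names from this report; the IDs are made up.
directory = {
    "csi-vol-80c73fa7-a110-11ec-b7a0-0a580a890029": "aaaa1111",
    "csi-vol-893dd7e2-a110-11ec-b7a0-0a580a890029": "bbbb2222",
}
objects = ["rbd_id.csi-vol-893dd7e2-a110-11ec-b7a0-0a580a890029"]
for name, image_id, missing in find_partial_images(directory, objects):
    print(name, "missing:", ", ".join(missing))
```

A check along these lines only identifies leftovers. Whether rbd-mirror should remove them and retry during bootstrap is the open question of this ticket.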