Bug #54539

open

[rbd-mirror] cope with partially created images when bootstrapping

Added by Ilya Dryomov about 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[...] decided to delete the mirror pod on c2 [...]

$ rbd mirror pool status -p storagecluster-cephblockpool
health: WARNING
daemon health: OK
image health: WARNING
images: 201 total
    4 unknown
    197 replaying

I think this was the root cause of the "unknown" images in the "rbd mirror pool status" output. I don't know what rbd-mirror was doing at the time you deleted the mirror pod, but when I poked at the clusters, there were 4 partially created images on the secondary cluster. They were present in the rbd_directory object (which maps image names to image ids and vice versa), and two of them had their rbd_id objects but nothing else. In particular, their rbd_header objects were missing:

$ rbd info -p storagecluster-cephblockpool csi-vol-80c73fa7-a110-11ec-b7a0-0a580a890029
rbd: error opening image csi-vol-80c73fa7-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
$ rbd info -p storagecluster-cephblockpool csi-vol-893dd7e2-a110-11ec-b7a0-0a580a890029
2022-03-11T09:49:34.731+0000 7fcf237fe700 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
rbd: error opening image csi-vol-893dd7e2-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
$ rbd info -p storagecluster-cephblockpool csi-vol-8e7c8421-a110-11ec-b7a0-0a580a890029
rbd: error opening image csi-vol-8e7c8421-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
$ rbd info -p storagecluster-cephblockpool csi-vol-99b12fcb-a110-11ec-b7a0-0a580a890029
2022-03-11T09:50:11.958+0000 7efe39d73700 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
rbd: error opening image csi-vol-99b12fcb-a110-11ec-b7a0-0a580a890029: (2) No such file or directory
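
To make the object layout described above concrete, here is a minimal sketch that classifies images by which of their metadata objects exist. It assumes the usual format 2 naming (rbd_id.<image name> and rbd_header.<image id>) and takes the rbd_directory contents and the pool's object listing as plain Python inputs; the function name, the state names and the sample image names are hypothetical and not part of any Ceph tooling.

```python
from enum import Enum


class ImageState(Enum):
    COMPLETE = "rbd_id and rbd_header present"
    MISSING_HEADER = "rbd_id present, rbd_header missing"
    DIRECTORY_ONLY = "listed in rbd_directory only"
    INCONSISTENT = "rbd_header present, rbd_id missing"


def classify_images(directory, pool_objects):
    """Classify every image listed in rbd_directory.

    directory:    dict mapping image name -> image id (as in rbd_directory)
    pool_objects: set of RADOS object names that exist in the pool
    """
    states = {}
    for name, image_id in directory.items():
        # Assumed naming: rbd_id.<name> holds the id, rbd_header.<id> the header.
        has_id = f"rbd_id.{name}" in pool_objects
        has_header = f"rbd_header.{image_id}" in pool_objects
        if has_id and has_header:
            states[name] = ImageState.COMPLETE
        elif has_id:
            states[name] = ImageState.MISSING_HEADER
        elif has_header:
            states[name] = ImageState.INCONSISTENT
        else:
            states[name] = ImageState.DIRECTORY_ONLY
    return states
```

Anything other than COMPLETE corresponds to a partially created image of the kind seen here: the name is taken in rbd_directory, but the image cannot be opened.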

The most obvious explanation is that rbd-mirror was in the process of creating these images when the mirror pod got axed. This led to split-brain because, after being respawned, rbd-mirror just sat there trying to re-create the respective secondary images over these leftovers:

debug 2022-03-11T10:19:26.899+0000 7fe99c521700 -1 rbd::mirror::image_replayer::CreateImageRequest: 0x558362ca9860 handle_create_image: failed to create local image: (17) File exists
debug 2022-03-11T10:19:26.899+0000 7fe99c521700 -1 rbd::mirror::image_replayer::snapshot::CreateLocalImageRequest: 0x558369541140 handle_create_local_image: failed to create local image: (17) File exists
debug 2022-03-11T10:19:26.899+0000 7fe99c521700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x558359efd6c0 handle_create_local_image: failed to create local image: (17) File exists
debug 2022-03-11T10:19:26.899+0000 7fe9a252d700 -1 rbd::mirror::ImageReplayer: 0x5583603e0000 [1/aeeee911-8481-45e8-8359-5557d5294c29] operator(): start failed: (17) File exists
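
One way bootstrapping could cope with such leftovers is sketched below as a small decision function. This is only an illustration of a possible policy, not what rbd-mirror does or an agreed fix: the function name, its inputs and the returned step names are all hypothetical. It relies only on the two error codes seen in the output above, (17) File exists from creating the image and (2) No such file or directory from opening it.

```python
import errno


def next_bootstrap_step(create_result, open_result):
    """Pick the next step after trying to create the local image.

    create_result: 0 on success, or a negative errno from image creation
    open_result:   0 on success, or a negative errno from opening the
                   already existing image (only consulted on -EEXIST)
    """
    if create_result == 0:
        return "proceed"
    if create_result != -errno.EEXIST:
        # Unrelated failure: nothing specific to partially created images.
        return "fail"
    if open_result == -errno.ENOENT:
        # The name is taken but the image cannot be opened: treat it as a
        # leftover from an interrupted create, clean up and try again.
        return "cleanup_and_retry"
    if open_result == 0:
        # A fully formed image already owns the name: not a leftover.
        return "report_conflict"
    return "fail"
```

With the errors from this report, create fails with -17 and open fails with -2, so the sketch returns "cleanup_and_retry" instead of failing with "File exists" on every start attempt.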
