Bug #51031 (Closed)
rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion
Description
Hello,
I have seen an issue where some "ghost" images remain when I try to remove an RBD image from the cluster in a replication scenario.
When I delete an image from the main cluster, the image is deleted on both clusters, but I start to see the following logs from the rbd-mirror daemon on the remote cluster:
2021-05-31T17:26:57.106+0200 7f194a535700 0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: remote image no longer exists: scheduling deletion
2021-05-31T17:27:00.762+0200 7f194a535700 0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
2021-05-31T17:27:00.762+0200 7f194a535700 0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
2021-05-31T17:27:00.763+0200 7f193ed1e700 0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
2021-05-31T17:27:00.763+0200 7f1937d10700 0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
I can also see an "unknown" image (my test images are named testX, which is clearly not the case here) in the rbd mirror status reported by the daemon:
"image_replayers": [
{
"name": "test2/343206b9-5618-41f5-b394-c627b4d2d920",
"state": "Stopped"
}
],
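For context, this output comes from the rbd-mirror daemon admin socket; a sketch of how to query it, assuming a socket path of /var/run/ceph/ceph-client.rbd-mirror.a.asok (the actual path depends on the deployment):

# query the rbd-mirror daemon status over its admin socket
ceph --admin-daemon /var/run/ceph/ceph-client.rbd-mirror.a.asok rbd mirror status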
Some OMAP keys (status_global_* on rbd_mirroring) also remain on the main cluster, while the ones on the remote cluster are immediately cleaned up. After a minute or so the OMAP keys start to reappear on the remote cluster as well, with an error in them ("error bootstrapping replay"). If I remove the OMAP keys by hand and restart the rbd-mirror daemons, an OMAP key reappears on both clusters.
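As a sketch, the leftover keys can be inspected and removed by hand with the rados CLI; the pool name test2 is taken from the status output above, and the key suffix, assumed here to be the image's global id, may differ:

# list all OMAP keys on the mirroring metadata object
rados -p test2 listomapkeys rbd_mirroring

# dump the value of one leftover status key
rados -p test2 getomapval rbd_mirroring status_global_343206b9-5618-41f5-b394-c627b4d2d920

# remove it by hand (as noted above, the daemons may recreate it)
rados -p test2 rmomapkey rbd_mirroring status_global_343206b9-5618-41f5-b394-c627b4d2d920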
Steps to reproduce (a command sketch follows the list):
- Have two clusters and configure rbd mirroring between them
- Create a pool with mirroring enabled (with the image mode in my case, but it probably doesn't matter)
- Create an RBD image and enable mirroring with the journal or snapshot mode
- Confirm that the image is replicated on your other peer
- Delete the image on your first cluster
- Confirm the deletion on both sides with rbd ls
- Confirm that there is a ghost image by checking the rbd-mirror log, the OMAP values or the rbd-mirror daemon socket
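A minimal command sketch of these steps, assuming a pool named test2 (from the status output above), a hypothetical image name test1, and a peer bootstrap that is already configured:

# on the main cluster: enable per-image mirroring on the pool
rbd mirror pool enable test2 image

# create an image and enable journal-based mirroring on it
rbd create test2/test1 --size 1G
rbd feature enable test2/test1 journaling
rbd mirror image enable test2/test1 journal

# on the remote cluster: confirm the image is replicated
rbd ls test2
rbd mirror image status test2/test1

# on the main cluster: delete the image, then confirm on both sides
rbd rm test2/test1
rbd ls test2

# look for the ghost image in the mirroring metadata
rados -p test2 listomapkeys rbd_mirroring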
I attached a file describing the remote_status_global_* keys for the whole scenario presented here (with a different image from the one shown in the logs posted above).
Updated by Arthur Outhenin-Chalandre almost 3 years ago
Hello,
I have investigated this issue a bit lately, and from what I see, the MirroringWatcher never picks up the locally removed image, and as a result the image_map_ key is never removed. I fixed this by calling ImageRemoveRequest instead of invoking mirror_image_remove directly in the following PR: https://github.com/ceph/ceph/pull/41696.
It is still marked as WIP because this only solves the cleanup of the image_map_ OMAP keys; there are still some remote_status_global_ OMAP keys hanging around. I will check those next week.
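As a sketch, the two kinds of leftover keys can be checked separately with rados when verifying the fix; the pool name test2 is from this report, and the object holding the image_map_ keys is assumed to be rbd_mirror_leader:

# image_map_ keys (assumed to live on the rbd_mirror_leader object)
rados -p test2 listomapkeys rbd_mirror_leader | grep '^image_map_'

# remote_status_global_ keys on the rbd_mirroring object
rados -p test2 listomapkeys rbd_mirroring | grep '^remote_status_global_'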
Updated by Arthur Outhenin-Chalandre over 2 years ago
- Status changed from New to Fix Under Review
- Assignee changed from Deepika Upadhyay to Arthur Outhenin-Chalandre
- Backport set to pacific, octopus
- Pull request ID set to 41696
Updated by Mykola Golub over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #53031: octopus: rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion added
Updated by Backport Bot over 2 years ago
- Copied to Backport #53032: pacific: rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion added
Updated by Loïc Dachary about 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".