Bug #51031

closed

rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion

Added by Arthur Outhenin-Chalandre almost 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

I have seen "ghost" images appear after removing an RBD image from a cluster in a replication scenario.
When I delete an image from the main cluster, the image is deleted in both clusters, but I then start to see the following logs from the rbd-mirror daemon in the remote cluster:

2021-05-31T17:26:57.106+0200 7f194a535700  0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: remote image no longer exists: scheduling deletion
2021-05-31T17:27:00.762+0200 7f194a535700  0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
2021-05-31T17:27:00.762+0200 7f194a535700  0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
2021-05-31T17:27:00.763+0200 7f193ed1e700  0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists
2021-05-31T17:27:00.763+0200 7f1937d10700  0 rbd::mirror::ImageReplayer: 0x55b3e1a3ab60 [13/343206b9-5618-41f5-b394-c627b4d2d920] handle_shut_down: mirror image no longer exists

I can also see an "unknown" image in the rbd-mirror daemon status (my test images are named testX, which is clearly not the case here):

            "image_replayers": [
                {
                    "name": "test2/343206b9-5618-41f5-b394-c627b4d2d920",
                    "state": "Stopped" 
                }
            ],
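To spot such entries without reading the status output by eye, a small filter over the daemon status JSON can help. The sketch below is a hypothetical helper (find_ghost_replayers is not part of Ceph). It assumes the status output contains one or more "image_replayers" lists like the one above, possibly nested inside other objects, and flags entries whose name ends in a bare global image ID (a UUID) instead of one of your known image names.

```python
from __future__ import annotations

import json
import re

# Global image IDs in the logs above are RFC 4122-style UUIDs.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def _iter_replayers(node):
    """Yield every entry of every "image_replayers" list, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image_replayers" and isinstance(value, list):
                yield from value
            else:
                yield from _iter_replayers(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_replayers(item)


def find_ghost_replayers(status_json: str, known_images: set[str]) -> list[dict]:
    """Return replayer entries named "<prefix>/<uuid>" instead of a known image."""
    ghosts = []
    for replayer in _iter_replayers(json.loads(status_json)):
        # rpartition keeps everything after the last "/", e.g. the UUID.
        _, _, suffix = replayer.get("name", "").rpartition("/")
        if suffix not in known_images and UUID_RE.match(suffix):
            ghosts.append(replayer)
    return ghosts


if __name__ == "__main__":
    # The entry reported above, wrapped in a minimal JSON object.
    sample = json.dumps({"image_replayers": [
        {"name": "test2/343206b9-5618-41f5-b394-c627b4d2d920",
         "state": "Stopped"}]})
    print(find_ghost_replayers(sample, {"test1", "test2"}))
```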

Then some OMAP keys (status_global_* on the rbd_mirroring object) remain on the main cluster, while the ones on the remote cluster are cleaned up immediately. After a minute or so, the OMAP keys start to reappear in the remote cluster as well, with an error in them ("error bootstrapping replay"). If I remove the OMAP keys by hand and restart the rbd-mirror daemons, an OMAP key reappears on both clusters.
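The leftover keys can be listed per pool with `rados -p <pool> listomapkeys rbd_mirroring` on each cluster. The sketch below groups keys carrying the status_global_, remote_status_global_ and image_map_ prefixes mentioned in this report by global image ID, and reports those whose ID no longer belongs to a live mirrored image. It assumes the global image ID (a 36-character UUID) directly follows the prefix, and that you collect the live IDs separately (for example from `rbd mirror image status`); leftover_keys is a hypothetical helper, not a Ceph tool.

```python
from __future__ import annotations

import re

# Prefixes named in this report. Assumption: the global image ID follows
# the prefix directly; anything after the 36-character UUID is ignored.
PREFIXES = ("remote_status_global_", "status_global_", "image_map_")
UUID_LEN = 36
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def leftover_keys(omap_keys: list[str],
                  live_global_ids: set[str]) -> dict[str, list[str]]:
    """Group per-image OMAP keys whose global image ID is no longer live."""
    leftovers: dict[str, list[str]] = {}
    for key in omap_keys:
        for prefix in PREFIXES:
            if key.startswith(prefix):
                global_id = key[len(prefix):len(prefix) + UUID_LEN]
                if UUID_RE.match(global_id) and global_id not in live_global_ids:
                    leftovers.setdefault(global_id, []).append(key)
                break  # a key carries at most one of these prefixes
    return leftovers
```

Running it against the key lists of both clusters after step 6 below should make the asymmetry described above (keys gone on one side, still present on the other) easy to see.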

Steps to reproduce:
  1. Set up two clusters and configure RBD mirroring between them
  2. Create a pool with mirroring enabled (in image mode in my case, but it probably doesn't matter)
  3. Create an RBD image and enable mirroring in journal or snapshot mode
  4. Confirm that the image is replicated to the other peer
  5. Delete the image on the first cluster
  6. Confirm the deletion on both sides with rbd ls
  7. Confirm that there is a ghost image by checking the rbd-mirror log, the OMAP values or the rbd-mirror daemon socket
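Steps 3 to 7 translate roughly into the CLI calls generated below. This is a sketch under stated assumptions: steps 1 and 2 are already done, both clusters are reachable through `--cluster <name>`, and the cluster, pool and image names are placeholders. It only prints the commands, because step 4 needs a pause for replication (in snapshot mode, until a mirror snapshot has been synced) before the image is deleted; repro_commands is a hypothetical helper.

```python
from __future__ import annotations

import shlex


def repro_commands(primary: str, secondary: str, pool: str, image: str,
                   mode: str = "snapshot") -> list[list[str]]:
    """Return argv lists for steps 3-7; steps 1-2 are assumed done."""
    if mode not in ("journal", "snapshot"):
        raise ValueError("mode must be 'journal' or 'snapshot'")
    spec = f"{pool}/{image}"

    def rbd(cluster: str, *args: str) -> list[str]:
        return ["rbd", "--cluster", cluster, *args]

    cmds = [
        rbd(primary, "create", "--size", "1G", spec),           # step 3
        rbd(primary, "mirror", "image", "enable", spec, mode),  # step 3
        rbd(secondary, "ls", pool),                             # step 4, after replication
        rbd(primary, "rm", spec),                               # step 5
    ]
    for cluster in (primary, secondary):
        cmds.append(rbd(cluster, "ls", pool))                   # step 6
        cmds.append(["rados", "--cluster", cluster, "-p", pool,
                     "listomapkeys", "rbd_mirroring"])          # step 7 (OMAP check)
    return cmds


if __name__ == "__main__":
    # Hypothetical cluster, pool and image names.
    for argv in repro_commands("site-a", "site-b", "mirror-pool", "test1"):
        print(shlex.join(argv))
```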

I attached a file listing the remote_status_global_* keys at each step of the scenario above (using a different image from the one in the logs posted above).


Files

omap_mirroring_bug.md (9.11 KB), Arthur Outhenin-Chalandre, 06/01/2021 08:08 AM

Related issues: 2 (0 open, 2 closed)

Copied to rbd - Backport #53031: octopus: rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion (Resolved, Arthur Outhenin-Chalandre)
Copied to rbd - Backport #53032: pacific: rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion (Resolved, Arthur Outhenin-Chalandre)
#1

Updated by Deepika Upadhyay almost 3 years ago

  • Assignee set to Deepika Upadhyay
#2

Updated by Arthur Outhenin-Chalandre almost 3 years ago

Hello,

I have investigated this issue a bit lately, and from what I see, the MirroringWatcher never picks up the locally removed image, so the image_map_ key is never removed. I fixed this by calling ImageRemoveRequest instead of invoking mirror_image_remove directly in the following PR: https://github.com/ceph/ceph/pull/41696.
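Our reading of the fix is that the difference lies in a notification: ImageRemoveRequest (in librbd) removes the mirror image record and then notifies the MirroringWatcher, while a bare mirror_image_remove call only deletes the record. The toy model below illustrates why that would leave the image_map_ entry behind. It is plain Python, none of it is Ceph code, and every class and function name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Toy stand-in for one pool's mirroring metadata (not Ceph code)."""
    mirror_images: dict = field(default_factory=dict)  # image_id -> global_id
    image_map: set = field(default_factory=set)        # IDs still mapped by rbd-mirror
    watchers: list = field(default_factory=list)       # notification callbacks

    def notify_image_updated(self, global_id: str) -> None:
        for callback in self.watchers:
            callback(global_id)


def attach_mirroring_watcher(pool: ToyPool) -> None:
    """On notification, drop the image_map entry of an image that is gone."""
    def on_image_updated(global_id: str) -> None:
        if global_id not in pool.mirror_images.values():
            pool.image_map.discard(global_id)
    pool.watchers.append(on_image_updated)


def remove_direct(pool: ToyPool, image_id: str) -> None:
    """Mimics a bare mirror_image_remove: delete the record, notify no one."""
    pool.mirror_images.pop(image_id)


def remove_with_notify(pool: ToyPool, image_id: str) -> None:
    """Mimics the idea of the fix: delete the record, then notify watchers."""
    global_id = pool.mirror_images.pop(image_id)
    pool.notify_image_updated(global_id)


def image_map_after(remove) -> set:
    pool = ToyPool(mirror_images={"img1": "gid-1"}, image_map={"gid-1"})
    attach_mirroring_watcher(pool)
    remove(pool, "img1")
    return pool.image_map


if __name__ == "__main__":
    print(image_map_after(remove_direct))       # {'gid-1'}: stale entry stays
    print(image_map_after(remove_with_notify))  # set(): entry cleaned up
```

The model only covers the image_map_ keys; as noted in this comment, the remote_status_global_ keys are a separate problem.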

It is still marked as WIP because it only fixes the cleanup of the image_map_ OMAP keys; some remote_status_global_ OMAP keys are still left behind. I will look into those next week.

#3

Updated by Arthur Outhenin-Chalandre over 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Deepika Upadhyay to Arthur Outhenin-Chalandre
  • Backport set to pacific, octopus
  • Pull request ID set to 41696
#4

Updated by Mykola Golub over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53031: octopus: rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion added
#6

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53032: pacific: rbd-mirror: metadata of mirrored image are not properly cleaned up after image deletion added
#7

Updated by Loïc Dachary about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".