Project

General

Profile

Actions

Bug #18447

closed

Potential race when removing two-way mirroring image

Added by Mykola Golub over 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
kraken,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Let's we have cluster1 and cluster2 and run rbd-mirror daemons in both clusters, so cluster1 rbd-mirror replays images promoted in cluster2, and cluster2 rbd-mirror replays images promoted in cluster1.
1) Create a mirroring (promoted) image in cluster2
2) Wait for replay is started by cluster1 rbd-mirror
3) Remove the image in cluster1

Cluster1 rbd-mirror detects the remote image is removed and initiate removal of the local image. Before removing, mirroring is disabled for the image. When disabling mirroring for a demoted image, it is promoted first to avoid the journal left in an inconsistent ownership state (see #17588). At that moment (after the image is promoted but not removed yet) the cluster1 rbd-mirror has a chance to start replay for the promoted image.

The issue is some times reproduced when running qa/workunits/rbd/rbd_mirror_stress.sh: in such a case it fails in "delete images during bootstrap" test when creating an image and it fails with "already exists" error, because the previously deleted image with this name was recreated by the rbd-mirror:

+ rbd --cluster cluster2 -p mirror create --size 512M --image-feature layering,exclusive-lock,journaling image_6
2017-01-07 06:36:06.541592 7f8d9afce000 -1 librbd: rbd image image_6 already exists
rbd: create error: (17) File exists

Some log entries for cluster1 rbd-mirror image deleter triggered after the image was deleted in cluster2:

2017-01-07 06:29:04.166474 7f977d7fa700 20 rbd::mirror::ImageDeleter: 0x7f97aff96410 process_image_delete: processing deletion of snapshot image_6@snap
2017-01-07 06:29:05.820029 7f977d7fa700 10 rbd::mirror::ImageDeleter: 0x7f97aff96410 process_image_delete: snapshot image_6@snap was deleted
2017-01-07 06:29:06.030410 7f977d7fa700 20 librbd: remove 0x7f977d7f7360 101d628c895d
2017-01-07 06:29:06.170824 7f977d7fa700 10 librbd::mirror::DisableRequest: 0x7f977001f910 send_get_mirror_image
2017-01-07 06:29:06.188326 7f9761ffb700 10 librbd::mirror::DisableRequest: 0x7f977001f910 send_promote_image
2017-01-07 06:29:06.254878 7f96f1ffb700 10 librbd::mirror::DisableRequest: 0x7f977001f910 handle_promote_image: r=0
2017-01-07 06:29:07.011857 7f977d7fa700 10 rbd::mirror::ImageDeleter: 0x7f97aff96410 process_image_delete: Successfully deleted image: image_6

And at the same time cluster2 rbd-mirror bootstrap is started and succeeds:
2017-01-07 06:29:06.152189 7f47727fc700 20 rbd::mirror::Replayer: 0x7f4816a49a90 set_sources: starting image replayer for d906b6e8-0fb0-440b-aef4-4a24319cadb3
2017-01-07 06:29:06.152189 7f47727fc700 20 rbd::mirror::Replayer: 0x7f4816a49a90 start_image_replayer: global_image_id=d906b6e8-0fb0-440b-aef4-4a24319cadb3
2017-01-07 06:29:06.271231 7f4771ffb700 20 rbd::mirror::image_replayer::BootstrapRequest: 0x7f47f4200600 handle_open_remote_image: r=0
2017-01-07 06:29:06.273750 7f4771ffb700 20 rbd::mirror::image_replayer::BootstrapRequest: 0x7f47f4200600 open_local_image
2017-01-07 06:29:06.274616 7f47fa7fc700 20 rbd::mirror::image_replayer::BootstrapRequest: 0x7f47f4200600 handle_open_local_image: r=-2
2017-01-07 06:29:06.274617 7f47fa7fc700 10 rbd::mirror::image_replayer::BootstrapRequest: 0x7f47f4200600 handle_open_local_image: local image missing
2017-01-07 06:29:06.274618 7f47fa7fc700 20 rbd::mirror::image_replayer::BootstrapRequest: 0x7f47f4200600 create_local_image
2017-01-07 06:29:06.274639 7f47fa7fc700 20 librbd::image::CreateRequest: name=image_6, id=1031436c6125, size=134217728, features=69, order=22, stripe_unit=4194304, stripe_count=1, journal_order=24, journal_splay_width=4, journal_pool=, data_pool=

Note, the image promotion time 06:29:06.254878 is less than open_remote_image time 06:29:06.271231 (when the bootstrap checks if the image is promoted).


Related issues 2 (0 open2 closed)

Copied to rbd - Backport #18555: kraken: rbd: Potential race when removing two-way mirroring imageResolvedMykola GolubActions
Copied to rbd - Backport #18556: jewel: Potential race when removing two-way mirroring imageResolvedNathan CutlerActions
Actions #1

Updated by Mykola Golub over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Mykola Golub
  • Backport set to kraken,jewel
Actions #2

Updated by Mykola Golub over 7 years ago

  • Status changed from In Progress to Fix Under Review
Actions #3

Updated by Mykola Golub over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #4

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #18555: kraken: rbd: Potential race when removing two-way mirroring image added
Actions #5

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #18556: jewel: Potential race when removing two-way mirroring image added
Actions #6

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF