Bug #54098

closed

[rbd-mirror] hangs forever after split brain, while searching for demotion snapshot

Added by Deepika Upadhyay about 2 years ago. Updated almost 2 years ago.

Status: Duplicate
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is not readily reproducible in a Ceph dev environment, but the following operations lead to this condition:

Perform an IO workload for a substantial time (~1 hr) on the primary cluster c1.
While the workload is running, fail over to the secondary cluster c2; demote the image.

This leads to split-brain:

2021-11-19T13:45:06.549+0000 7f1eccd83700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd936b6000 scan_remote_mirror_snapshots: split-brain detected: failed to find matching non-primary snapshot in remote image: local_snap_id_start=9339, local_snap_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=c3249fb1-852f-4253-8ffa-cc4117d17a21, primary_mirror_uuid=, primary_snap_id=head, last_copied_object_number=0, snap_seqs={}]
2021-11-19T13:45:06.549+0000 7f1eccd83700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd936b6000 notify_status_updated:

This in turn leads to missing demotion snapshots that are never found, so the search never completes:

2021-11-19T15:19:06.108+0000 7f1eccd83700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd95eb6400 scan_remote_mirror_snapshots: remote mirror snapshot: id=7451, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=e0e53715-2be7-46e3-8001-d8048321babc, primary_mirror_uuid=ad4d4f8b-4b43-49e3-8d83-768c763e9a4b, primary_snap_id=1359, last_copied_object_number=12800, snap_seqs={4953=18446744073709551614}]
2021-11-19T15:19:06.108+0000 7f1eccd83700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd95eb6400 scan_remote_mirror_snapshots: skipping remote snapshot 7451 while searching for demotion

Complete logs: https://drive.google.com/file/d/1FfVT84daBKZgXJkwuWbhDUZQzEiaI8pU/view?usp=sharing
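
For triage, a small script can pull the two symptoms quoted above out of a large rbd-mirror log. The following is only a sketch: it assumes the message wording matches the excerpts in this report, and the function name `scan_mirror_log` and the sample list are hypothetical, not part of Ceph.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts in this report; other Ceph
# versions may word these messages differently (assumption).
SPLIT_BRAIN = "split-brain detected"
SKIP_RE = re.compile(r"skipping remote snapshot (\d+) while searching for demotion")
REPLAYER_RE = re.compile(r"Replayer: (0x[0-9a-f]+) ")


def scan_mirror_log(lines):
    """Collect replayers that reported split-brain and count skipped snapshots.

    Returns (split_brain, skipped) where split_brain is a set of replayer
    addresses and skipped maps (replayer address, snapshot id) to the
    number of times that snapshot was skipped.
    """
    split_brain = set()
    skipped = Counter()
    for line in lines:
        match = REPLAYER_RE.search(line)
        replayer = match.group(1) if match else None
        if SPLIT_BRAIN in line:
            split_brain.add(replayer)
        skip = SKIP_RE.search(line)
        if skip:
            skipped[(replayer, int(skip.group(1)))] += 1
    return split_brain, skipped


# Shortened copies of two lines quoted in this report.
sample = [
    "2021-11-19T13:45:06.549+0000 7f1eccd83700 -1 "
    "rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd936b6000 "
    "scan_remote_mirror_snapshots: split-brain detected: failed to find "
    "matching non-primary snapshot in remote image",
    "2021-11-19T15:19:06.108+0000 7f1eccd83700 15 "
    "rbd::mirror::image_replayer::snapshot::Replayer: 0x55dd95eb6400 "
    "scan_remote_mirror_snapshots: skipping remote snapshot 7451 while "
    "searching for demotion",
]

split_brain, skipped = scan_mirror_log(sample)
print("split-brain replayers:", sorted(split_brain))
print("skipped snapshots:", dict(skipped))
```

To run it against a real log, pass an open file object instead of `sample`. A skip count that keeps growing for the same snapshot id would be consistent with the hang described here.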


Related issues 1 (0 open, 1 closed)

Is duplicate of rbd - Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error (Resolved, Ilya Dryomov)
