Bug #55803

Updated by Ilya Dryomov almost 2 years ago

CreatePrimaryRequest::unlink_peer(), invoked either via the "rbd mirror image snapshot" command or via the rbd_support mgr module when it creates a new scheduled mirror snapshot while at rbd_mirroring_max_mirroring_snapshots capacity on the primary cluster, can race with Replayer::unlink_peer(), invoked by rbd-mirror when it finishes syncing an older snapshot on the secondary cluster. Consider the following:

 <pre> 
    [ primary: primary-snap1, primary-snap2, primary-snap3 
      secondary: non-primary-snap1 (complete), non-primary-snap2 (syncing) ] 

 0. rbd-mirror is syncing snap1..snap2 delta 
 1. rbd_support creates primary-snap4 
 2. due to rbd_mirroring_max_mirroring_snapshots == 3, rbd_support picks primary-snap3 for unlinking 
 3. rbd-mirror finishes syncing snap1..snap2 delta and marks non-primary-snap2 complete 

    [ snap1 (the old base) is no longer needed on either cluster ] 

 4. rbd-mirror unlinks and removes primary-snap1 
 5. rbd-mirror removes non-primary-snap1 
 6. rbd-mirror picks snap2 as the new base 
 7. rbd-mirror creates non-primary-snap3 and starts syncing snap2..snap3 delta 

    [ primary: primary-snap2, primary-snap3, primary-snap4 
      secondary: non-primary-snap2 (complete), non-primary-snap3 (syncing) ] 

 8. rbd_support unlinks and removes primary-snap3, which is now in use by rbd-mirror 
 </pre> 
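The interleaving above can be replayed with a toy model to make the window explicit. This is a minimal sketch, not rbd-mirror or rbd_support code: the `MirrorState` class and `replay_race` function are hypothetical, and the model tracks only which primary snapshots rbd-mirror depends on (the base of the delta and the snapshot being synced).

```python
from dataclasses import dataclass, field


@dataclass
class MirrorState:
    """Toy model of one mirrored image; names are hypothetical."""
    primary: list = field(default_factory=list)  # primary mirror snapshots, oldest first
    base: int = 0      # primary snapshot rbd-mirror syncs from
    syncing: int = 0   # primary snapshot rbd-mirror syncs to

    def in_use(self):
        # rbd-mirror depends on both ends of the delta it is copying.
        return {self.base, self.syncing}


def replay_race():
    # Step 0: snapshots 1..3 exist, rbd-mirror is syncing the snap1..snap2 delta.
    s = MirrorState(primary=[1, 2, 3], base=1, syncing=2)
    # Step 1: rbd_support creates primary-snap4.
    s.primary.append(4)
    # Step 2: at capacity, rbd_support picks primary-snap3 (third oldest).
    victim = s.primary[2]
    free_at_pick = victim not in s.in_use()
    # Steps 3-7 (primary-side view only): rbd-mirror finishes snap1..snap2,
    # removes primary-snap1, takes snap2 as the new base and starts on snap3.
    s.primary.remove(1)
    s.base, s.syncing = 2, 3
    # Step 8: rbd_support acts on the choice it made in step 2.
    in_use_at_removal = victim in s.in_use()
    s.primary.remove(victim)
    return victim, free_at_pick, in_use_at_removal, s.primary


victim, free_at_pick, in_use_at_removal, remaining = replay_race()
print(f"victim=primary-snap{victim} free_at_pick={free_at_pick} "
      f"in_use_at_removal={in_use_at_removal} remaining={remaining}")
# victim=primary-snap3 free_at_pick=True in_use_at_removal=True remaining=[2, 4]
```

The snapshot is not in use when rbd_support picks it in step 2, but by the time it is removed in step 8 rbd-mirror has advanced and depends on it.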

 If snap trimming on the primary cluster kicks in soon enough, the secondary image becomes corrupted: the primary cluster starts returning ENOENT for snap-trimmed objects, and rbd-mirror eventually finishes "syncing" non-primary-snap3 and marks it complete in spite of bogus data in the HEAD. Luckily, rbd-mirror's subsequent attempt to pick snap3 as the new base wedges the replayer with a "split-brain detected: failed to find matching non-primary snapshot in remote image" error:

 <pre> 
 2022-05-31T09:05:32.317-0400 7fb191c04700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_local_mirror_snapshots:                                                       
 2022-05-31T09:05:32.317-0400 7fb191c04700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_local_mirror_snapshots: local mirror snapshot: id=7, mirror_ns=[mirror state=non-primary, complete=1, mirror_peer_uuids=, primary_mirror_uuid=e1260ffa-0678-4a98-a264-4d7da43b071c, primary_snap_id=7, last_copied_object_number=5120, snap_seqs={7=18446744073709551614}]                                                                                                                                                                                                                                                       
 2022-05-31T09:05:32.317-0400 7fb191c04700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_local_mirror_snapshots: found local mirror snapshot: local_snap_id_start=7, local_snap_id_end=18446744073709551614, local_snap_ns=[mirror state=non-primary, complete=1, mirror_peer_uuids=, primary_mirror_uuid=e1260ffa-0678-4a98-a264-4d7da43b071c, primary_snap_id=7, last_copied_object_number=5120, snap_seqs={7=18446744073709551614}]                                                                                              
 2022-05-31T09:05:32.317-0400 7fb191c04700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_remote_mirror_snapshots: 
 2022-05-31T09:05:32.317-0400 7fb191c04700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_remote_mirror_snapshots: remote mirror snapshot: id=8, mirror_ns=[mirror state=primary, complete=1, mirror_peer_uuids=59ce4f20-5cf7-4c9e-a2d9-ad769c7c8a6d, clean_since_snap_id=head] 
 2022-05-31T09:05:32.317-0400 7fb191c04700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_remote_mirror_snapshots: remote mirror snapshot: id=9, mirror_ns=[mirror state=primary, complete=1, mirror_peer_uuids=59ce4f20-5cf7-4c9e-a2d9-ad769c7c8a6d, clean_since_snap_id=head] 
 2022-05-31T09:05:32.317-0400 7fb191c04700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_remote_mirror_snapshots: remote mirror snapshot: id=11, mirror_ns=[mirror state=primary, complete=1, mirror_peer_uuids=59ce4f20-5cf7-4c9e-a2d9-ad769c7c8a6d, clean_since_snap_id=head] 
 2022-05-31T09:05:32.317-0400 7fb191c04700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_remote_mirror_snapshots: failed to locate remote start snapshot: snap_id=7                                                               
 2022-05-31T09:05:32.317-0400 7fb191c04700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cade6f6000 scan_remote_mirror_snapshots: split-brain detected: failed to find matching non-primary snapshot in remote image: local_snap_id_start=7, local_snap_ns=[mirror state=non-primary, complete=1, mirror_peer_uuids=, primary_mirror_uuid=e1260ffa-0678-4a98-a264-4d7da43b071c, primary_snap_id=7, last_copied_object_number=5120, snap_seqs={7=18446744073709551614}]   
 </pre> 

 <pre> 
 (primary) $ rbd snap ls --all img 
 SNAPID    NAME                                                                                         SIZE      PROTECTED    TIMESTAMP                   NAMESPACE                                                          
      8    .mirror.primary.97262092-b5ab-4c3f-99c2-6f9fc740ffaf.6c78cc55-d26b-4ce5-8df9-f26fdf154f17    20 GiB               Tue May 31 09:05:20 2022    mirror (primary peer_uuids:[59ce4f20-5cf7-4c9e-a2d9-ad769c7c8a6d]) 
      9    .mirror.primary.97262092-b5ab-4c3f-99c2-6f9fc740ffaf.ad039ffc-4d8a-4587-a751-99c85e1fba5c    20 GiB               Tue May 31 09:05:23 2022    mirror (primary peer_uuids:[59ce4f20-5cf7-4c9e-a2d9-ad769c7c8a6d]) 
     11    .mirror.primary.97262092-b5ab-4c3f-99c2-6f9fc740ffaf.3b6aa60c-170b-4791-bf1b-f605f2d787fd    20 GiB               Tue May 31 09:05:31 2022    mirror (primary peer_uuids:[59ce4f20-5cf7-4c9e-a2d9-ad769c7c8a6d]) 
 (secondary) $ rbd snap ls --all img 
 SNAPID    NAME                                                                                             SIZE      PROTECTED    TIMESTAMP                   NAMESPACE                                                                        
      7    .mirror.non_primary.97262092-b5ab-4c3f-99c2-6f9fc740ffaf.7b706a77-1576-4de5-a3e7-6cf992ead19f    20 GiB               Tue May 31 09:04:53 2022    mirror (non-primary peer_uuids:[] e1260ffa-0678-4a98-a264-4d7da43b071c:7 copied) 
 </pre> 
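The log and the `rbd snap ls` output line up: the only local mirror snapshot (id 7) is complete and records primary_snap_id=7, while the primary image only has mirror snapshots 8, 9 and 11. A simplified model of the failing lookup follows. It is a sketch of the condition the log reports, not the actual logic of `scan_remote_mirror_snapshots`; the function name `find_remote_start` and the assumption that the newest complete local snapshot is used as the start are illustrative.

```python
def find_remote_start(local_snaps, remote_snap_ids):
    """Map the newest complete local mirror snapshot to its primary snapshot.

    Hypothetical model. local_snaps: dicts with 'id', 'complete' and
    'primary_snap_id', oldest first. remote_snap_ids: ids of the primary
    mirror snapshots that still exist. Raises RuntimeError if the recorded
    primary snapshot is gone.
    """
    complete = [s for s in local_snaps if s["complete"]]
    if not complete:
        return None  # nothing synced yet, so there is no start snapshot
    start = complete[-1]
    if start["primary_snap_id"] not in remote_snap_ids:
        # Message modeled on the replayer log line above.
        raise RuntimeError(
            "split-brain detected: failed to find matching non-primary snapshot "
            f"in remote image: local_snap_id_start={start['id']}"
        )
    return start["primary_snap_id"]


# Values taken from the log and `rbd snap ls` output above.
local = [{"id": 7, "complete": True, "primary_snap_id": 7}]
remote = [8, 9, 11]
try:
    find_remote_start(local, remote)
except RuntimeError as err:
    print(err)
```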

 Before commit https://github.com/ceph/ceph/commit/a888bff8d00e3e496ec80e4273e01a47b67da5dc, this could happen pretty much all the time because the second oldest snapshot was the one picked for unlinking. That commit changed it to the third oldest snapshot, which narrows the race but leaves it very much possible to hit.
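Why the choice of victim matters can be shown on the scenario from this report: with primary snapshots 1..3 at capacity and rbd-mirror syncing snap1..snap2, the second oldest snapshot is already in use when it is picked, whereas the third oldest only becomes in use once rbd-mirror advances its base. This is a sketch under those assumptions; `picked_snapshot_state` and its `victim_index` parameter are hypothetical and do not reproduce the selection code in rbd_support or CreatePrimaryRequest.

```python
def picked_snapshot_state(victim_index):
    """Classify the snapshot picked for unlinking in this report's scenario.

    Hypothetical model: primary snapshots 1..3, rbd-mirror syncing 1..2, and
    rbd-mirror later advancing to base 2 / syncing 3.
    victim_index is 0-based: 1 = second oldest, 2 = third oldest.
    """
    primary = [1, 2, 3]
    victim = primary[victim_index]
    in_use_now = {1, 2}            # base snap1, syncing snap2
    in_use_after_advance = {2, 3}  # base snap2, syncing snap3
    if victim in in_use_now:
        return "in use when picked"
    if victim in in_use_after_advance:
        return "in use if rbd-mirror advances first (race)"
    return "safe"


for label, idx in (("second oldest", 1), ("third oldest", 2)):
    print(f"{label}: {picked_snapshot_state(idx)}")
```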
