Project

General

Profile

Actions

Bug #56516

closed

[rbd-mirror] bogus "incomplete local non-primary snapshot" replayer error

Added by Ilya Dryomov almost 2 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
octopus,pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

rbd-mirror daemon is preparing to start syncing the first snapshot for the just demoted local image (the remote image was appropriately promoted):

2022-07-07T08:10:56.705+0000 7ff96723f700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_local_mirror_snapshots:
2022-07-07T08:10:56.705+0000 7ff96723f700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_local_mirror_snapshots: local mirror snapshot: id=32870, mirror_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=65c09ab0-3619-40a8-ac17-6f5634b107a8, clean_since_snap_id=head]
2022-07-07T08:10:56.705+0000 7ff96723f700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_local_mirror_snapshots: found local mirror snapshot: local_snap_id_start=32870, local_snap_id_end=18446744073709551614, local_snap_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=65c09ab0-3619-40a8-ac17-6f5634b107a8, clean_since_snap_id=head]
2022-07-07T08:10:56.705+0000 7ff96723f700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots:
2022-07-07T08:10:56.705+0000 7ff96723f700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots: remote mirror snapshot: id=23488, mirror_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=82b52991-23a4-488b-858f-2932d7f2fd56, clean_since_snap_id=head]
2022-07-07T08:10:56.705+0000 7ff96723f700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots: skipping remote snapshot 23488 while searching for demotion
2022-07-07T08:10:56.705+0000 7ff96723f700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots: remote mirror snapshot: id=34940, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=82b52991-23a4-488b-858f-2932d7f2fd56, primary_mirror_uuid=5cbfdfc4-e3b4-4cdb-9243-7e2ab67c770b, primary_snap_id=8066, last_copied_object_number=2, snap_seqs={32870=18446744073709551614}]
2022-07-07T08:10:56.705+0000 7ff96723f700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots: located matching demotion snapshot: remote_snap_id=34940, local_snap_id=32870
2022-07-07T08:10:56.705+0000 7ff96723f700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots: remote mirror snapshot: id=35120, mirror_ns=[mirror state=primary, complete=1, mirror_peer_uuids=82b52991-23a4-488b-858f-2932d7f2fd56, clean_since_snap_id=head]
2022-07-07T08:10:56.705+0000 7ff96723f700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 scan_remote_mirror_snapshots: found remote mirror snapshot: remote_snap_id_start=34940, remote_snap_id_end=35120, remote_snap_ns=[mirror state=primary, complete=1, mirror_peer_uuids=82b52991-23a4-488b-858f-2932d7f2fd56, clean_since_snap_id=head]
2022-07-07T08:10:56.705+0000 7ff96723f700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55cb9795b800 copy_snapshots: remote_snap_id_start=34940, remote_snap_id_end=35120, local_snap_id_start=32870

The corresponding non-primary snapshot is created:

2022-07-07T08:10:56.856+0000 7ff966a3e700 15 librbd::mirror::snapshot::CreateNonPrimaryRequest: 0x55cb97a1b520 create_snapshot: ns=[mirror state=non-primary, complete=0, mirror_peer_uuids=, primary_mirror_uuid=e85136b3-abdc-4b7f-a6bb-502778dbfefe, primary_snap_id=8930, last_copied_object_number=0, snap_seqs={35120=18446744073709551614}]
2022-07-07T08:10:56.856+0000 7ff966a3e700  5 librbd::Operations: 0x55cb98c061b0 snap_create: snap_name=.mirror.non_primary.5334c11b-2d9c-47af-9c5c-f56fc31f4407.db3dac68-ab56-4349-aec4-6a1130d9d644

However, before the sync could actually start, rbd-mirror daemon is killed. After being restarted, it is unable to pick up the pieces:

2022-07-07T08:11:45.111+0000 7f314e7d9700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x5557214d3000 scan_local_mirror_snapshots:
2022-07-07T08:11:45.111+0000 7f314e7d9700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x5557214d3000 scan_local_mirror_snapshots: local mirror snapshot: id=32870, mirror_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=65c09ab0-3619-40a8-ac17-6f5634b107a8, clean_since_snap_id=head]
2022-07-07T08:11:45.111+0000 7f314e7d9700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x5557214d3000 scan_local_mirror_snapshots: local mirror snapshot: id=32982, mirror_ns=[mirror state=non-primary, complete=0, mirror_peer_uuids=, primary_mirror_uuid=e85136b3-abdc-4b7f-a6bb-502778dbfefe, primary_snap_id=8930, last_copied_object_number=0, snap_seqs={35120=18446744073709551614}]
2022-07-07T08:11:45.111+0000 7f314e7d9700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x5557214d3000 scan_local_mirror_snapshots: incomplete local non-primary snapshot
2022-07-07T08:11:45.111+0000 7f314e7d9700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x5557214d3000 notify_status_updated:
2022-07-07T08:11:45.111+0000 7f3154fe6700 10 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] handle_replayer_notification:
2022-07-07T08:11:45.111+0000 7f3154fe6700 10 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] handle_replayer_notification: replay interrupted: r=-22, error=incomplete local non-primary snapshot
2022-07-07T08:11:45.111+0000 7f3154fe6700 10 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] on_stop_journal_replay:
2022-07-07T08:11:45.111+0000 7f3154fe6700 10 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] cancel_update_mirror_image_replay_status:
2022-07-07T08:11:45.111+0000 7f3154fe6700 10 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] set_state_description: r=-22, desc=incomplete local non-primary snapshot
2022-07-07T08:11:45.111+0000 7f3154fe6700 15 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] update_mirror_image_status: force=1, state=--
2022-07-07T08:11:45.111+0000 7f3154fe6700 10 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] shut_down: r=0
2022-07-07T08:11:45.111+0000 7f3154fe6700 15 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] shut_down: waiting for in-flight operations to complete
2022-07-07T08:11:45.111+0000 7f3154fe6700 15 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] set_mirror_image_status_update: force=1, state=--
2022-07-07T08:11:45.111+0000 7f3154fe6700 15 rbd::mirror::ImageReplayer: 0x5557205f8a00 [1/5334c11b-2d9c-47af-9c5c-f56fc31f4407] set_mirror_image_status_update: status={state=up+stopping_replay, description=incomplete local non-primary snapshot, last_update=0.000000]}

The completed_non_primary_snapshots_exist safety check in scan_local_mirror_snapshots() is too stringent: a possibilty of primary demotion snapshot acting as a base is not considered.


Related issues 4 (0 open4 closed)

Related to rbd - Bug #49238: [rbd-mirror] interrupted image creation might result in unlinked imagesResolvedJason Dillaman

Actions
Copied to rbd - Backport #56568: octopus: [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer errorResolvedIlya DryomovActions
Copied to rbd - Backport #56569: quincy: [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer errorResolvedIlya DryomovActions
Copied to rbd - Backport #56570: pacific: [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer errorResolvedIlya DryomovActions
Actions #1

Updated by Ilya Dryomov almost 2 years ago

  • Subject changed from [rbd-mirror] bogus "incomplete local non-primary snapshot" error to [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer error
Actions #2

Updated by Ilya Dryomov almost 2 years ago

  • Related to Bug #49238: [rbd-mirror] interrupted image creation might result in unlinked images added
Actions #3

Updated by Ilya Dryomov almost 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to octopus,pacific,quincy
  • Pull request ID set to 47035
Actions #4

Updated by Ilya Dryomov almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56568: octopus: [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer error added
Actions #6

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56569: quincy: [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer error added
Actions #7

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56570: pacific: [rbd-mirror] bogus "incomplete local non-primary snapshot" replayer error added
Actions #8

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #9

Updated by Backport Bot about 1 year ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Also available in: Atom PDF