Bug #55852
[rbd-mirror] remote got demoted in non-primary without actually performing it
Status:
closed
% Done:
0%
Regression:
No
Severity:
3 - minor
Description
- Images are kept in up+unknown:

    2022-06-03T11:31:20.003+0000 7f3bb6018700 15 rbd::mirror::ImageReplayer: 0x55b43dafe300 [2/421faf66-0be0-41e4-88a6-697d395a76e6] set_mirror_image_status_update: status={state=up+unknown, description=remote image demoted, last_update=0.000000]}

Some images report demotion on both sides, although we only performed a demotion on cepheast:

    22
    test22:
      global_id:   a44cc01b-7ef9-4ab8-8584-9fddb6e41ce3
      state:       up+unknown
      description: remote image demoted
      service:     admin on vossi03
      last_update: 2022-06-03 15:19:48
      peer_sites:
        name: cephwest
        state: up+unknown
        description: remote image demoted
        last_update: 2022-06-03 15:19:48

The status persists across a restart and prevents us from performing a force promote or removal, because the image has become read-only.
Files
Updated by Deepika Upadhyay almost 2 years ago
- Related to Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error added
Updated by Deepika Upadhyay almost 2 years ago
- File test_many.sh test_many.sh added
Updated by Deepika Upadhyay almost 2 years ago
Debugging from the logs:
Here cephwest reports that the remote got demoted, and cepheast reports the same:
cephwest/rbd_mirror_cephwest.log:2022-06-06T08:03:23.168+0000 7f0293faa700 10 rbd::mirror::ImageReplayer: 0x5578fd661b00 [2/14c9b298-e65c-4da0-93ea-f0f977cf60fd] handle_replayer_notification: replay interrupted: r=-121, error=remo
cepheast/rbd_mirror_cepheast.log:2022-06-06T08:03:36.178+0000 7fcdcc0b9700 10 rbd::mirror::ImageReplayer: 0x556c997e8000 [2/14c9b298-e65c-4da0-93ea-f0f977cf60fd] handle_replayer_notification: replay interrupted: r=-121, error=remote image demoted
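To compare both sites across many images, the "replay interrupted" lines can be pulled out of the logs mechanically. This is only a rough sketch based on the line format quoted above: the regular expression and the `parse_interrupt` helper are hypothetical, the number before the slash is assumed to be the local pool id, and the errno name assumes Linux numbering (where 121 is EREMOTEIO).

```python
import re

# Hypothetical pattern, derived only from the log lines quoted in this ticket.
INTERRUPT_RE = re.compile(
    r"\[(?P<prefix>\d+)/(?P<global_id>[0-9a-f-]+)\] "
    r"handle_replayer_notification: replay interrupted: "
    r"r=(?P<r>-?\d+), error=(?P<error>.*)$"
)

# Assumption: Linux errno numbering.
LINUX_ERRNO_NAMES = {121: "EREMOTEIO", 30: "EROFS"}

SAMPLE_LINE = (
    "cepheast/rbd_mirror_cepheast.log:2022-06-06T08:03:36.178+0000 7fcdcc0b9700 10 "
    "rbd::mirror::ImageReplayer: 0x556c997e8000 "
    "[2/14c9b298-e65c-4da0-93ea-f0f977cf60fd] handle_replayer_notification: "
    "replay interrupted: r=-121, error=remote image demoted"
)


def parse_interrupt(line):
    """Return the interesting fields of a 'replay interrupted' line, or None."""
    match = INTERRUPT_RE.search(line)
    if match is None:
        return None
    r = int(match.group("r"))
    return {
        "prefix": int(match.group("prefix")),  # assumed to be the local pool id
        "global_id": match.group("global_id"),
        "r": r,
        "errno_name": LINUX_ERRNO_NAMES.get(-r, "unknown"),
        "error": match.group("error").strip(),
    }


if __name__ == "__main__":
    print(parse_interrupt(SAMPLE_LINE))
```

Grouping the parsed records by `global_id` makes it easy to see which images report "remote image demoted" from both daemons.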
remote_demoted is set when we get a mirror snapshot in either of these states:
MIRROR_SNAPSHOT_STATE_PRIMARY_DEMOTED
MIRROR_SNAPSHOT_STATE_NON_PRIMARY_DEMOTED
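The check described above can be illustrated with a small model. This is a sketch, not the rbd-mirror code: the enum is assumed to mirror the mirror snapshot states by name, the numeric values are an assumption, and `is_remote_demoted` is a hypothetical helper.

```python
from enum import IntEnum


class MirrorSnapshotState(IntEnum):
    # Names follow the MIRROR_SNAPSHOT_STATE_* constants; values are assumed.
    PRIMARY = 0
    NON_PRIMARY = 1
    PRIMARY_DEMOTED = 2
    NON_PRIMARY_DEMOTED = 3


DEMOTED_STATES = frozenset(
    {MirrorSnapshotState.PRIMARY_DEMOTED, MirrorSnapshotState.NON_PRIMARY_DEMOTED}
)


def is_remote_demoted(state):
    """True if the remote mirror snapshot is in either demoted state."""
    return MirrorSnapshotState(state) in DEMOTED_STATES


if __name__ == "__main__":
    for state in MirrorSnapshotState:
        print(state.name, is_remote_demoted(state))
```

Because the non-primary demoted state also counts, a site that merely replicated the demotion snapshot looks "demoted" to its peer as well, which would explain both sides reporting "remote image demoted" after a single demote.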
Tracking the non-primary demoted snapshot:
cephwest/rbd_mirror_cephwest.log:2022-06-06T08:03:19.600+0000 7f0287f92700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x5578fd6eb200 create_non_primary_snapshot: demoted=1, primary_mirror_uuid=298dd4a9-5d42-473b-a439-ea2077a49845, primary_snap_id=208, snap_seqs={208=18446744073709551614}
cepheast/rbd_mirror_cepheast.log:2022-06-06T08:03:36.089+0000 7fcdc60ad700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x556c99e25680 scan_remote_mirror_snapshots: remote mirror snapshot: id=208, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=e0ab22ea-bd98-4a40-ae81-64e8dc9bf860, primary_mirror_uuid=298dd4a9-5d42-473b-a439-ea2077a49845, primary_snap_id=d0, last_copied_object_number=4,snap_seqs={208=18446744073709551614}], remote_demoted: 1
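Two values in these lines are easy to misread. `primary_snap_id=d0` appears to be printed in hexadecimal, so it is the same snapshot as `id=208`, and the `snap_seqs` target 18446744073709551614 is 2^64 - 2, which is the value Ceph uses for CEPH_NOSNAP (the image head). A short sketch to check the arithmetic; the helper names are hypothetical.

```python
# (uint64_t)-2, the value Ceph uses for CEPH_NOSNAP.
CEPH_NOSNAP = 2**64 - 2


def decode_primary_snap_id(text):
    """Decode a snap id as printed in the mirror_ns dump (assumed to be hex)."""
    return int(text, 16)


def describe_snap_seq(value):
    """Render a snap_seqs target, naming the head sentinel explicitly."""
    return "CEPH_NOSNAP (head)" if value == CEPH_NOSNAP else str(value)


if __name__ == "__main__":
    print(decode_primary_snap_id("d0"))
    print(describe_snap_seq(18446744073709551614))
```

So both log lines refer to snapshot 208 on the primary side, mapped to the local image head.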
Updated by Ilya Dryomov almost 2 years ago
- Project changed from RADOS to rbd
- Status changed from New to In Progress
- Assignee set to Deepika Upadhyay
Updated by Deepika Upadhyay almost 2 years ago
On cephwest, this is why we get a non-primary demoted snapshot, which is matched later:
342 2022-06-06T08:03:19.598+0000 7f0287791700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x5578fd6eb200 scan_remote_mirror_snapshots: remote mirror snapshot: id=208, mirror_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=325521f4-c64b-42bc-bc99-61c40784fc48, clean_since_snap_id=head], remote_demoted:
2022-06-06T08:03:20.511+0000 7f0287f92700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x5578fd6eb200 scan_local_mirror_snapshots: local mirror snapshot: id=208, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=e0ab22ea-bd98-4a40-ae81-64e8dc9bf860, primary_mirror_uuid=298dd4a9-5d42-473b-a439-ea2077a49845, primary_snap_id=d0, last_copied_object_number=4, snap_seqs={208=18446744073709551614}]
Compared to a passing case, this is actually fine (the reporting could be improved here), so we need to look into the force promote failure instead.
Looking at the snapshot listing:
18
2022-06-08T14:25:45.147+0000 7f6e264da380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.148+0000 7f6e264da380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.151+0000 7f6e264da380 -1 WARNING: all dangerous and experimental features are enabled.
test18:
  global_id:   b68c4eb4-d325-4891-8dcb-ac9783cef36c
  state:       up+unknown
  description: remote image demoted
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
    last_update: 2022-06-08 14:25:27
19
2022-06-08T14:25:45.252+0000 7f9e91b68380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.252+0000 7f9e91b68380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.256+0000 7f9e91b68380 -1 WARNING: all dangerous and experimental features are enabled.
test19:
  global_id:   a8ef6c3c-fb17-44b5-984a-99be6996b636
  state:       up+stopped
  description: local image is primary
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1654698302,"remote_snapshot_timestamp":1654698302,"replay_state":"idle"}
    last_update: 2022-06-08 14:25:27
  snapshots:
    850 .mirror.primary.a8ef6c3c-fb17-44b5-984a-99be6996b636.cfee9577-9c81-4e7d-a895-606d10ff9c08 (peer_uuids:[34dc7477-b497-4b59-9795-537843ffb3ef])
The failing case (the 18th test image) does not have even one snapshot.
The same goes for other instances, e.g. the 46th image:
46
2022-06-08T14:25:48.117+0000 7f550b6c7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.117+0000 7f550b6c7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.121+0000 7f550b6c7380 -1 WARNING: all dangerous and experimental features are enabled.
test46:
  global_id:   dc0690b2-cad4-428b-946f-3a7076604a8d
  state:       up+unknown
  description: remote image demoted
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
    last_update: 2022-06-08 14:25:28
47
2022-06-08T14:25:48.222+0000 7f439c8c5380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.222+0000 7f439c8c5380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.226+0000 7f439c8c5380 -1 WARNING: all dangerous and experimental features are enabled.
test47:
  global_id:   973ec360-8082-49ef-9c83-c375d4f71473
  state:       up+stopped
  description: local image is primary
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1654698305,"remote_snapshot_timestamp":1654698305,"replay_state":"idle"}
    last_update: 2022-06-08 14:25:27
  snapshots:
    890 .mirror.primary.973ec360-8082-49ef-9c83-c375d4f71473.ec421158-1098-4a14-8ead-cf57703e7753 (peer_uuids:[34dc7477-b497-4b59-9795-537843ffb3ef])
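With dozens of test images, spotting the stuck ones by eye gets tedious. The sketch below scans status text like the output above and flags an image when every reported state (local and peer) is up+unknown. It only assumes that each state appears as a `state: <value>` field; the helper names are hypothetical and this is not an rbd API.

```python
import re

# Matches "state: up+unknown" fields; the JSON key "replay_state" is not matched
# because a quote sits between the key and the colon.
STATE_RE = re.compile(r"\bstate:\s*(\S+)")

STUCK_SAMPLE = """\
test18:
  global_id:   b68c4eb4-d325-4891-8dcb-ac9783cef36c
  state:       up+unknown
  description: remote image demoted
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
"""

HEALTHY_SAMPLE = """\
test19:
  global_id:   a8ef6c3c-fb17-44b5-984a-99be6996b636
  state:       up+stopped
  description: local image is primary
  peer_sites:
    name: cephwest
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"replay_state":"idle"}
"""


def mirror_states(status_text):
    """Return every reported mirror state, local site first, then peers."""
    return STATE_RE.findall(status_text)


def looks_stuck(status_text):
    """True if the local site and all peers report up+unknown."""
    states = mirror_states(status_text)
    return len(states) >= 2 and all(s == "up+unknown" for s in states)


if __name__ == "__main__":
    print(looks_stuck(STUCK_SAMPLE), looks_stuck(HEALTHY_SAMPLE))
```

Running this over the status of every test image gives the list of candidates to check for missing mirror snapshots.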
Failed promotions for the images listed above:
18
2022-06-08T14:22:31.763+0000 7f323f628380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:31.763+0000 7f323f628380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:31.767+0000 7f323f628380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:31.791+0000 7f32315dd700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f31f4001540 handle_create_snapshot: failed to create mirror snapshot: (30) Read-only file system
2022-06-08T14:22:31.792+0000 7f32315dd700 -1 librbd::mirror::snapshot::PromoteRequest: 0x7f31f40014a0 handle_create_promote_snapshot: failed to create promote snapshot: (30) Read-only file system
2022-06-08T14:22:31.792+0000 7f32315dd700 -1 librbd::mirror::PromoteRequest: 0x55614e298a90 handle_promote: failed to promote image: (30) Read-only file system
2022-06-08T14:22:31.792+0000 7f323f628380 -1 librbd::api::Mirror: image_promote: failed to promote image
rbd: error promoting image to primary
19
46
2022-06-08T14:22:58.531+0000 7f62661e7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.532+0000 7f62661e7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.535+0000 7f62661e7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.652+0000 7f624bfff700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f6220001580 handle_create_snapshot: failed to create mirror snapshot: (30) Read-only file system
2022-06-08T14:22:58.652+0000 7f624bfff700 -1 librbd::mirror::snapshot::PromoteRequest: 0x7f62200014e0 handle_create_promote_snapshot: failed to create promote snapshot: (30) Read-only file system
2022-06-08T14:22:58.652+0000 7f624bfff700 -1 librbd::mirror::PromoteRequest: 0x55607c08a900 handle_promote: failed to promote image: (30) Read-only file system
2022-06-08T14:22:58.652+0000 7f62661e7380 -1 librbd::api::Mirror: image_promote: failed to promote image
rbd: error promoting image to primary
47
2022-06-08T14:22:58.736+0000 7f737297d380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.737+0000 7f737297d380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.740+0000 7f737297d380 -1 WARNING: all dangerous and experimental features are enabled.
Image promoted to primary
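The failing promotions share the same error chain: error 30 (EROFS, "Read-only file system") first appears when the mirror snapshot is created and is then passed up through the promote requests. A small sketch that extracts this chain from the librbd lines above; the regular expression and the `failure_chain` helper are hypothetical and rely only on the line format shown in this ticket.

```python
import errno
import re

# Hypothetical pattern for librbd error lines that carry an "(errno) message" suffix.
FAIL_RE = re.compile(
    r"(?P<component>librbd::[\w:]+): (?:0x[0-9a-f]+ )?"
    r"(?P<func>\w+): (?P<msg>.*?): \((?P<code>\d+)\) "
)

SAMPLE_LINES = [
    "2022-06-08T14:22:31.791+0000 7f32315dd700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: "
    "0x7f31f4001540 handle_create_snapshot: failed to create mirror snapshot: (30) Read-only file system",
    "2022-06-08T14:22:31.792+0000 7f32315dd700 -1 librbd::mirror::snapshot::PromoteRequest: "
    "0x7f31f40014a0 handle_create_promote_snapshot: failed to create promote snapshot: (30) Read-only file system",
    "2022-06-08T14:22:31.792+0000 7f32315dd700 -1 librbd::mirror::PromoteRequest: "
    "0x55614e298a90 handle_promote: failed to promote image: (30) Read-only file system",
    "2022-06-08T14:22:31.792+0000 7f323f628380 -1 librbd::api::Mirror: image_promote: failed to promote image",
]


def failure_chain(lines):
    """Return (component, function, errno, errno name) for each matching line, in order."""
    chain = []
    for line in lines:
        match = FAIL_RE.search(line)
        if match is None:
            continue  # e.g. the final image_promote line carries no errno
        code = int(match.group("code"))
        chain.append(
            (match.group("component"), match.group("func"), code,
             errno.errorcode.get(code, "unknown"))
        )
    return chain


if __name__ == "__main__":
    for entry in failure_chain(SAMPLE_LINES):
        print(entry)
```

The first entry of the chain is the place to start looking, since everything after it only reports the same error upwards.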
So this actually looks like an overload of the snapshot creation process, which somehow blocks snapshot creation for some images when we are working with large images.
Updated by Deepika Upadhyay almost 2 years ago
- Status changed from In Progress to Duplicate
Updated by Ilya Dryomov almost 2 years ago
- Related to deleted (Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error)
Updated by Ilya Dryomov almost 2 years ago
- Is duplicate of Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error added