Bug #55852

closed

[rbd-mirror] remote got demoted in non-primary without actually performing it

Added by Deepika Upadhyay almost 2 years ago. Updated almost 2 years ago.

Status: Duplicate
Priority: Normal
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

- The image is kept in the up+unknown state:

2022-06-03T11:31:20.003+0000 7f3bb6018700 15 rbd::mirror::ImageReplayer: 0x55b43dafe300 [2/421faf66-0be0-41e4-88a6-697d395a76e6] set_mirror_image_status_update: status={state=up+unknown, description=remote image demoted, last_update=0.000000]}

Some images report demotion on both sides, even though we only performed the demotion on
cepheast:
22

test22:
  global_id:   a44cc01b-7ef9-4ab8-8584-9fddb6e41ce3
  state:       up+unknown
  description: remote image demoted
  service:     admin on vossi03
  last_update: 2022-06-03 15:19:48
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
    last_update: 2022-06-03 15:19:48

The status persists across restarts and prevents us from performing a force promote or removal, as
the image has become read-only.


Files

test_many.sh (3.4 KB): reproducer script with 65 images. Added by Deepika Upadhyay, 06/03/2022 03:56 PM

Related issues: 1 (0 open, 1 closed)

Is duplicate of rbd - Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error (Resolved, Ilya Dryomov)

#1

Updated by Deepika Upadhyay almost 2 years ago

  • Related to Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error added
#3

Updated by Deepika Upadhyay almost 2 years ago

Debugging the logs:

Here, cephwest reports that the remote image was demoted:

cephwest/rbd_mirror_cephwest.log:2022-06-06T08:03:23.168+0000 7f0293faa700 10 rbd::mirror::ImageReplayer: 0x5578fd661b00 [2/14c9b298-e65c-4da0-93ea-f0f977cf60fd] handle_replayer_notification: replay interrupted: r=-121, error=remo

cepheast/rbd_mirror_cepheast.log:2022-06-06T08:03:36.178+0000 7fcdcc0b9700 10 rbd::mirror::ImageReplayer: 0x556c997e8000 [2/14c9b298-e65c-4da0-93ea-f0f977cf60fd] handle_replayer_notification: replay interrupted: r=-121, error=remote image demoted

remote_demoted is set when we get a mirror snapshot in one of these states:
- MIRROR_SNAPSHOT_STATE_PRIMARY_DEMOTED
- MIRROR_SNAPSHOT_STATE_NON_PRIMARY_DEMOTED
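The condition above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that the flag ends up reflecting the newest remote mirror snapshot seen during a scan; the enum's numeric values and scan_remote_demoted() are hypothetical stand-ins, not rbd-mirror's actual C++ code.

```python
from enum import Enum


class MirrorSnapshotState(Enum):
    # Names follow the MIRROR_SNAPSHOT_STATE_* constants; the numeric
    # values are illustrative only.
    PRIMARY = 0
    PRIMARY_DEMOTED = 1
    NON_PRIMARY = 2
    NON_PRIMARY_DEMOTED = 3


# Either demoted state is enough to flag the remote as demoted.
DEMOTED_STATES = {
    MirrorSnapshotState.PRIMARY_DEMOTED,
    MirrorSnapshotState.NON_PRIMARY_DEMOTED,
}


def scan_remote_demoted(remote_snapshot_states):
    """Hypothetical helper: walk remote mirror snapshot states (oldest first)
    and report whether the newest one marks a demotion."""
    remote_demoted = False
    for state in remote_snapshot_states:
        remote_demoted = state in DEMOTED_STATES
    return remote_demoted


print(scan_remote_demoted([MirrorSnapshotState.PRIMARY,
                           MirrorSnapshotState.PRIMARY_DEMOTED]))  # → True
```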

Tracking the non-primary demoted snapshot:

cephwest/rbd_mirror_cephwest.log:2022-06-06T08:03:19.600+0000 7f0287f92700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x5578fd6eb200 create_non_primary_snapshot: demoted=1, primary_mirror_uuid=298dd4a9-5d42-473b-a439-ea2077a49845, primary_snap_id=208, snap_seqs={208=18446744073709551614}

cepheast/rbd_mirror_cepheast.log:2022-06-06T08:03:36.089+0000 7fcdc60ad700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x556c99e25680 scan_remote_mirror_snapshots: remote mirror snapshot: id=208, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=e0ab22ea-bd98-4a40-ae81-64e8dc9bf860, primary_mirror_uuid=298dd4a9-5d42-473b-a439-ea2077a49845, primary_snap_id=d0, last_copied_object_number=4,snap_seqs={208=18446744073709551614}], remote_demoted: 1
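Two values in these log lines are easier to read once decoded. In the cepheast line, primary_snap_id=d0 appears to be printed in hexadecimal (0xd0 = 208), so it points at the same snapshot as id=208. The snap_seqs value 18446744073709551614 is 2^64 - 2, which equals CEPH_NOSNAP (((__u64)(-2)), the "head" revision) in Ceph's headers. A small sketch of that decoding; decode_snapshot_fields() is a hypothetical helper, not part of any Ceph tool:

```python
CEPH_NOSNAP = (1 << 64) - 2  # ((__u64)(-2)): Ceph's "head" snapshot id


def decode_snapshot_fields(primary_snap_id_hex, snap_seqs):
    """Hypothetical helper: decode primary_snap_id (printed in hex) and show
    CEPH_NOSNAP entries in snap_seqs as "head"."""
    primary_snap_id = int(primary_snap_id_hex, 16)
    readable = {
        key: ("head" if value == CEPH_NOSNAP else value)
        for key, value in snap_seqs.items()
    }
    return primary_snap_id, readable


print(decode_snapshot_fields("d0", {208: 18446744073709551614}))
# → (208, {208: 'head'})
```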

#4

Updated by Ilya Dryomov almost 2 years ago

  • Project changed from RADOS to rbd
  • Status changed from New to In Progress
  • Assignee set to Deepika Upadhyay
#5

Updated by Deepika Upadhyay almost 2 years ago

Why cephwest gets non-primary (demoted) snapshots, which are later matched:


2022-06-06T08:03:19.598+0000 7f0287791700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x5578fd6eb200 scan_remote_mirror_snapshots: remote mirror snapshot: id=208, mirror_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=325521f4-c64b-42bc-bc99-61c40784fc48, clean_since_snap_id=head], remote_demoted: 

2022-06-06T08:03:20.511+0000 7f0287f92700 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x5578fd6eb200 scan_local_mirror_snapshots: local mirror snapshot: id=208, mirror_ns=[mirror state=non-primary (demoted), complete=1, mirror_peer_uuids=e0ab22ea-bd98-4a40-ae81-64e8dc9bf860, primary_mirror_uuid=298dd4a9-5d42-473b-a439-ea2077a49845, primary_snap_id=d0, last_copied_object_number=4, snap_seqs={208=18446744073709551614}]

Compared to a passing case, this is actually fine (the reporting here could be improved), so we need to look into the force promote failure.
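The pairing in the two cephwest log lines can be summarized as a check: remote snapshot 208 is primary (demoted), and the local snapshot is a complete non-primary (demoted) copy whose primary_snap_id (d0, i.e. 208) points back at it. The sketch below is a simplified, hypothetical illustration of why this looks like a normal demotion hand-off; the dataclass, the state strings and demotion_fully_synced() are assumptions, not rbd-mirror's real logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MirrorSnapshot:
    snap_id: int
    state: str                             # as printed in the log, e.g. "primary (demoted)"
    complete: bool
    primary_snap_id: Optional[int] = None  # set only on non-primary snapshots


def demotion_fully_synced(remote, local):
    """Hypothetical check: local is a complete non-primary (demoted) copy
    of the remote's demotion snapshot."""
    return (
        remote.state == "primary (demoted)"
        and local.state == "non-primary (demoted)"
        and local.complete
        and local.primary_snap_id == remote.snap_id
    )


# Values taken from the two cephwest log lines (d0 hex == 208).
remote = MirrorSnapshot(snap_id=208, state="primary (demoted)", complete=True)
local = MirrorSnapshot(snap_id=208, state="non-primary (demoted)", complete=True,
                       primary_snap_id=0xd0)
print(demotion_fully_synced(remote, local))  # → True
```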

Looking at the snapshot listing:

18

2022-06-08T14:25:45.147+0000 7f6e264da380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.148+0000 7f6e264da380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.151+0000 7f6e264da380 -1 WARNING: all dangerous and experimental features are enabled.
test18:
  global_id:   b68c4eb4-d325-4891-8dcb-ac9783cef36c
  state:       up+unknown
  description: remote image demoted
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
    last_update: 2022-06-08 14:25:27

19

2022-06-08T14:25:45.252+0000 7f9e91b68380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.252+0000 7f9e91b68380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:45.256+0000 7f9e91b68380 -1 WARNING: all dangerous and experimental features are enabled.
test19:
  global_id:   a8ef6c3c-fb17-44b5-984a-99be6996b636
  state:       up+stopped
  description: local image is primary
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1654698302,"remote_snapshot_timestamp":1654698302,"replay_state":"idle"}
    last_update: 2022-06-08 14:25:27
  snapshots:
    850 .mirror.primary.a8ef6c3c-fb17-44b5-984a-99be6996b636.cfee9577-9c81-4e7d-a895-606d10ff9c08 (peer_uuids:[34dc7477-b497-4b59-9795-537843ffb3ef])

The failing case (test image 18) doesn't have even a single snapshot.

The same goes for other instances, e.g. image 46:

46

2022-06-08T14:25:48.117+0000 7f550b6c7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.117+0000 7f550b6c7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.121+0000 7f550b6c7380 -1 WARNING: all dangerous and experimental features are enabled.
test46:
  global_id:   dc0690b2-cad4-428b-946f-3a7076604a8d
  state:       up+unknown
  description: remote image demoted
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
    last_update: 2022-06-08 14:25:28

47

2022-06-08T14:25:48.222+0000 7f439c8c5380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.222+0000 7f439c8c5380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:25:48.226+0000 7f439c8c5380 -1 WARNING: all dangerous and experimental features are enabled.
test47:
  global_id:   973ec360-8082-49ef-9c83-c375d4f71473
  state:       up+stopped
  description: local image is primary
  service:     admin on vossi03
  last_update: 2022-06-08 14:25:27
  peer_sites:
    name: cephwest
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1654698305,"remote_snapshot_timestamp":1654698305,"replay_state":"idle"}
    last_update: 2022-06-08 14:25:27
  snapshots:
    890 .mirror.primary.973ec360-8082-49ef-9c83-c375d4f71473.ec421158-1098-4a14-8ead-cf57703e7753 (peer_uuids:[34dc7477-b497-4b59-9795-537843ffb3ef])
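To spot the failing pattern across many images, one option is to parse the plain-text rbd mirror image status listings shown above. This sketch assumes exactly that layout (top-level fields, then peer_sites:, then an optional snapshots: section); parse_status() and looks_stuck() are hypothetical helpers, and the heuristic only encodes what was observed here: up+unknown / "remote image demoted" on both sides, with no snapshots listed.

```python
def parse_status(text):
    """Split one status listing into local fields, peer fields and snapshot lines.

    Assumes the plain-text layout shown above; lines without ": " (image
    names, index numbers, blank lines) and WARNING lines are ignored.
    """
    status = {"local": {}, "peer": {}, "snapshots": []}
    section = "local"
    for raw in text.splitlines():
        line = raw.strip()
        if line == "peer_sites:":
            section = "peer"
        elif line == "snapshots:":
            section = "snapshots"
        elif section == "snapshots":
            if line:
                status["snapshots"].append(line)
        elif ": " in line and "WARNING:" not in line:
            key, value = line.split(": ", 1)
            status[section][key] = value.strip()
    return status


def looks_stuck(status):
    """Hypothetical heuristic: both sides report a demoted remote, no snapshots listed."""
    local, peer = status["local"], status["peer"]
    return (
        local.get("state") == "up+unknown"
        and local.get("description") == "remote image demoted"
        and peer.get("state") == "up+unknown"
        and peer.get("description") == "remote image demoted"
        and not status["snapshots"]
    )


example = """test18:
  global_id:   b68c4eb4-d325-4891-8dcb-ac9783cef36c
  state:       up+unknown
  description: remote image demoted
  peer_sites:
    name: cephwest
    state: up+unknown
    description: remote image demoted
"""
print(looks_stuck(parse_status(example)))  # → True
```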


Failed promotions (listed above):
18

2022-06-08T14:22:31.763+0000 7f323f628380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:31.763+0000 7f323f628380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:31.767+0000 7f323f628380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:31.791+0000 7f32315dd700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f31f4001540 handle_create_snapshot: failed to create mirror snapshot: (30) Read-only file system
2022-06-08T14:22:31.792+0000 7f32315dd700 -1 librbd::mirror::snapshot::PromoteRequest: 0x7f31f40014a0 handle_create_promote_snapshot: failed to create promote snapshot: (30) Read-only file system
2022-06-08T14:22:31.792+0000 7f32315dd700 -1 librbd::mirror::PromoteRequest: 0x55614e298a90 handle_promote: failed to promote image: (30) Read-only file system
2022-06-08T14:22:31.792+0000 7f323f628380 -1 librbd::api::Mirror: image_promote: failed to promote image
rbd: error promoting image to primary
19
46

2022-06-08T14:22:58.531+0000 7f62661e7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.532+0000 7f62661e7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.535+0000 7f62661e7380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.652+0000 7f624bfff700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f6220001580 handle_create_snapshot: failed to create mirror snapshot: (30) Read-only file system
2022-06-08T14:22:58.652+0000 7f624bfff700 -1 librbd::mirror::snapshot::PromoteRequest: 0x7f62200014e0 handle_create_promote_snapshot: failed to create promote snapshot: (30) Read-only file system
2022-06-08T14:22:58.652+0000 7f624bfff700 -1 librbd::mirror::PromoteRequest: 0x55607c08a900 handle_promote: failed to promote image: (30) Read-only file system
2022-06-08T14:22:58.652+0000 7f62661e7380 -1 librbd::api::Mirror: image_promote: failed to promote image
rbd: error promoting image to primary
47

2022-06-08T14:22:58.736+0000 7f737297d380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.737+0000 7f737297d380 -1 WARNING: all dangerous and experimental features are enabled.
2022-06-08T14:22:58.740+0000 7f737297d380 -1 WARNING: all dangerous and experimental features are enabled.
Image promoted to primary

So it's actually an overload of the snapshot creation process, which somehow blocks snapshot creation for some images when we are working with large images.
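When many images are promoted in one run, the failures can be pulled out of the output mechanically. This sketch assumes the reproducer prints each image index on its own line before that image's promote output, as the listing above suggests; failed_promotions() is a hypothetical helper that reports which images ended with "rbd: error promoting image to primary" and whether a (30) Read-only file system error appeared for them.

```python
import re


def failed_promotions(output):
    """Map each image index whose promote attempt failed to whether EROFS was seen.

    Assumes each image's output is preceded by its index on a line of its own.
    """
    erofs_seen = set()
    failed = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\d+", line):
            current = int(line)  # start of a new image's output
        elif current is None:
            continue  # header text before the first index
        elif "(30) Read-only file system" in line:
            erofs_seen.add(current)
        elif line == "rbd: error promoting image to primary":
            failed.append(current)
    return {image: image in erofs_seen for image in failed}


sample = """18
librbd::mirror::PromoteRequest: handle_promote: failed to promote image: (30) Read-only file system
rbd: error promoting image to primary
19
46
librbd::mirror::PromoteRequest: handle_promote: failed to promote image: (30) Read-only file system
rbd: error promoting image to primary
47
Image promoted to primary
"""
print(failed_promotions(sample))  # → {18: True, 46: True}
```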

#6

Updated by Deepika Upadhyay almost 2 years ago

  • Status changed from In Progress to Duplicate
#7

Updated by Ilya Dryomov almost 2 years ago

  • Related to deleted (Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error)
#8

Updated by Ilya Dryomov almost 2 years ago

  • Is duplicate of Bug #54448: [rbd-mirror] "failed to unlink local peer from remote image" due to EROFS error added