Bug #36659

[rbd-mirror] forced promotion after killing remote cluster results in stuck state

Added by Jason Dillaman over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

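The rbd-mirror daemon detects that the image has been force promoted locally and attempts to shut down the image replayer. The shutdown then hangs because the remote cluster is unresponsive, and the "force promoted" status update is skipped because a shutdown is already in progress: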

2018-10-31 10:45:32.341 7f32f8ff9700 20 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] on_stop_journal_replay: enter
2018-10-31 10:45:32.341 7f32f8ff9700 20 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] set_state_description: 0 force promoted
2018-10-31 10:45:32.341 7f32f8ff9700 20 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] update_mirror_image_status:
2018-10-31 10:45:32.341 7f32f8ff9700 20 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] start_mirror_image_status_update: shut down in-progress: ignoring update
2018-10-31 10:45:32.341 7f32f8ff9700 15 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] reschedule_update_status_task: canceling existing status update task
2018-10-31 10:45:32.341 7f32f8ff9700 15 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] finish_mirror_image_status_update:
2018-10-31 10:45:32.341 7f32f8ff9700 10 rbd::mirror::ImageReplayer: 0x7f333800e4e0 [1/41ddd4e2-5716-4c14-9568-c1340762addd] shut_down: r=0
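
To spot this condition in a larger rbd-mirror log, the dropped status updates can be filtered out with a small script. The following is a minimal sketch, not part of the fix: the regular expression assumes the line layout seen in the excerpt above (date, time, thread id, debug level, subsystem, object pointer, bracketed image id, function name, message), and `ignored_status_updates` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the excerpt above:
# date time thread level subsystem: pointer [image id] function: message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:.]+) (?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>\d+) (?P<subsys>[\w:]+): (?P<ptr>0x[0-9a-f]+) "
    r"\[(?P<image>[^\]]+)\] (?P<func>\w+): ?(?P<msg>.*)$"
)


def ignored_status_updates(lines):
    """Return (image id, message) pairs for status updates dropped during shutdown."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not fit the assumed layout are silently skipped.
        if match and "ignoring update" in match.group("msg"):
            hits.append((match.group("image"), match.group("msg")))
    return hits
```

Feeding it the excerpt above should yield a single entry for the `start_mirror_image_status_update` line; other log formats or debug levels may need a different pattern.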

Related issues

Copied to rbd - Backport #36692: luminous: [rbd-mirror] forced promotion after killing remote cluster results in stuck state Resolved
Copied to rbd - Backport #36693: mimic: [rbd-mirror] forced promotion after killing remote cluster results in stuck state Resolved

History

#1 Updated by Jason Dillaman over 5 years ago

  • Status changed from In Progress to Fix Under Review

#2 Updated by Jason Dillaman over 5 years ago

New status message:

$ rbd --cluster cluster2 mirror pool status --verbose
health: WARNING
images: 1 total
    1 stopping_replay

image1:
  global_id:   79833db6-58fd-4f58-b013-cba7ed26750e
  state:       up+stopping_replay
  description: force promoted
  last_update: 2018-10-31 14:42:37

#3 Updated by Mykola Golub over 5 years ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36692: luminous: [rbd-mirror] forced promotion after killing remote cluster results in stuck state added

#5 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36693: mimic: [rbd-mirror] forced promotion after killing remote cluster results in stuck state added

#6 Updated by Nathan Cutler about 5 years ago

  • Status changed from Pending Backport to Resolved
