Bug #52565

MDSMonitor: handle damaged state from standby-replay

Added by Patrick Donnelly over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After the addition of join_fscid, the state change validation in this code:

https://github.com/ceph/ceph/blob/ac460ffdd57fcb3e0df3ae638d12b15cfaa9cb48/src/mon/MDSMonitor.cc#L477-L482

may not run. When it is skipped, a standby-replay daemon can mark the rank damaged, and the monitor does not handle that case properly. The rank ends up in both "up" and "damaged", which triggers an assertion in the sanity checks when encoding pending.
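The invariant described above can be sketched as follows. This is a minimal illustration, not the Ceph code and not the fix from the pull request: `RankMap`, `Rank`, `DaemonId`, `mark_damaged_naive`, `mark_damaged`, and `check_before_encode` are all hypothetical names chosen for this example. It only shows why recording a rank as damaged without first removing it from "up" trips a sanity check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical, simplified stand-ins; these are not the actual Ceph types.
using Rank = int32_t;
using DaemonId = uint64_t;

struct RankMap {
  std::map<Rank, DaemonId> up;  // rank -> daemon currently holding it
  std::set<Rank> damaged;       // ranks that have been marked damaged

  // Invariant from the report: a rank must not be both "up" and "damaged".
  bool sane() const {
    for (Rank r : damaged) {
      if (up.count(r) != 0) {
        return false;
      }
    }
    return true;
  }

  // Problematic path: records the damage but leaves the rank in "up".
  void mark_damaged_naive(Rank r) { damaged.insert(r); }

  // Consistent path: drop the rank from "up" before recording the damage.
  void mark_damaged(Rank r) {
    up.erase(r);
    damaged.insert(r);
  }
};

// Stand-in for a sanity check that runs before the pending state is encoded.
void check_before_encode(const RankMap& m) { assert(m.sane()); }
```

With rank 0 in `up`, calling `mark_damaged_naive(0)` leaves the map in a state where `check_before_encode` would abort, while `mark_damaged(0)` keeps it consistent. How the real monitor should react to a damaged report from a standby-replay daemon is a separate design question that this sketch does not answer.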


Related issues

Copied to CephFS - Backport #52639: pacific: MDSMonitor: handle damaged state from standby-replay (Resolved)

History

#1 Updated by Patrick Donnelly over 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 43122

#2 Updated by Patrick Donnelly over 2 years ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Backport Bot over 2 years ago

  • Copied to Backport #52639: pacific: MDSMonitor: handle damaged state from standby-replay added

#4 Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
