Bug #56666
mds: standby-replay daemon always removed in MDSMonitor::prepare_beacon
Status: Resolved
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: quincy,pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If a standby-replay daemon's beacon reaches MDSMonitor::prepare_beacon (which happens rarely), the monitors automatically remove the daemon:
2022-07-21T20:10:11.114+0000 7fdd8d195700 7 mon.a@0(leader).mds e10 prepare_update mdsbeacon(4232/d up:standby-replay seq=30 v10) v8
2022-07-21T20:10:11.114+0000 7fdd8d195700 10 mon.a@0(leader).mds e10 MDS health message (mds.?): HEALTH_ERR Metadata damage detected
2022-07-21T20:10:11.114+0000 7fdd8d195700 4 mon.a@0(leader).mds e10 mds_beacon MDS can't go back into standby after taking rank: held rank 0 while requesting state up:standby-replay
2022-07-21T20:10:11.114+0000 7fdd8d195700 1 mon.a@0(leader).mds e10 fail_mds_gid 4232 mds.d role 0
This is with a synthetic health warning injected into the beacon.
The broken code is:
A standby-replay daemon always has a rank. This check is wrong.
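To illustrate why a check of this shape misfires, here is a minimal, self-contained model. It is not the Ceph source: DaemonState, DaemonInfo, RANK_NONE and both functions are hypothetical stand-ins, and the shape of the "broken" predicate is inferred only from the log message above. The second predicate is one possible narrowing, assumed for illustration and not necessarily the fix that was merged.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the monitor's view of an MDS daemon.
// These are NOT the real Ceph types or constants.
enum class DaemonState { Standby, StandbyReplay, Active };
constexpr std::int32_t RANK_NONE = -1;

struct DaemonInfo {
  std::int32_t rank;   // RANK_NONE if the daemon holds no rank
  DaemonState state;   // state currently recorded by the monitor
};

// Assumed shape of the faulty check: any rank holder that reports a
// standby-like state is treated as "going back into standby" and rejected.
bool rejects_beacon_broken(const DaemonInfo& info, DaemonState requested) {
  return info.rank != RANK_NONE &&
         (requested == DaemonState::Standby ||
          requested == DaemonState::StandbyReplay);
}

// One possible narrowing (an assumption): only reject when the beacon asks
// for a standby-like state that differs from the recorded one, so a
// standby-replay daemon re-reporting its own state is accepted.
bool rejects_beacon_narrowed(const DaemonInfo& info, DaemonState requested) {
  const bool standby_like = requested == DaemonState::Standby ||
                            requested == DaemonState::StandbyReplay;
  return info.rank != RANK_NONE && standby_like && requested != info.state;
}
```

In this model, a daemon recorded as {rank 0, StandbyReplay} that sends a beacon with state StandbyReplay is rejected by rejects_beacon_broken, matching the log, but accepted by rejects_beacon_narrowed. An active rank holder asking for either standby state is still rejected by both.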