Bug #49371
Misleading alarm if all MDS daemons have failed (open)
Description
Seen on Ceph v14.2.9 in a containerised cluster with 3 MDS nodes.
Both standby MGR containers are manually stopped. With only 1 MDS remaining, `ceph health` reports a sensible alarm:
health: HEALTH_WARN
insufficient standby MDS daemons available
Then I manually stop the final, active MDS daemon.
Expected:
`ceph health` should report an alarm that there are no active MDS daemons and all filesystems are degraded / inactive.
Actual:
`ceph health` continues to report "insufficient standby". There are no new alarms about the total lack of active MDS daemons.
health: HEALTH_WARN
insufficient standby MDS daemons available
`ceph status` shows:
mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}
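Until the health check itself is fixed, the status line above is the only place the failure is visible. As a rough workaround sketch, the line could be parsed to spot active ranks flagged "laggy or crashed". The helper name `laggy_active_ranks` is hypothetical, and the line format is assumed purely from the single sample in this report; other Ceph versions may print it differently.

```python
import re


def laggy_active_ranks(mds_line):
    """Return names of daemons shown as up:active but flagged laggy or crashed.

    Hypothetical helper: the expected format is inferred only from the sample
    line in this report, e.g.
    "mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}".
    """
    match = re.search(r"\{(.*)\}", mds_line)
    if not match:
        return []
    laggy = []
    for entry in match.group(1).split(","):
        # Each entry is assumed to look like "<rank>=<name>=<state>".
        parts = entry.strip().split("=", 2)
        if len(parts) != 3:
            continue
        _rank, name, state = parts
        if state.startswith("up:active") and "laggy or crashed" in state:
            laggy.append(name)
    return laggy


if __name__ == "__main__":
    line = "mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}"
    print(laggy_active_ranks(line))
```

A monitoring script could raise its own alert whenever this list is non-empty, since `ceph health` does not currently do so.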
If I then stop the active (and only remaining) MGR, I get an alarm reported on ceph health:
health: HEALTH_WARN
no active mgr
Updated by David Piper about 3 years ago
Sorry - please ignore the references to MGR in the description. The issue here is just with alarms about MDS when all MDS daemons are inactive.
Updated by Sebastian Wagner about 3 years ago
- Has duplicate Bug #49370: No alarm if all standby MDSs have failed added
Updated by Sebastian Wagner about 3 years ago
- Project changed from Ceph to CephFS
Updated by Patrick Donnelly about 3 years ago
- Status changed from New to Triaged
- Assignee set to Patrick Donnelly
- Priority changed from Normal to High
- Target version set to v17.0.0
- Source set to Community (user)
- Backport set to pacific,octopus
- Component(FS) MDSMonitor added
Thanks for the report. That is indeed confusing. I think we will change it so laggy/dead daemons are still removed by the mons. That would generate the appropriate health warning.
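The effect of the proposed change can be illustrated with a toy model. This is only a sketch of the idea, not Ceph's MDSMonitor logic: the `Daemon` class, the `health_messages` function, the `drop_laggy` switch, and the "no active MDS daemons" wording are all hypothetical. Only the "insufficient standby" message is taken from the report.

```python
from dataclasses import dataclass


@dataclass
class Daemon:
    name: str
    state: str   # "active" or "standby" in this toy model
    laggy: bool  # True if the daemon is considered laggy or dead


def health_messages(daemons, wanted_standby=1, drop_laggy=True):
    """Toy model only: compute health messages from a list of daemons.

    With drop_laggy=False, laggy daemons still count as present, which
    mimics the behaviour described in this report. With drop_laggy=True,
    they are removed first, as the proposed change suggests.
    """
    if drop_laggy:
        daemons = [d for d in daemons if not d.laggy]
    active = [d for d in daemons if d.state == "active"]
    standby = [d for d in daemons if d.state == "standby"]
    messages = []
    if not active:
        messages.append("no active MDS daemons")  # hypothetical wording
    if len(standby) < wanted_standby:
        messages.append("insufficient standby MDS daemons available")
    return messages


if __name__ == "__main__":
    cluster = [Daemon("albamons_sc2", "active", laggy=True)]
    print(health_messages(cluster, drop_laggy=False))
    print(health_messages(cluster, drop_laggy=True))
```

In this model, keeping the laggy daemon in the map hides the missing active rank, while removing it surfaces a second, more accurate warning.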