Bug #49371

open

Misleading alarm if all MDS daemons have failed

Added by David Piper about 3 years ago. Updated almost 2 years ago.

Status: Triaged
Priority: High
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific,octopus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen on Ceph v14.2.9 in a containerised cluster with 3 MDS nodes.

Both standby MGR containers are manually stopped. ceph reports a sensible alarm:

With only 1 MDS remaining we have an alarm on ceph health:

health: HEALTH_WARN
insufficient standby MDS daemons available

Then I manually stop the final, active MDS daemon.

Expected:

`ceph health` should report an alarm that there are no active MDS daemons and all filesystems are degraded / inactive.

Actual:

`ceph health` continues to report "insufficient standby". There are no new alarms about the total lack of active MDS daemons.

health: HEALTH_WARN
insufficient standby MDS daemons available

ceph status shows:

mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}
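
Until the health check covers this case, the condition can be spotted from the status line itself. The following is a minimal sketch, assuming the `mds:` line always has the shape shown above (`rank=name=state`, optionally followed by `(laggy or crashed)`). The function names `laggy_ranks` and `no_healthy_active_mds` are hypothetical helpers for illustration, not part of Ceph.

```python
import re

# Matches one rank entry such as "0=albamons_sc2=up:active(laggy or crashed)".
# Assumption: the format is exactly as printed in the `ceph status` output above.
RANK_RE = re.compile(r"(\d+)=([^=\s{}]+)=([a-z:_-]+)(\(laggy or crashed\))?")


def laggy_ranks(mds_line):
    """Return (rank, daemon name, state) for every rank flagged laggy or crashed."""
    return [
        (int(rank), name, state)
        for rank, name, state, laggy in RANK_RE.findall(mds_line)
        if laggy
    ]


def no_healthy_active_mds(mds_line):
    """True when the line lists no rank that is up:active and not laggy."""
    entries = RANK_RE.findall(mds_line)
    return not any(
        state == "up:active" and not laggy
        for _rank, _name, state, laggy in entries
    )


line = "mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}"
print(laggy_ranks(line))
print(no_healthy_active_mds(line))
```

For the line reported here, the only rank is flagged laggy or crashed, so the second helper returns true even though `ceph health` raises no corresponding alarm.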

If I then stop the active (and only remaining) MGR, we get an alarm reported on ceph health:

health: HEALTH_WARN
no active mgr


Related issues 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #49370: No alarm if all standby MDSs have failed (Duplicate)
