Bug #49371: Misleading alarm if all MDS daemons have failed - CephFS - Ceph

Bug #49371

open

Misleading alarm if all MDS daemons have failed

Added by David Piper about 3 years ago. Updated almost 2 years ago.

Status: Triaged
Priority: High
Assignee: Patrick Donnelly
Category:
Target version:
% Done:
Source: Community (user)
Tags:
Backport: pacific,octopus
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen on Ceph v14.2.9 in a containerised cluster with 3 MDS nodes.

Both standby MDS containers are manually stopped. With only 1 MDS remaining, `ceph health` reports a sensible alarm:

health: HEALTH_WARN
            insufficient standby MDS daemons available

Then I manually stop the final, active MDS daemon.

Expected:

`ceph health` should report an alarm that there are no active MDS daemons and all filesystems are degraded / inactive.
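
The expected behaviour can be sketched as a small decision function. This is an illustration only, not Ceph's MDSMonitor logic: the name `expected_mds_warnings`, its parameters, and the wording of the "no active MDS" message are assumptions. Only the "insufficient standby" text comes from this report.

```python
def expected_mds_warnings(healthy_active, standby, wanted_standby=1):
    """Return the warnings a user would expect for one filesystem.

    Hypothetical model for discussion; not taken from the Ceph source.
    """
    warnings = []
    if healthy_active == 0:
        # Hypothetical wording for the alarm this report asks for.
        warnings.append("no active MDS daemons; filesystem is degraded")
    if standby < wanted_standby:
        # Wording as reported by `ceph health` in this bug.
        warnings.append("insufficient standby MDS daemons available")
    return warnings
```

With one healthy active daemon and no standbys this yields only the standby warning. With no healthy active daemon it yields both, which is the output the report expects.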

Actual:

`ceph health` continues to report only "insufficient standby". No new alarm is raised about the total lack of active MDS daemons.

health: HEALTH_WARN
            insufficient standby MDS daemons available

`ceph status` shows:

mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}
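
Until the health check covers this case, a monitoring script could inspect the `mds:` line itself. A minimal sketch, assuming the line keeps the format shown above; `laggy_ranks` is a hypothetical helper, not part of any Ceph tooling.

```python
import re

# Matches entries such as "0=albamons_sc2=up:active(laggy or crashed)".
_RANK = re.compile(r"(\d+)=([^=\s{}]+)=([a-z:_-]+)(\(laggy or crashed\))?")


def laggy_ranks(mds_line):
    """Return (rank, daemon_name, state) for every rank flagged laggy or crashed."""
    return [
        (int(rank), name, state)
        for rank, name, state, flag in _RANK.findall(mds_line)
        if flag  # empty string when the "(laggy or crashed)" suffix is absent
    ]


line = "mds: cephfs:1 {0=albamons_sc2=up:active(laggy or crashed)}"
print(laggy_ranks(line))  # prints [(0, 'albamons_sc2', 'up:active')]
```

A non-empty result while `ceph health` shows only the standby warning corresponds to the misleading state described in this report.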

If I then stop the active (and only remaining) MGR, `ceph health` does report an alarm:

health: HEALTH_WARN
            no active mgr

Related issues 1 (0 open — 1 closed)

Updated by David Piper about 3 years ago

Updated by Sebastian Wagner about 3 years ago

Updated by Sebastian Wagner about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Patrick Donnelly almost 2 years ago