Bug #57969

monitor: ceph -s shows all monitors out of quorum for < 1s

Added by Kamoltat (Junior) Sirivadhna 3 months ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Tags:
low-hanging-fruit
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The ceph -s output briefly (for less than 1s) shows all monitors as out of quorum.
The issue is likely to have no real effect on the cluster, but it could potentially confuse the user
and trigger a false alarm.

First recorded observation when:

Stretch-Cluster, 5 MONs 4 OSDs, 2 stretch buckets

The problem is not deterministic. When trying to reproduce it, run ceph -s under a watch command.
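
Because the bad state lasts under a second, a default watch interval can easily miss it. Below is a minimal sketch of a faster poller that flags any sample whose mon line lists no quorum members. It assumes the plain-text ceph -s layout matches the output pasted in this report; the helper names (quorum_members, watch_for_empty_quorum) and the 0.2s interval are illustrative and not part of Ceph.

```python
import re
import subprocess
import time

# Matches the "mon:" line of plain-text `ceph -s` output, e.g.
#   mon: 5 daemons, quorum  (age 18h), out of quorum: a, b, e, f, g
# Assumption: the line layout is the one shown in this report.
MON_LINE = re.compile(r"^\s*mon:\s+(\d+) daemons?, quorum\s*([^()]*?)\s*\(")


def quorum_members(status_text):
    """Return quorum member names from `ceph -s` text, or None if no mon line."""
    for line in status_text.splitlines():
        match = MON_LINE.match(line)
        if match:
            names = match.group(2).split(",")
            return [name.strip() for name in names if name.strip()]
    return None


def watch_for_empty_quorum(interval=0.2):
    """Poll `ceph -s` and print every sample where the quorum list is empty."""
    while True:
        result = subprocess.run(["ceph", "-s"], capture_output=True, text=True)
        if quorum_members(result.stdout) == []:
            print(time.strftime("%H:%M:%S"), "empty quorum reported:")
            print(result.stdout)
        time.sleep(interval)
```

Calling watch_for_empty_quorum() on a node with a working ceph CLI, then failing a zone, should capture the transient output if it occurs during a poll.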

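Fail 1 zone. The services section then briefly lists every monitor as out of quorum, even though the health section still reports quorum a,b,e: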

  cluster:
    id:     385a428b-c9a6-475a-83f7-172fc2e9973a
    health: HEALTH_WARN
            We are missing stretch mode buckets, only requiring 1 of 2 buckets to peer
            2/5 mons down, quorum a,b,e
            2 osds down
            2 hosts (2 osds) down
            1 zone (2 osds) down

  services:
    mon: 5 daemons, quorum  (age 18h), out of quorum: a, b, e, f, g
    mgr: a(active, since 55m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 4 osds: 2 up (since 0.224087s), 4 in (since 23h)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 177 pgs
    objects: 553 objects, 324 MiB
    usage:   2.7 GiB used, 397 GiB / 400 GiB avail
    pgs:     177 active+clean
