Bug #38077

open

Marking all OSDs as "out" does not trigger a HEALTH_ERR state

Added by Lenz Grimmer about 5 years ago. Updated about 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Administration/Usability
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Just tested this on my local 5 OSD dev environment, but this likely applies to any cluster: after setting the cluster-wide "noup" and "noin" flags and marking all OSDs in my cluster as "out" (not "down"), the cluster's health state is still only HEALTH_WARN (and only because of the flags, not because of the OSD status), instead of HEALTH_ERR:

# ./bin/ceph -s
  cluster:
    id:     ed940f7b-187e-4ccf-b1ff-83e068acec95
    health: HEALTH_WARN
            noup,noin flag(s) set

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: x(active, since 59m)
    mds: a:1 {0=b=up:active}, 1 up:standby
    osd: 5 osds: 5 up (since 66m), 0 in (since 2m); 48 remapped pgs
         flags noup,noin

  data:
    pools:   6 pools, 48 pgs
    objects: 51 objects, 6.0 KiB
    usage:   5.3 GiB used, 45 GiB / 50 GiB avail
    pgs:     255/153 objects misplaced (166.667%)
             48 active+clean+remapped
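
For reference, a state like the one above can be reproduced with something along these lines (a sketch; it assumes the five OSDs have the IDs 0 through 4, as in a default vstart dev cluster):

# ./bin/ceph osd set noup
# ./bin/ceph osd set noin
# for id in 0 1 2 3 4; do ./bin/ceph osd out $id; done
# ./bin/ceph -s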

I wonder if OSDs being "out" should be handled similarly to OSDs being "down" when it comes to the health state?
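
Until that is decided, the condition could at least be caught by an external check on the OSD map counters. A minimal sketch, assuming the JSON output of "ceph osd stat" exposes "num_in_osds" (the exact field names may differ between releases):

# ceph osd stat --format json | \
    python3 -c 'import json,sys; s=json.load(sys.stdin); sys.exit(0 if s.get("num_in_osds", 1) > 0 else 1)' \
    || echo "WARNING: no OSDs are marked in"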

Actions #1

Updated by richael zhuang about 5 years ago

Hi, I don't know whether my opinion is right or not, but I think the status should be HEALTH_WARN when OSDs are marked "out", because "out" is set manually, and once you reset them to "in", the cluster returns to OK.
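
For completeness, reverting the state described above (a sketch, again assuming OSD IDs 0 through 4) would look roughly like this, and should bring the cluster back to HEALTH_OK once the PGs settle:

# ./bin/ceph osd unset noup
# ./bin/ceph osd unset noin
# for id in 0 1 2 3 4; do ./bin/ceph osd in $id; done
# ./bin/ceph -s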
