Bug #38942 (closed)

HEALTH_OK is reported with no managers (or OSDs) in the cluster

Added by Alfredo Deza about 5 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee: Neha Ojha
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 33025
Crash signature (v1):
Crash signature (v2):

Description

Using the latest Nautilus release (14.2.0), we are seeing the following:

+ oc --context=91 -n rook-ceph exec -it rook-ceph-tools-774c55f44f-wj9th -- ceph -s
Unable to use a TTY - input is not a terminal or the right kind of file
  cluster:
    id:     97ce8ce8-811c-46ce-9682-ce535d9859ab
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 11m)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     

Since Ceph requires an MGR daemon, this scenario should be treated as a warning at the very least.
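
The same gap is visible from the toolbox pod with a couple of standard ceph CLI commands. As a rough manual check (the exact field names in the JSON output may differ between releases):

  ceph mgr dump -f json-pretty   # "available": false and no "active_name" when no mgr is running
  ceph osd stat                  # 0 osds: 0 up, 0 in
  ceph health detail             # still reports HEALTH_OK, which is the point of this report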


Related issues 2 (1 open, 1 closed)

Related to RADOS - Feature #39302: `ceph df` reports misleading information when no ceph-mgr running (New, Neha Ojha, 04/15/2019)

Copied to mgr - Backport #44000: nautilus: HEALTH_OK is reported with no managers (or OSDs) in the cluster (Resolved, Shyukri Shyukriev)
Actions #1

Updated by Alfredo Deza about 5 years ago

  • Description updated (diff)
Actions #2

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to mgr
  • Subject changed from HEALTH_OK is reported with no managers or OSDs in the cluster to HEALTH_OK is reported with no managers (or OSDs) in the cluster

The MgrMonitor has some code from https://github.com/ceph/ceph/commit/b9cdb9fa7bef1bb4b93712293fddac3f1c52b26e that deliberately keeps HEALTH_OK on new monitors without a manager. Seems like it now defaults to 2 minutes, presumably so that you don't get HEALTH_WARN as soon as you turn on a cluster.
Not sure if you want some kind of option to change that behavior further, or if it was a mistaken attempt to stay user-friendly that isn't working, or something else.
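
For reference, the mkfs grace described above appears to be controlled by mon_mgr_mkfs_grace (name assumed here; the default seems to be about 120 seconds, matching the "2 minutes" above). Something like the following should make the warning surface sooner on a fresh cluster, but verify the option exists on your release first:

  ceph config get mon mon_mgr_mkfs_grace    # assumed option name; should show the ~120 s default
  ceph config set mon mon_mgr_mkfs_grace 10 # shrink the grace so a missing mgr is reported quickly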

Actions #3

Updated by Alfredo Deza about 5 years ago

I don't see a problem with getting HEALTH_WARN as soon as a cluster is deployed (vs. reporting HEALTH_OK blindly while it waits).

The issue also extends to clusters with no OSDs present, which has to be an error state.

ceph-medic is now adding a check so this isn't treated as a false positive: https://github.com/ceph/ceph-medic/issues/94

But the underlying issue here is that Ceph shouldn't report HEALTH_OK in this case.
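
A standalone check along the same lines (only a sketch, not ceph-medic's actual code; the "available" field in the mgr map dump is assumed) would compare the reported health against the mgr map:

  status=$(ceph health)
  mgr_ok=$(ceph mgr dump -f json | python3 -c 'import json,sys; print(json.load(sys.stdin).get("available", False))')
  if [ "$status" = "HEALTH_OK" ] && [ "$mgr_ok" != "True" ]; then
      echo "false positive: $status reported with no active mgr"
  fi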

Actions #4

Updated by Ken Dreyer about 5 years ago

I think it would be good to report HEALTH_WARN immediately when there are no mgrs or OSDs present instead of waiting.

Actions #5

Updated by Sebastian Wagner almost 5 years ago

We need to support clusters without OSDs, as this will be the entry point for new clusters that are going to be deployed with the orchestrators, basically starting from a minimal MON+MGR cluster.

AFAIK some teuthology tests don't require any MGRs, right?

Actions #6

Updated by Neha Ojha about 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Neha Ojha
  • Pull request ID set to 33025

No OSDs initially should not be a warning, but OSDs with no mgr can be a warning.
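
In other words (a sketch of the intended policy only, not the code in PR 33025; the num_osds field name is assumed):

  num_osds=$(ceph osd stat -f json | python3 -c 'import json,sys; print(json.load(sys.stdin)["num_osds"])')
  if [ "$num_osds" -eq 0 ]; then
      echo "no OSDs yet: a missing mgr stays out of the health warnings"
  else
      echo "OSDs exist: a missing mgr should now surface as MGR_DOWN / HEALTH_WARN"
  fi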

Actions #7

Updated by Neha Ojha about 4 years ago

  • Backport set to nautilus
Actions #8

Updated by Neha Ojha about 4 years ago

  • Related to Feature #39302: `ceph df` reports misleading information when no ceph-mgr running added
Actions #9

Updated by Kefu Chai about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #10

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #44000: nautilus: HEALTH_OK is reported with no managers (or OSDs) in the cluster added
Actions #11

Updated by Nathan Cutler about 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
