HEALTH_OK is reported with no managers (or OSDs) in the cluster
Using the latest Nautilus release (14.2.0) we are seeing the following:
+ oc --context=91 -n rook-ceph exec -it rook-ceph-tools-774c55f44f-wj9th -- ceph -s
Unable to use a TTY - input is not a terminal or the right kind of file
  cluster:
    id:     97ce8ce8-811c-46ce-9682-ce535d9859ab
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 11m)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
Since Ceph requires a MGR daemon, this scenario should be treated as a warning at the very least.
#2 Updated by Greg Farnum 5 months ago
- Project changed from RADOS to mgr
- Subject changed from HEALTH_OK is reported with no managers or OSDs in the cluster to HEALTH_OK is reported with no managers (or OSDs) in the cluster
The MgrMonitor has some code from https://github.com/ceph/ceph/commit/b9cdb9fa7bef1bb4b93712293fddac3f1c52b26e that deliberately keeps HEALTH_OK on new monitors without a manager. The grace period now seems to default to 2 minutes, presumably so that you don't get HEALTH_WARN as soon as you turn on a cluster.
Not sure if you want some kind of option to change that behavior further, or if it was a mistaken attempt to stay user-friendly that isn't working, or something else.
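To make the described behavior concrete, here is a minimal Python sketch of a grace-period check of that kind. It is not the MgrMonitor code: the names `Health`, `MKFS_GRACE_SECONDS`, and `mgr_health` are hypothetical, and the 120-second value is taken only from the "2 minutes" mentioned above.

```python
from enum import Enum


class Health(Enum):
    OK = "HEALTH_OK"
    WARN = "HEALTH_WARN"


# Hypothetical constant: the comment above only says the grace period
# "defaults to 2 minutes".
MKFS_GRACE_SECONDS = 120


def mgr_health(has_active_mgr: bool,
               cluster_age_seconds: float,
               grace_seconds: float = MKFS_GRACE_SECONDS) -> Health:
    """Return the health this sketch would report for the manager check."""
    if has_active_mgr:
        return Health.OK
    if cluster_age_seconds <= grace_seconds:
        # New cluster: stay quiet while the first manager is expected to start.
        return Health.OK
    # Grace period exceeded and still no manager: warn.
    return Health.WARN
```

Under this sketch, `mgr_health(False, 30)` stays at `Health.OK`, while `mgr_health(False, 121)` returns `Health.WARN`.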
#3 Updated by Alfredo Deza 5 months ago
I don't see a problem with getting HEALTH_WARN as soon as a cluster is deployed (vs. reporting HEALTH_OK blindly while it waits).
The issue also extends to the case where no OSDs are present in the cluster, which has to be an error state.
ceph-medic is now adding a check so this isn't treated as a false positive: https://github.com/ceph/ceph-medic/issues/94
But the underlying thing here is that Ceph shouldn't report HEALTH_OK in this case.
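In the meantime, an external check can refuse to trust HEALTH_OK when there is no manager or no OSD. The following is a rough Python sketch of that idea, not the ceph-medic implementation. The function name `find_false_ok` is hypothetical, and the JSON field names (`mgrmap.available`, `osdmap.num_osds`, possibly nested one level deeper) are assumptions about the output of `ceph status --format json` that may vary between releases.

```python
import json


def find_false_ok(status: dict) -> list:
    """Return reasons why a reported HEALTH_OK should not be trusted."""
    problems = []

    # Assumed layout: "mgrmap" carries a boolean "available" flag.
    if not status.get("mgrmap", {}).get("available", False):
        problems.append("no active mgr")

    osdmap = status.get("osdmap", {})
    # Assumption: the counters may sit one level deeper under "osdmap".
    osdmap = osdmap.get("osdmap", osdmap)
    if osdmap.get("num_osds", 0) == 0:
        problems.append("no OSDs")

    return problems


# Hand-written sample mirroring the state in the report above.
sample = json.loads("""
{
  "health": {"status": "HEALTH_OK"},
  "mgrmap": {"available": false},
  "osdmap": {"osdmap": {"num_osds": 0}}
}
""")
print(find_false_ok(sample))  # prints ['no active mgr', 'no OSDs']
```

An empty list means neither condition was found, so the reported health status can be taken at face value as far as these two checks are concerned.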