mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster
Currently, "cephadm bootstrap" appears to create a pool. This happens because "devicehealth" is an "always on" MGR module, so it is started as soon as the first MGR is deployed.
The pool is actually created by mgr/devicehealth, not by cephadm. That is why this bug is filed against mgr/devicehealth, even though, from the user's perspective, the problem appears when the "cephadm bootstrap" command is issued.
Because mgr/devicehealth creates a pool before the cluster has any OSDs, the cluster enters HEALTH_WARN immediately after bootstrap:
master:~ # ceph -s
  cluster:
    id:     fed46cbe-d157-11ea-901a-52540084b2ce
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum node1 (age 67s)
    mgr: node1.ikkrrt(active, since 39s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
The pool is:
master:~ # ceph osd pool ls
device_health_metrics
It seems like the creation of this pool should be tied to the deployment of the first OSD, not to the deployment of the first MON/MGR.
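A minimal sketch of the proposed behaviour, assuming the module re-checks the condition periodically or whenever the OSD map changes. This is NOT the actual mgr/devicehealth code and does not use the real mgr API: `ClusterState` and `maybe_create_health_pool` are hypothetical names, and the cluster is modelled as a plain OSD count plus a set of pool names.

```python
from dataclasses import dataclass
from typing import Set

# Pool name taken from the report; everything else is a hypothetical model.
POOL_NAME = "device_health_metrics"


@dataclass
class ClusterState:
    """Toy stand-in for the OSD map: number of OSDs and existing pools."""
    num_osds: int
    pools: Set[str]


def maybe_create_health_pool(state: ClusterState) -> bool:
    """Create the device health pool only once at least one OSD exists.

    Returns True if this call created the pool, False otherwise.
    """
    if POOL_NAME in state.pools:
        return False  # already created, nothing to do
    if state.num_osds == 0:
        return False  # defer: with no OSDs, the pool's PGs cannot be placed
    state.pools.add(POOL_NAME)
    return True


# Right after bootstrap: MON/MGR only, no OSDs, so the pool is deferred.
state = ClusterState(num_osds=0, pools=set())
print(maybe_create_health_pool(state))  # False
# Once the first OSD is deployed, the next check creates the pool.
state.num_osds = 1
print(maybe_create_health_pool(state))  # True
print(maybe_create_health_pool(state))  # False, the pool already exists
```

Because the check is idempotent, it can be called on every OSD map update without risk of creating the pool twice.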
#1 Updated by Nathan Cutler 6 months ago
Even when I immediately deploy OSDs after running "cephadm bootstrap", the cluster does not recover from the health warning very quickly. Here's what the health status looks like when deployment of the OSDs completes:
master:~ # ceph -s
  cluster:
    id:     59f04650-d15c-11ea-bc40-52540017ea7a
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg peering

  services:
    mon: 5 daemons, quorum master,node3,node1,node4,node2 (age 35s)
    mgr: master.oqkwii(active, since 88s), standbys: node2.rwzexl
    osd: 4 osds: 2 up (since 5s), 2 in (since 5s); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 14 GiB / 16 GiB avail
    pgs:     100.000% pgs not active
             1 creating+peering

  progress:
    Rebalancing after osd.1 marked in (3s)
      [............................]
It just doesn't seem right for the cluster to bootstrap into a degraded state, and then have to rely on Ceph's "self-healing" capabilities to put it right.