Bug #46743 (Closed)
mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster
Description
Currently, "cephadm bootstrap" appears to create a pool because "devicehealth", as an "always on" module, is enabled as soon as the first MGR is deployed.
The pool is actually created by mgr/devicehealth, not by cephadm - hence this bug is filed against mgr/devicehealth, even though, from the user's perspective, the problem appears when the "cephadm bootstrap" command is issued.
Because mgr/devicehealth creates a pool before the cluster has any OSDs, the cluster enters HEALTH_WARN immediately after bootstrap:
master:~ # ceph -s
  cluster:
    id:     fed46cbe-d157-11ea-901a-52540084b2ce
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum node1 (age 67s)
    mgr: node1.ikkrrt(active, since 39s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
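The warning fires because the number of OSDs (0) is below the default pool replication size (osd_pool_default_size, 3), so the freshly created pool can never satisfy its replication target. A minimal sketch of that condition, assuming a simplified standalone helper rather than Ceph's actual health-check code:

```python
def osd_count_warning(num_osds: int, osd_pool_default_size: int = 3):
    """Mimic the 'OSD count < osd_pool_default_size' health check.

    Returns the warning message when the cluster cannot meet the default
    replication size, or None when it can. Hypothetical helper for
    illustration only; not taken from the Ceph source tree.
    """
    if num_osds < osd_pool_default_size:
        return f"OSD count {num_osds} < osd_pool_default_size {osd_pool_default_size}"
    return None
```

With `num_osds=0`, as right after bootstrap, the helper reproduces the message shown in the `ceph -s` output above.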
The pool is:
master:~ # ceph osd pool ls
device_health_metrics
It seems like the creation of this pool should be linked to deployment of the first OSD, not to the deployment of the first MON/MGR.
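One way to implement that suggestion inside the module would be to gate pool creation on the OSD count reported by the OSD map, deferring until at least one OSD exists. A hedged sketch under that assumption; `get_osd_count` and `create_pool` are hypothetical stand-ins injected for testability, not the real ceph-mgr module API:

```python
def maybe_create_health_metrics_pool(get_osd_count, create_pool,
                                     pool_name="device_health_metrics"):
    """Create the metrics pool only once the cluster has at least one OSD.

    get_osd_count: callable returning the current number of OSDs (stand-in
                   for querying the OSD map).
    create_pool:   callable that creates the named pool (stand-in for the
                   mgr module's pool-creation path).
    Returns True if the pool was created, False if creation was deferred.
    """
    if get_osd_count() == 0:
        # Defer: creating a pool now would immediately put the
        # cluster into HEALTH_WARN, as described in this ticket.
        return False
    create_pool(pool_name)
    return True
```

The module would retry on subsequent OSD map changes, so the pool appears shortly after the first OSD is deployed instead of at MON/MGR bootstrap.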
Updated by Nathan Cutler almost 4 years ago
Even when I immediately deploy OSDs after running "cephadm bootstrap", the cluster does not recover from the health warning very quickly. Here's what the health status looks like when deployment of the OSDs completes:
master:~ # ceph -s
  cluster:
    id:     59f04650-d15c-11ea-bc40-52540017ea7a
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg peering

  services:
    mon: 5 daemons, quorum master,node3,node1,node4,node2 (age 35s)
    mgr: master.oqkwii(active, since 88s), standbys: node2.rwzexl
    osd: 4 osds: 2 up (since 5s), 2 in (since 5s); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 14 GiB / 16 GiB avail
    pgs:     100.000% pgs not active
             1 creating+peering

  progress:
    Rebalancing after osd.1 marked in (3s)
      [............................]
It just doesn't seem right for the cluster to bootstrap into a degraded state and then have to rely on Ceph's "self-healing" capabilities to put it right.
Updated by Nathan Cutler almost 4 years ago
- Subject changed from Running "cephadm bootstrap" should not create any pools to Running "cephadm bootstrap" without "--apply-spec" should not create any pools
Updated by Sebastian Wagner almost 4 years ago
- Project changed from Orchestrator to mgr
- Subject changed from Running "cephadm bootstrap" without "--apply-spec" should not create any pools to mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster
Updated by Nathan Cutler almost 4 years ago
- Subject changed from mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster to mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster
Updated by Sunny Kumar over 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 37085
Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47739: octopus: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster added
Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".