Bug #46743

mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster

Added by Nathan Cutler 6 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assignee: Sunny Kumar
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 37085
Crash signature:

Description

Currently, "cephadm bootstrap" appears to create a pool because "devicehealth", as an "always on" module, gets created when the first MGR is deployed.

The pool actually gets created by mgr/devicehealth, not by cephadm - hence this bug is opened against mgr/devicehealth, even though - from the user's perspective - the problem happens when the "cephadm bootstrap" command is issued.
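
For context, here is a minimal sketch of the mechanism. This is illustrative only; the method names and structure below are assumptions, not the actual devicehealth source. It uses the standard ceph-mgr module hooks (MgrModule.serve(), self.get('osd_map'), self.check_mon_command()):

    from mgr_module import MgrModule

    POOL_NAME = 'device_health_metrics'

    class Module(MgrModule):
        def serve(self):
            # serve() starts running as soon as the first MGR daemon comes
            # up, i.e. immediately after "cephadm bootstrap" and before any
            # OSD exists.
            self.maybe_create_pool()
            # ... rest of the module's event loop ...

        def maybe_create_pool(self):
            # Look up the pools that already exist in the current OSDMap.
            pools = [p['pool_name'] for p in self.get('osd_map')['pools']]
            if POOL_NAME not in pools:
                # Creating a pool while there are zero OSDs leaves its PG
                # unable to become active, so the cluster reports
                # HEALTH_WARN right away.
                self.check_mon_command({
                    'prefix': 'osd pool create',
                    'pool': POOL_NAME,
                })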

Because mgr/devicehealth creates a pool before the cluster has any OSDs, the cluster enters HEALTH_WARN immediately after bootstrap:

master:~ # ceph -s
  cluster:
    id:     fed46cbe-d157-11ea-901a-52540084b2ce
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum node1 (age 67s)
    mgr: node1.ikkrrt(active, since 39s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown

The pool is:

master:~ # ceph osd pool ls
device_health_metrics

It seems like the creation of this pool should be linked to the deployment of the first OSD, not to the deployment of the first MON/MGR.
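
One possible shape for such a fix, as a hedged sketch against the module structure assumed above (not necessarily what the eventual pull request does), is to make the pool creation conditional on the OSDMap containing at least one OSD, and to retry on later passes of serve():

    def maybe_create_pool(self):
        osdmap = self.get('osd_map')
        if not osdmap['osds']:
            # No OSDs yet: creating the pool now would only trigger
            # "OSD count 0 < osd_pool_default_size". Skip for now and
            # retry on a later pass, once the first OSD is deployed.
            return False
        pools = [p['pool_name'] for p in osdmap['pools']]
        if POOL_NAME not in pools:
            self.check_mon_command({
                'prefix': 'osd pool create',
                'pool': POOL_NAME,
            })
        return True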


Related issues

Copied to mgr - Backport #47739: octopus: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster Resolved

History

#1 Updated by Nathan Cutler 6 months ago

Even when I immediately deploy OSDs after running "cephadm bootstrap", the cluster does not recover from the health warning very quickly. Here's what the health status looks like when deployment of the OSDs completes:

master:~ # ceph -s
  cluster:
    id:     59f04650-d15c-11ea-bc40-52540017ea7a
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg peering

  services:
    mon: 5 daemons, quorum master,node3,node1,node4,node2 (age 35s)
    mgr: master.oqkwii(active, since 88s), standbys: node2.rwzexl
    osd: 4 osds: 2 up (since 5s), 2 in (since 5s); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 14 GiB / 16 GiB avail
    pgs:     100.000% pgs not active
             1 creating+peering

  progress:
    Rebalancing after osd.1 marked in (3s)
      [............................] 

It just doesn't seem right for the cluster to bootstrap into a degraded state, and then have to rely on Ceph's "self-healing" capabilities to put it right.

#2 Updated by Nathan Cutler 6 months ago

  • Subject changed from Running "cephadm bootstrap" should not create any pools to Running "cephadm bootstrap" without "--apply-spec" should not create any pools

#3 Updated by Sebastian Wagner 6 months ago

  • Project changed from Orchestrator to mgr
  • Subject changed from Running "cephadm bootstrap" without "--apply-spec" should not create any pools to mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster

#4 Updated by Nathan Cutler 6 months ago

  • Subject changed from mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster to mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster

#5 Updated by Nathan Cutler 6 months ago

  • Description updated (diff)

#6 Updated by Nathan Cutler 6 months ago

  • Description updated (diff)

#7 Updated by Neha Ojha 6 months ago

  • Tags set to low-hanging-fruit

#8 Updated by Neha Ojha 6 months ago

  • Assignee set to Sunny Kumar

#9 Updated by Sunny Kumar 5 months ago

  • Status changed from New to In Progress

#10 Updated by Sunny Kumar 5 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 37085

#11 Updated by Nathan Cutler 4 months ago

  • Backport set to octopus

#12 Updated by Nathan Cutler 4 months ago

  • Tags deleted (low-hanging-fruit)

#13 Updated by Kefu Chai 4 months ago

  • Status changed from Fix Under Review to Pending Backport

#14 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #47739: octopus: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster added

#15 Updated by Nathan Cutler 4 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
