Bug #46743
mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster

Added by Nathan Cutler over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Currently, "cephadm bootstrap" appears to create a pool: "devicehealth" is an "always on" mgr module, so it starts as soon as the first MGR is deployed and immediately creates its pool.

The pool is actually created by mgr/devicehealth, not by cephadm - hence this bug is opened against mgr/devicehealth, even though, from the user's perspective, the problem appears when the "cephadm bootstrap" command is issued.

Because mgr/devicehealth creates a pool before the cluster has any OSDs, the cluster enters HEALTH_WARN immediately after bootstrap:

master:~ # ceph -s
  cluster:
    id:     fed46cbe-d157-11ea-901a-52540084b2ce
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum node1 (age 67s)
    mgr: node1.ikkrrt(active, since 39s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown

The pool is:

master:~ # ceph osd pool ls
device_health_metrics

It seems like the creation of this pool should be linked to deployment of the first OSD, not to the deployment of the first MON/MGR.
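The guard this ticket asks for could look roughly like the following. This is a hedged sketch, not the actual mgr/devicehealth code: the osd-map shape (an "osds" list, similar to what a mgr module sees via self.get("osd_map")) and the create_pool callback are assumptions for illustration only.

```python
def should_create_pool(osd_map: dict) -> bool:
    """True only when the cluster already has at least one OSD."""
    return len(osd_map.get("osds", [])) > 0


def maybe_create_device_health_pool(osd_map: dict, create_pool) -> bool:
    """Create device_health_metrics only once OSDs exist.

    Returns True if the pool was created, False if creation was deferred.
    `create_pool` stands in for whatever pool-creation call the module
    actually uses (hypothetical here).
    """
    if not should_create_pool(osd_map):
        # No OSDs yet: deferring avoids HEALTH_WARN right after bootstrap.
        return False
    create_pool("device_health_metrics")
    return True
```

With a check like this in the module's periodic serve loop, the pool would appear only after the first OSD is deployed, rather than at MON/MGR bootstrap.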


Related issues: 1 (0 open, 1 closed)

Copied to mgr - Backport #47739: octopus: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster (Resolved, Nathan Cutler)
#1

Updated by Nathan Cutler over 3 years ago

Even when I immediately deploy OSDs after running "cephadm bootstrap", the cluster does not recover from the health warning very quickly. Here's what the health status looks like when deployment of the OSDs completes:

master:~ # ceph -s
  cluster:
    id:     59f04650-d15c-11ea-bc40-52540017ea7a
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg peering

  services:
    mon: 5 daemons, quorum master,node3,node1,node4,node2 (age 35s)
    mgr: master.oqkwii(active, since 88s), standbys: node2.rwzexl
    osd: 4 osds: 2 up (since 5s), 2 in (since 5s); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 14 GiB / 16 GiB avail
    pgs:     100.000% pgs not active
             1 creating+peering

  progress:
    Rebalancing after osd.1 marked in (3s)
      [............................] 

It just doesn't seem right for the cluster to bootstrap into a degraded state and then have to rely on Ceph's "self-healing" capabilities to put it right.

#2

Updated by Nathan Cutler over 3 years ago

  • Subject changed from Running "cephadm bootstrap" should not create any pools to Running "cephadm bootstrap" without "--apply-spec" should not create any pools
#3

Updated by Sebastian Wagner over 3 years ago

  • Project changed from Orchestrator to mgr
  • Subject changed from Running "cephadm bootstrap" without "--apply-spec" should not create any pools to mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster
#4

Updated by Nathan Cutler over 3 years ago

  • Subject changed from mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster to mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster
#5

Updated by Nathan Cutler over 3 years ago

  • Description updated (diff)
#6

Updated by Nathan Cutler over 3 years ago

  • Description updated (diff)
#7

Updated by Neha Ojha over 3 years ago

  • Tags set to low-hanging-fruit
#8

Updated by Neha Ojha over 3 years ago

  • Assignee set to Sunny Kumar
#9

Updated by Sunny Kumar over 3 years ago

  • Status changed from New to In Progress
#10

Updated by Sunny Kumar over 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 37085
#11

Updated by Nathan Cutler over 3 years ago

  • Backport set to octopus
#12

Updated by Nathan Cutler over 3 years ago

  • Tags deleted (low-hanging-fruit)
#13

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#14

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47739: octopus: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster added
#15

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
