Bug #46743: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster - mgr - Ceph

closed

Added by Nathan Cutler over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Sunny Kumar
Category:
Target version:
% Done:
Source:
Tags:
Backport: octopus
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 37085
Crash signature (v1):
Crash signature (v2):

Description

Currently, "cephadm bootstrap" appears to create a pool, because "devicehealth", as an "always on" module, starts running as soon as the first MGR is deployed.

The pool actually gets created by mgr/devicehealth, not by cephadm - hence this bug is opened against mgr/devicehealth, even though - from the user's perspective - the problem happens when the "cephadm bootstrap" command is issued.

Because mgr/devicehealth creates a pool before the cluster has any OSDs, the cluster enters HEALTH_WARN immediately after bootstrap:

master:~ # ceph -s
  cluster:
    id:     fed46cbe-d157-11ea-901a-52540084b2ce
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum node1 (age 67s)
    mgr: node1.ikkrrt(active, since 39s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown

The pool is:

master:~ # ceph osd pool ls
device_health_metrics

It seems like the creation of this pool should be linked to deployment of the first OSD, not to the deployment of the first MON/MGR.
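
One way to picture that suggestion is the minimal sketch below. It is not the change made in pull request 37085 and it does not use the real mgr module API. The function name should_create_pool and the simplified osd_map dictionary (keys "osds" and "pools") are assumptions made purely for illustration; only the pool name comes from the output above.

```python
from typing import Any, Dict

# Pool name as shown by "ceph osd pool ls" in this report.
POOL_NAME = "device_health_metrics"


def should_create_pool(osd_map: Dict[str, Any]) -> bool:
    """Decide whether the metrics pool should be created now.

    `osd_map` is a hypothetical, simplified dict for illustration:
      "osds":  list with one entry per OSD known to the cluster
      "pools": list of dicts, each with a "pool_name" key
    """
    # Defer creation while the cluster has no OSDs at all.
    has_osds = len(osd_map.get("osds", [])) > 0
    # Never try to create the pool twice.
    pool_exists = any(
        pool.get("pool_name") == POOL_NAME
        for pool in osd_map.get("pools", [])
    )
    return has_osds and not pool_exists


if __name__ == "__main__":
    just_bootstrapped = {"osds": [], "pools": []}
    first_osd_deployed = {"osds": [{"osd": 0}], "pools": []}
    print(should_create_pool(just_bootstrapped))
    print(should_create_pool(first_osd_deployed))
```

With a check of this kind, a freshly bootstrapped cluster with zero OSDs would have no pools and therefore no placement groups stuck in the "unknown" state.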

Related issues 1 (0 open — 1 closed)

Updated by Nathan Cutler over 3 years ago

Even when I immediately deploy OSDs after running "cephadm bootstrap", the cluster does not recover from the health warning very quickly. Here's what the health status looks like when deployment of the OSDs completes:

master:~ # ceph -s
  cluster:
    id:     59f04650-d15c-11ea-bc40-52540017ea7a
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg peering

  services:
    mon: 5 daemons, quorum master,node3,node1,node4,node2 (age 35s)
    mgr: master.oqkwii(active, since 88s), standbys: node2.rwzexl
    osd: 4 osds: 2 up (since 5s), 2 in (since 5s); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 14 GiB / 16 GiB avail
    pgs:     100.000% pgs not active
             1 creating+peering

  progress:
    Rebalancing after osd.1 marked in (3s)
      [............................]

It just doesn't seem right for the cluster to bootstrap into a degraded state, and then have to rely on Ceph's "self-healing" capabilities to put it right.

Updated by Nathan Cutler over 3 years ago

Subject changed from Running "cephadm bootstrap" should not create any pools to Running "cephadm bootstrap" without "--apply-spec" should not create any pools

Updated by Sebastian Wagner over 3 years ago

Project changed from Orchestrator to mgr
Subject changed from Running "cephadm bootstrap" without "--apply-spec" should not create any pools to mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster

Updated by Nathan Cutler over 3 years ago

Subject changed from mgr/devicehealth: device_health_metrics gets created even without any OSDs in the cluster to mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster

Updated by Nathan Cutler over 3 years ago

Description updated (diff)

Updated by Nathan Cutler over 3 years ago

Description updated (diff)

Updated by Neha Ojha over 3 years ago

Tags set to low-hanging-fruit

Updated by Neha Ojha over 3 years ago

Assignee set to Sunny Kumar

Updated by Sunny Kumar over 3 years ago

Status changed from New to In Progress

Updated by Sunny Kumar over 3 years ago

Status changed from In Progress to Fix Under Review
Pull request ID set to 37085

Updated by Nathan Cutler over 3 years ago

Backport set to octopus

Updated by Nathan Cutler over 3 years ago

Tags deleted (~~low-hanging-fruit~~)

Updated by Kefu Chai over 3 years ago

Status changed from Fix Under Review to Pending Backport

Updated by Nathan Cutler over 3 years ago

Copied to Backport #47739: octopus: mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the cluster added

Updated by Nathan Cutler over 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
