Bug #46606: cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued - Orchestrator - Ceph

Bug #46606

closed

cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued

Added by Nathan Cutler almost 4 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Category: cephadm/monitoring
Severity: 3 - minor
Pull request ID: 42682

Description

Post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued

Reported by Dmitri Savineau here: https://tracker.ceph.com/issues/46561#note-5

Deploying the monitoring stack after bootstrap requires running an extra ceph command to enable the prometheus mgr module (this is done automatically during bootstrap) [1]

[1] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L2877-L2879
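Until the orchestrator handles this itself, the workaround can be scripted. The following is a minimal sketch in Python, not part of cephadm. It assumes that the JSON printed by `ceph mgr module ls --format json` contains an `enabled_modules` list of module names. The helper names `prometheus_enabled` and `post_bootstrap_commands` are hypothetical. The sketch only builds the command lines and does not execute them.

```python
import json
from typing import List


def prometheus_enabled(module_ls_json: str) -> bool:
    """Return True if the prometheus mgr module is listed as enabled.

    Assumption: the output of `ceph mgr module ls --format json` carries an
    "enabled_modules" list of module names.
    """
    data = json.loads(module_ls_json)
    return "prometheus" in data.get("enabled_modules", [])


def post_bootstrap_commands(module_ls_json: str) -> List[List[str]]:
    """Build the commands needed to deploy Prometheus after bootstrap.

    Bootstrap enables the prometheus mgr module on its own; a post-bootstrap
    deployment has to do it explicitly, so the enable step is prepended only
    when the module is not already enabled.
    """
    commands: List[List[str]] = []
    if not prometheus_enabled(module_ls_json):
        commands.append(["ceph", "mgr", "module", "enable", "prometheus"])
    commands.append(["ceph", "orch", "apply", "prometheus"])
    return commands


if __name__ == "__main__":
    # Sample input standing in for real `ceph mgr module ls` output.
    sample = '{"enabled_modules": ["cephadm", "dashboard"]}'
    for cmd in post_bootstrap_commands(sample):
        print(" ".join(cmd))
```

On a host with an admin keyring, each returned list could be passed to `subprocess.run(cmd, check=True)`.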

Updated by Nathan Cutler almost 4 years ago

Related to Bug #46561: cephadm: monitoring services adoption doesn't honor the container image added

Updated by Nathan Cutler almost 4 years ago

Subject changed from Post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued to cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued

Updated by Nathan Cutler almost 4 years ago

Description updated (diff)

Updated by Nathan Cutler almost 4 years ago

Description updated (diff)

Updated by Sebastian Wagner almost 4 years ago

Category set to cephadm/monitoring

Updated by Sebastian Wagner about 3 years ago

Priority changed from Normal to High

Updated by Juan Miguel Olmo Martínez about 3 years ago

Assignee set to Sebastian Wagner

Updated by Sebastian Wagner about 3 years ago

Status changed from New to Fix Under Review
Pull request ID set to 39520

Updated by Sebastian Wagner about 3 years ago

Status changed from Fix Under Review to New

Updated by Sebastian Wagner about 3 years ago

Related to deleted (Bug #46561: cephadm: monitoring services adoption doesn't honor the container image)

Updated by Sage Weil almost 3 years ago

A couple options:

- make the 'orch apply prometheus' fail if the mgr prometheus module isn't enabled. (maybe include a --force in case the user really wants to proceed?)
- make cephadm raise a health warning if there is a prometheus deployed but the prometheus module isn't enabled
- make 'orch apply prometheus' silently enable the prometheus module
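The three options differ mainly in when the problem is reported and who has to act. The following Python sketch models each option so the trade-offs can be compared. It is hypothetical. `Policy`, `apply_prometheus` and `health_warnings` are illustrative names, not orchestrator APIs. The returned strings are placeholders for the real side effects. Following the wording of option 2, the health warning is modeled as a separate, standing check rather than part of the apply step.

```python
from enum import Enum
from typing import List


class Policy(Enum):
    FAIL_UNLESS_FORCED = "fail"  # option 1: refuse unless --force is given
    HEALTH_WARNING = "warn"      # option 2: deploy, then flag it via a health check
    AUTO_ENABLE = "enable"       # option 3: silently enable the mgr module


def apply_prometheus(policy: Policy, module_enabled: bool, force: bool = False) -> List[str]:
    """Return the actions a hypothetical 'orch apply prometheus' would take."""
    actions: List[str] = []
    if not module_enabled:
        if policy is Policy.FAIL_UNLESS_FORCED and not force:
            # Option 1 reports the problem synchronously, before anything is deployed.
            raise RuntimeError(
                "mgr module 'prometheus' is not enabled; pass --force to deploy anyway")
        if policy is Policy.AUTO_ENABLE:
            actions.append("ceph mgr module enable prometheus")
    actions.append("deploy prometheus daemon")
    return actions


def health_warnings(prometheus_deployed: bool, module_enabled: bool) -> List[str]:
    """Option 2: a standing check, independent of how prometheus was deployed."""
    if prometheus_deployed and not module_enabled:
        return ["prometheus is deployed but the prometheus mgr module is not enabled"]
    return []
```

Only option 3 needs no user action. Only option 1 reports the problem at the moment the command is issued. Option 2 reports it later, but it also catches cases where the module is disabled after deployment.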

Updated by Sebastian Wagner almost 3 years ago

I'd definitely go for making 'orch apply prometheus' silently enable the prometheus module.

Updated by Nathan Cutler almost 3 years ago

- make the 'orch apply prometheus' fail if the mgr prometheus module isn't enabled. (maybe include a --force in case the user really wants to proceed?)

This one is slightly problematic because there is not just "orch apply prometheus" with a prometheus-specific yaml blob, but also "orch apply" with a BIG yaml blob (with sections for various kinds of services/daemons).

Arguably, the "orch apply" command (with a BIG yaml blob) should fail if any part of the yaml cannot be fulfilled. But that's not how the orchestrator works: "orch apply" is fulfilled as a background task, and when something goes wrong it's not always obvious to the user how to figure out what happened and why, since that typically involves a post-mortem examination of the mgr logs.

To say it another way: "orch apply" is like a "moon shot". Everything has to be prepared in advance. Once the rocket is on its way up, there isn't any good way of aborting the mission.

(Caveat: this is just my impression as a casual user of "orch apply", not based on any deep knowledge of the code or even the design)
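A synchronous pre-flight pass could check every spec before any work is handed to the background task. This would give the "fail if any part is not fulfillable" behaviour without changing how the orchestrator executes specs. The following is a minimal Python sketch of that idea. It assumes the YAML has already been parsed into a list of dicts, each with a `service_type` key as in cephadm service specs. `REQUIRED_MGR_MODULES` and `preflight` are hypothetical and not part of the orchestrator.

```python
from typing import Dict, List, Set

# Hypothetical prerequisite table: service_type -> mgr modules it relies on.
REQUIRED_MGR_MODULES: Dict[str, Set[str]] = {
    "prometheus": {"prometheus"},
}


def preflight(specs: List[dict], enabled_modules: Set[str]) -> List[str]:
    """Check every spec in a multi-service apply before anything is scheduled.

    Returns a list of human-readable problems; an empty list means the whole
    batch could be handed to the background task.
    """
    problems: List[str] = []
    for index, spec in enumerate(specs):
        service_type = spec.get("service_type")
        if service_type is None:
            problems.append(f"spec #{index}: missing service_type")
            continue
        missing = REQUIRED_MGR_MODULES.get(service_type, set()) - enabled_modules
        for module in sorted(missing):
            problems.append(
                f"spec #{index} ({service_type}): mgr module '{module}' is not enabled")
    return problems
```

The sketch collects every problem instead of stopping at the first one. That way the user can fix the whole yaml blob in one pass, without examining the mgr logs after the fact.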
