Bug #46606

closed

cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued

Added by Nathan Cutler almost 4 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm/monitoring
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued

Reported by Dmitri Savineau here: https://tracker.ceph.com/issues/46561#note-5

Deploying the monitoring stack after bootstrap requires running an extra ceph command to enable the prometheus mgr module (which is done automatically during bootstrap) [1]

[1] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L2877-L2879
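The workaround implied by the report, written as a minimal Python sketch: enable the mgr module first, then apply the Prometheus service. The two `ceph` commands are the ones named in this issue; the helper names `post_bootstrap_monitoring_cmds` and `run`, and the dry-run default, are hypothetical and only there to make the ordering explicit.

```python
import shlex
import subprocess
from typing import List


def post_bootstrap_monitoring_cmds() -> List[List[str]]:
    """Hypothetical helper: commands to deploy Prometheus after bootstrap.

    Bootstrap enables the mgr module automatically; a later deployment
    has to do it explicitly, and before the service is applied.
    """
    return [
        # Enable the mgr exporter that Prometheus scrapes for Ceph metrics.
        ["ceph", "mgr", "module", "enable", "prometheus"],
        # Then ask the orchestrator to deploy the Prometheus service.
        ["ceph", "orch", "apply", "prometheus"],
    ]


def run(dry_run: bool = True) -> List[str]:
    """Print-only by default; set dry_run=False to actually invoke ceph."""
    executed = []
    for cmd in post_bootstrap_monitoring_cmds():
        executed.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    for line in run():
        print(line)
```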

#1

Updated by Nathan Cutler almost 4 years ago

  • Related to Bug #46561: cephadm: monitoring services adoption doesn't honor the container image added

#2

Updated by Nathan Cutler almost 4 years ago

  • Subject changed from Post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued to cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued

#3

Updated by Nathan Cutler almost 4 years ago

  • Description updated (diff)

#4

Updated by Nathan Cutler almost 4 years ago

  • Description updated (diff)

#5

Updated by Sebastian Wagner almost 4 years ago

  • Category set to cephadm/monitoring

#6

Updated by Sebastian Wagner about 3 years ago

  • Priority changed from Normal to High

#7

Updated by Juan Miguel Olmo Martínez about 3 years ago

  • Assignee set to Sebastian Wagner

#8

Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 39520

#9

Updated by Sebastian Wagner about 3 years ago

  • Status changed from Fix Under Review to New

#10

Updated by Sebastian Wagner about 3 years ago

  • Related to deleted (Bug #46561: cephadm: monitoring services adoption doesn't honor the container image)

#11

Updated by Sage Weil almost 3 years ago

A couple options:

- make the 'orch apply prometheus' fail if the mgr prometheus module isn't enabled. (maybe include a --force in case the user really wants to proceed?)
- make cephadm raise a health warning if there is a prometheus deployed but the prometheus module isn't enabled
- make 'orch apply prometheus' silently enable the prometheus module
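Options 1 and 2 can be sketched against a toy cluster model. Everything here is hypothetical and illustrative: `ClusterState`, `ModuleNotEnabled`, `apply_prometheus_strict`, and the health-check code `PROMETHEUS_MODULE_DISABLED` are not cephadm's real names, and `force` only mirrors the suggested `--force` flag.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ClusterState:
    """Hypothetical, simplified view of what cephadm can see."""
    enabled_mgr_modules: Set[str] = field(default_factory=set)
    deployed_services: Set[str] = field(default_factory=set)


class ModuleNotEnabled(Exception):
    pass


def apply_prometheus_strict(state: ClusterState, force: bool = False) -> str:
    """Option 1: refuse to schedule unless the module is enabled (or forced)."""
    if "prometheus" not in state.enabled_mgr_modules and not force:
        raise ModuleNotEnabled(
            "mgr module 'prometheus' is not enabled; run "
            "'ceph mgr module enable prometheus' or pass --force"
        )
    state.deployed_services.add("prometheus")
    return "scheduled"


def health_checks(state: ClusterState) -> List[str]:
    """Option 2: warn when prometheus is deployed but its mgr module is off."""
    warnings = []
    if ("prometheus" in state.deployed_services
            and "prometheus" not in state.enabled_mgr_modules):
        warnings.append("PROMETHEUS_MODULE_DISABLED")  # hypothetical code
    return warnings
```

The strict variant rejects the request up front, while the health check lets the deployment go through and surfaces the problem afterwards.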

#12

Updated by Sebastian Wagner almost 3 years ago

I'd definitely go for making 'orch apply prometheus' silently enable the prometheus module.
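Option 3 amounts to having the apply step idempotently enable the module before storing the spec. A minimal sketch with hypothetical names (`ToyCluster`, `ensure_mgr_module`, `apply_prometheus`), not the code that eventually resolved this issue:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyCluster:
    """Hypothetical, simplified cluster state (not cephadm's data model)."""
    enabled_mgr_modules: Set[str] = field(default_factory=set)
    stored_specs: List[str] = field(default_factory=list)


def ensure_mgr_module(cluster: ToyCluster, name: str) -> bool:
    """Stand-in for 'ceph mgr module enable <name>'.

    Idempotent: returns True only if the module had to be enabled.
    """
    if name in cluster.enabled_mgr_modules:
        return False
    cluster.enabled_mgr_modules.add(name)
    return True


def apply_prometheus(cluster: ToyCluster) -> str:
    """Option 3: enable the module as a side effect, then store the spec."""
    enabled_now = ensure_mgr_module(cluster, "prometheus")
    cluster.stored_specs.append("prometheus")
    return "enabled module, scheduled" if enabled_now else "scheduled"
```

Because the enable step is idempotent, re-applying the same spec is harmless.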

#13

Updated by Nathan Cutler almost 3 years ago

> - make the 'orch apply prometheus' fail if the mgr prometheus module isn't enabled. (maybe include a --force in case the user really wants to proceed?)

This one is slightly problematic because there is not just "orch apply prometheus" with a prometheus-specific yaml blob, but also "orch apply" with a BIG yaml blob (with sections for various kinds of services/daemons).

Arguably, the "orch apply" command (with BIG yaml blob) should fail if any part of the yaml is not fulfillable. But that's not how the orchestrator works: the "orch apply" is fulfilled as a background task and when something goes wrong it's not always obvious to the user how to figure out what happened and why, since it typically involves conducting a post-mortem examination of the mgr logs.

To say it another way: "orch apply" is like a "moon shot". Everything has to be prepared in advance. Once the rocket is on its way up, there isn't any good way of aborting the mission.

(Caveat: this is just my impression as a casual user of "orch apply", not based on any deep knowledge of the code or even the design)
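The fire-and-forget behaviour described in this comment can be illustrated with a toy model. The names are hypothetical, this is not cephadm's implementation, and it rests on the same casual-user impression: `apply` only records the specs and returns, and a later background pass deploys them, so a failing section shows up only in the log.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for the mgr log.
log = logging.getLogger("toy-mgr")


@dataclass
class Orchestrator:
    """Toy model: apply() only queues specs; serve() fulfils them later."""
    pending: List[Dict[str, str]] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def apply(self, specs: List[Dict[str, str]]) -> str:
        # Returns at once; nothing has been deployed or checked yet.
        self.pending.extend(specs)
        return f"Scheduled {len(specs)} spec(s)"

    def serve(self, deploy: Callable[[Dict[str, str]], None]) -> None:
        # Background pass: a failure is logged and recorded, but the caller
        # of apply() has already received its success message.
        while self.pending:
            spec = self.pending.pop(0)
            try:
                deploy(spec)
            except Exception as exc:
                msg = f"failed to apply {spec['service_type']}: {exc}"
                self.errors.append(msg)
                log.error(msg)


if __name__ == "__main__":
    enabled_modules: set = set()

    def deploy(spec: Dict[str, str]) -> None:
        if spec["service_type"] == "prometheus" and "prometheus" not in enabled_modules:
            raise RuntimeError("mgr module 'prometheus' is not enabled")

    orch = Orchestrator()
    print(orch.apply([{"service_type": "node-exporter"},
                      {"service_type": "prometheus"}]))
    orch.serve(deploy)
    print(orch.errors)
```

Validating every section before `apply` returns would close this gap, but only for checks that can be made at submit time.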

#14

Updated by Sebastian Wagner almost 3 years ago

  • Priority changed from High to Normal

prio=normal, as this is not trivial to implement

#15

Updated by Sebastian Wagner almost 3 years ago

  • Assignee deleted (Sebastian Wagner)

#16

Updated by Sebastian Wagner over 2 years ago

  • Status changed from New to Resolved
  • Pull request ID changed from 39520 to 42682

PR 42682
