Bug #46561

cephadm: monitoring services adoption doesn't honor the container image

Added by Dimitri Savineau about 1 year ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: cephadm/monitoring
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When the `cephadm adopt` command is run against monitoring services, the container image set via [1] isn't honored, unlike in an initial deployment.

As an example, the prometheus container was using the image docker.io/prom/prometheus:v2.7.2 before running the command below:

# cephadm adopt --cluster ceph --skip-pull --style legacy --name prometheus.mon0

As a result, prometheus is now using the default prometheus container image value (prom/prometheus:latest, which is in fact 2.19.2) from [2][3] and not the one set in the ceph configuration.

It looks like those default values are hardcoded and can't be overridden by the adopt command.

BTW, those default values aren't the same as the ones in the cephadm orchestrator backend [4].

[1] `ceph config set mgr mgr/cephadm/container_image_xxxx foo/bar:tag` (where xxxx is either alertmanager, grafana, node_exporter or prometheus)
[2] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L129-L175
[3] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L1809-L1812
[4] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L183-L202


Related issues

Related to Orchestrator - Feature #47274: cephadm: put the container_image setting into the deployed ceph.conf New
Related to Orchestrator - Feature #45111: cephadm: choose distribution specific images based on etc/os-releaes Rejected
Related to Orchestrator - Bug #45973: Adopted MDS daemons are removed by the orchestrator because they're orphans Rejected
Related to Orchestrator - Bug #50502: cephadm pull doesn't get latest image New
Related to Orchestrator - Feature #45996: adopted prometheus instance uses port 9095, regardless of original port number New

History

#1 Updated by Sebastian Wagner about 1 year ago

Hm. Honestly, I don't know if cephadm adopt has the necessary privileges to access the config store. In any case, we're trying to make cephadm adopt something that doesn't need to talk to the cluster.

Would an environment variable be ok for you?

#2 Updated by Dimitri Savineau about 1 year ago

Either an environment variable, like the one we have for CEPHADM_IMAGE, or a dedicated parameter (like --image) is fine for me.
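A minimal sketch of how such an override could be resolved, for illustration only. This is not cephadm's actual code: the function `resolve_image`, the `DEFAULT_IMAGES` table and the variable name pattern `CEPHADM_IMAGE_<SERVICE>` are hypothetical, modeled on the existing CEPHADM_IMAGE variable. The only default listed is the prometheus one quoted in the description.

```python
import os

# Default taken from the bug description; other services would be added the same way.
DEFAULT_IMAGES = {
    "prometheus": "prom/prometheus:latest",
}


def resolve_image(service, cli_image=None, environ=None):
    """Pick an image: explicit --image value first, then environment, then default."""
    env = os.environ if environ is None else environ
    if cli_image:
        return cli_image
    # Hypothetical variable name, e.g. CEPHADM_IMAGE_PROMETHEUS.
    env_value = env.get("CEPHADM_IMAGE_" + service.upper())
    if env_value:
        return env_value
    return DEFAULT_IMAGES[service]
```

With this precedence, neither option requires adopt to talk to the cluster, and the hardcoded value is used only when nothing else is given.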

#3 Updated by Dimitri Savineau about 1 year ago

I guess it will also help for the initial cluster bootstrap, because the bootstrap workflow also uses the default values, like the adopt command.

As a current workaround, I need to skip the monitoring stack from the bootstrap (--skip-monitoring-stack), then set the monitoring container image variables in the ceph configuration (ceph config set) and finally schedule the monitoring deployment via ceph orch apply.
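The workaround above can be sketched as a short command sequence. This is a rough illustration, not a tested deployment script: the helper names `workaround_commands` and `run` are hypothetical, `--mon-ip` and the address passed to it are assumptions about the bootstrap invocation, and the sketch assumes the `ceph` CLI is usable on the host after bootstrap. It also enables the prometheus mgr module, which bootstrap skips together with the monitoring stack.

```python
import subprocess


def workaround_commands(mon_ip, images):
    """Build the command list; `images` maps a config suffix (e.g. "prometheus") to an image."""
    # 1. Bootstrap without the monitoring stack.
    cmds = [["cephadm", "bootstrap", "--mon-ip", mon_ip, "--skip-monitoring-stack"]]
    # 2. Set the container image for each monitoring service in the ceph configuration.
    for service, image in images.items():
        cmds.append(["ceph", "config", "set", "mgr",
                     "mgr/cephadm/container_image_" + service, image])
    # 3. Enable the prometheus mgr module, which a normal bootstrap would have done.
    cmds.append(["ceph", "mgr", "module", "enable", "prometheus"])
    # 4. Schedule the deployments; the orchestrator service type uses a hyphen (node-exporter).
    for service in images:
        cmds.append(["ceph", "orch", "apply", service.replace("_", "-")])
    return cmds


def run(cmds, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run(workaround_commands("10.0.0.1", {"prometheus": "docker.io/prom/prometheus:v2.7.2"}))` only prints the commands, so the sequence can be reviewed before anything is executed.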

#4 Updated by Sebastian Wagner about 1 year ago

Dimitri Savineau wrote:

As a current workaround, I need to skip the monitoring stack from the bootstrap (--skip-monitoring-stack)

ceph-salt is always adding --skip-monitoring-stack . The thinking is that it doesn't really make sense to co-locate the monitoring stack with the mon+mgr.

Do you really want to deploy the monitoring stack on the bootstrap host?

#5 Updated by Dimitri Savineau about 1 year ago

The thinking is that it doesn't really make sense to co-locate the monitoring stack with the mon+mgr.

Some people are using this configuration, like OpenStack TripleO.

Do you really want to deploy the monitoring stack on the bootstrap host?

Maybe the question is more about why it is enabled by default during the bootstrap?

Also, deploying the monitoring stack after the bootstrap requires running an extra ceph command to enable the prometheus mgr module (which is done automatically during the bootstrap) [1]

[1] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L2877-L2879

#6 Updated by Sebastian Wagner about 1 year ago

Dimitri Savineau wrote:

The thinking is that it doesn't really make sense to co-locate the monitoring stack with the mon+mgr.

Some people are using this configuration, like OpenStack TripleO

I mean, if you're really sure this is a good approach, we can make things more configurable during bootstrap. Of course!

Do you really want to deploy the monitoring stack on the bootstrap host?

Maybe the question is more about why it is enabled by default during the bootstrap?

Also, deploying the monitoring stack after the bootstrap requires running an extra ceph command to enable the prometheus mgr module (which is done automatically during the bootstrap) [1]

[1] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L2877-L2879

This might be worth an extra tracker issue!

#7 Updated by Nathan Cutler about 1 year ago

If it's not useful to deploy the monitoring stack on a MON+MGR node, why does "cephadm bootstrap" do that?

I guess the answer is that, without the monitoring stack, Dashboard doesn't show any graphs. So if you do just "cephadm bootstrap" and then go into the Dashboard, no graphs are displayed. This gives rise to the question: "how do I get graphs?" and I'm not sure there is a clear answer to this anywhere in the documentation. (And I write that fully hoping to be proven wrong!)

#8 Updated by Nathan Cutler about 1 year ago

  • Related to Bug #46606: cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued added

#9 Updated by Nathan Cutler about 1 year ago

Also, deploying the monitoring stack after the bootstrap requires running an extra ceph command to enable the prometheus mgr module (which is done automatically during the bootstrap) [1]

[1] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L2877-L2879

this might be worth an extra tracker issue!

Here you go: #46606

#10 Updated by Sebastian Wagner about 1 year ago

  • Status changed from New to Need More Info

#11 Updated by Dimitri Savineau about 1 year ago

@Sebastian: What information do you need?

#12 Updated by Sebastian Wagner 10 months ago

  • Status changed from Need More Info to New

I think we need to remove the hardcoded default images from cephadm and make them somehow configurable.

#13 Updated by Sebastian Wagner 10 months ago

  • Related to Feature #47274: cephadm: put the container_image setting into the deployed ceph.conf added

#14 Updated by Sebastian Wagner 10 months ago

  • Related to Feature #45111: cephadm: choose distribution specific images based on etc/os-releaes added

#15 Updated by Sebastian Wagner 7 months ago

  • Related to Bug #45973: Adopted MDS daemons are removed by the orchestrator because they're orphans added

#16 Updated by Sebastian Wagner 5 months ago

  • Related to Bug #50502: cephadm pull doesn't get latest image added

#17 Updated by Sebastian Wagner 5 months ago

  • Related to deleted (Bug #46606: cephadm: post-bootstrap monitoring deployment only works if the command "ceph mgr module enable prometheus" has already been issued)

#18 Updated by Sebastian Wagner 4 months ago

  • Related to Feature #45996: adopted prometheus instance uses port 9095, regardless of original port number added
