Bug #61553

cephadm does not honor container_image default value

Added by Daniel Krambrock 10 months ago. Updated 9 months ago.

Status: New
Priority: Low
Assignee: -
Category: ceph-mgr
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Already posted this on :

Redeploying a service (tested with alertmanager) after removing a custom `mgr/cephadm/container_image_alertmanager` value deploys the previously configured container image, not the default container image.

I'm running `cephadm` from the Ubuntu 22.04 package 17.2.5-0ubuntu0.22.04.3 and `ceph` version 17.2.6.

Here is an example. clrz20-08 is the node alertmanager is running on; clrz20-01 is the node I'm controlling Ceph from:

  • Get alertmanager version
    root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
    "quay.io/prometheus/alertmanager:v0.23.0" 
    
  • Set alertmanager image
    root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager
    root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
    quay.io/prometheus/alertmanager
    
  • Redeploy alertmanager
    root@clrz20-01:~# ceph orch redeploy alertmanager
    Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
    
  • Get alertmanager version
    root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
    "quay.io/prometheus/alertmanager:latest" 
    
  • Remove alertmanager image setting, revert to default:
    root@clrz20-01:~# ceph config rm mgr mgr/cephadm/container_image_alertmanager
    root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
    quay.io/prometheus/alertmanager:v0.23.0
    
  • Redeploy alertmanager
    root@clrz20-01:~# ceph orch redeploy alertmanager
    Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
    
  • Get alertmanager version
    root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
    "quay.io/prometheus/alertmanager:latest" 
    

    -> `mgr/cephadm/container_image_alertmanager` now reports the default `quay.io/prometheus/alertmanager:v0.23.0`, but the redeploy still uses `quay.io/prometheus/alertmanager:latest`. This looks like a bug.
  • Set alertmanager image explicitly to the default value
    root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
    root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
    quay.io/prometheus/alertmanager:v0.23.0
    
  • Redeploy alertmanager
    root@clrz20-01:~# ceph orch redeploy alertmanager
    Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
    
  • Get alertmanager version
    root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
    "quay.io/prometheus/alertmanager:v0.23.0" 
    

    -> Explicitly setting `mgr/cephadm/container_image_alertmanager` to the default value fixes the issue.
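
The comparison done by hand in the steps above can be scripted. The following is a minimal sketch, not part of cephadm: the helper names `running_image` and `image_matches` are made up for illustration, and it assumes `jq` is installed and that the input is the JSON printed by `cephadm ls`, as in the transcript.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a daemon runs the configured image.

# Print the container image name of a service, reading `cephadm ls` JSON on stdin.
running_image() {
    local service="$1"
    jq -r --arg svc "$service" \
        '.[] | select(.service_name == $svc) | .container_image_name'
}

# Return 0 if the expected and the running image are identical, 1 otherwise.
image_matches() {
    local expected="$1" actual="$2"
    if [ "$expected" = "$actual" ]; then
        echo "OK: running $actual"
    else
        echo "MISMATCH: config says $expected, daemon runs $actual"
        return 1
    fi
}
```

On the daemon host, `cephadm ls | running_image alertmanager` yields the running image; on the admin node, `ceph config get mgr mgr/cephadm/container_image_alertmanager` yields the expected one. Passing both strings to `image_matches` reports the mismatch seen after the `ceph config rm` step.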

History

#1 Updated by Redouane Kachach Elhichou 10 months ago

  • Assignee set to Redouane Kachach Elhichou

#2 Updated by Redouane Kachach Elhichou 10 months ago

My preliminary investigation suggests that:

 ceph config rm mgr mgr/cephadm/xxx

clears the config parameter correctly in the mon store (ceph config get returns the default value), but the removal is not propagated to the mgr (binary): from the cephadm module, calls to self.get_module_option(xxx) still return the old value instead of the default. This happens regardless of which configuration option is removed.

PS: as reported in this bug, if we use ceph config set xxx instead of rm, it works correctly.
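
Until the removal is propagated properly, the observation above suggests a workaround: remove the custom value, read back the default that ceph config get then reports, and set it explicitly. A minimal sketch follows; the function name `reset_image_to_default` is hypothetical, and it assumes the option suffix equals the service name, which holds for alertmanager in this report.

```shell
#!/usr/bin/env bash
# Hypothetical workaround helper, based only on the commands used in this report.
reset_image_to_default() {
    local service="$1"
    local opt="mgr/cephadm/container_image_${service}"
    local default_image

    # Drop the custom value; per the report, `config get` then returns the default.
    ceph config rm mgr "$opt" || return 1
    default_image=$(ceph config get mgr "$opt") || return 1
    [ -n "$default_image" ] || return 1

    # Write the default back explicitly so the cephadm module picks up the change.
    ceph config set mgr "$opt" "$default_image" || return 1
    ceph orch redeploy "$service"
}
```

Calling `reset_image_to_default alertmanager` on the admin node performs the same steps as the manual workaround. Note that this leaves an explicit value in the config store, so the option stays pinned to that image even if the built-in default changes later.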

#3 Updated by Redouane Kachach Elhichou 10 months ago

  • Priority changed from Normal to Low

#4 Updated by Redouane Kachach Elhichou 9 months ago

  • Project changed from Orchestrator to mgr
  • Category set to ceph-mgr
  • Assignee deleted (Redouane Kachach Elhichou)

My investigation suggests this is a bug in the mgr (I'd say in the C++ code), since it doesn't propagate the removal of config options to the Python modules.
