Bug #61553
cephadm does not honor container_image default value
Description
Already posted this on ceph-users@ceph.io:
Redeploying a service (tested with alertmanager) after removing a custom `mgr/cephadm/container_image_alertmanager` value deploys the previously configured container image instead of the default container image.
I'm running `cephadm` from the Ubuntu 22.04 package 17.2.5-0ubuntu0.22.04.3 and `ceph` version 17.2.6.
Here is an example. Node clrz20-08 is the node alertmanager is running on; clrz20-01 is the node I'm controlling ceph from:
- Get alertmanager version
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
- Set alertmanager image
root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager
- Redeploy alertmanager
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
- Get alertmanager version
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:latest"
- Remove alertmanager image setting, revert to default:
root@clrz20-01:~# ceph config rm mgr mgr/cephadm/container_image_alertmanager
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
- Redeploy alertmanager
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
- Get alertmanager version
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:latest"
-> `mgr/cephadm/container_image_alertmanager` is set to `quay.io/prometheus/alertmanager:v0.23.0`, but redeploy uses `quay.io/prometheus/alertmanager:latest`. This looks like a bug.
- Set alertmanager image explicitly to the default value
root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
- Redeploy alertmanager
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
- Get alertmanager version
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
-> Setting `mgr/cephadm/container_image_alertmanager` to the default setting fixes the issue.
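The manual check in the steps above (compare what `ceph config get` reports with what `cephadm ls` shows as deployed) could be scripted roughly as follows. This is only a sketch: the helper names `deployed_images` and `find_mismatches` are made up for illustration, and it assumes `cephadm ls` prints a JSON list whose entries carry the `service_name` and `container_image_name` fields used in the `jq` filter above.

```python
import json
from typing import List


def deployed_images(cephadm_ls_output: str, service_name: str) -> List[str]:
    """Return the container image of every daemon belonging to service_name.

    cephadm_ls_output is the raw JSON text printed by `cephadm ls`
    (assumed to be a list of objects, as the jq filter above implies).
    """
    daemons = json.loads(cephadm_ls_output)
    return [
        d.get("container_image_name")
        for d in daemons
        if d.get("service_name") == service_name
    ]


def find_mismatches(configured_image: str,
                    cephadm_ls_output: str,
                    service_name: str) -> List[str]:
    """Return deployed images that differ from the configured image.

    configured_image is the value printed by
    `ceph config get mgr mgr/cephadm/container_image_<service>`.
    An empty result means every daemon runs the configured image.
    """
    return [
        image
        for image in deployed_images(cephadm_ls_output, service_name)
        if image != configured_image
    ]
```

Run after the `ceph config rm` and redeploy steps, feeding it the output of `cephadm ls` from clrz20-08 and the value of `ceph config get` from clrz20-01, such a check would flag `quay.io/prometheus/alertmanager:latest` as not matching the reported default.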
History
#1 Updated by Redouane Kachach Elhichou 10 months ago
- Assignee set to Redouane Kachach Elhichou
#2 Updated by Redouane Kachach Elhichou 10 months ago
From my preliminary investigation, it seems that:
ceph config rm mgr mgr/cephadm/xxx
clears the config parameter correctly in the mon store (ceph config get returns the default value), but, regardless of which configuration option is removed, the removal is not propagated to the mgr (binary): from the cephadm module, calls to self.get_module_option(xxx) still return the old value instead of the default.
PS: as reported in this bug, if we use ceph config set xxx instead of rm, it works correctly.
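To make the suspected behaviour concrete, here is a toy model of it. This is not Ceph code: `MonStore` and `MgrModule` are hypothetical stand-ins, and the model simply assumes that the module side keeps its own copy of each option, is notified on set, and is never notified on removal. Under those assumptions it reproduces the three observations from the description.

```python
class MonStore:
    """Toy stand-in for the mon config store (hypothetical, not Ceph code)."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.overrides = {}
        self.subscribers = []

    def get(self, key):
        # Like `ceph config get`: the override if present, else the default.
        return self.overrides.get(key, self.defaults[key])

    def set(self, key, value):
        self.overrides[key] = value
        for subscriber in self.subscribers:
            subscriber.on_set(key, value)

    def rm(self, key):
        self.overrides.pop(key, None)
        # Modelled bug: the removal is NOT forwarded to subscribers.


class MgrModule:
    """Toy stand-in for a mgr Python module holding a copy of its options."""

    def __init__(self, mon):
        self.mon = mon
        self.cache = {}
        mon.subscribers.append(self)

    def on_set(self, key, value):
        self.cache[key] = value

    def get_module_option(self, key):
        # Falls back to the default only if no value was ever pushed.
        return self.cache.get(key, self.mon.defaults[key])


KEY = "mgr/cephadm/container_image_alertmanager"
DEFAULT = "quay.io/prometheus/alertmanager:v0.23.0"

mon = MonStore({KEY: DEFAULT})
module = MgrModule(mon)

mon.set(KEY, "quay.io/prometheus/alertmanager")
after_set = module.get_module_option(KEY)    # custom value, as expected

mon.rm(KEY)
mon_after_rm = mon.get(KEY)                  # default again
after_rm = module.get_module_option(KEY)     # still the custom value (stale)

mon.set(KEY, DEFAULT)
after_reset = module.get_module_option(KEY)  # default, matching the workaround
```

In this model the stale value survives only because `rm` has no counterpart to `on_set`. Whether the real mgr behaves this way internally is exactly what still needs to be confirmed.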
#3 Updated by Redouane Kachach Elhichou 10 months ago
- Priority changed from Normal to Low
#4 Updated by Redouane Kachach Elhichou 9 months ago
- Project changed from Orchestrator to mgr
- Category set to ceph-mgr
- Assignee deleted (Redouane Kachach Elhichou)
From my investigation, this seems to be a bug in the mgr (I'd say in the C++ code), since it doesn't propagate the removal of config options to the Python modules.