Bug #57998
closed
cephadm stuck trying to download "mon"
Description
cephadm management of the entire cluster is stuck: it repeatedly tries to pull an unqualified image named "mon" instead of the Ceph base image. The config shows mon mapped to quay.io/ceph/ceph, yet the pull still targets "mon".
cephadm exited with an error code: 1, stderr:Pulling container image mon...
Non-zero exit code 125 from /bin/podman pull mon
/bin/podman: stderr Resolving "mon" using unqualified-search registries (/etc/containers/registries.conf)
/bin/podman: stderr Trying to pull registry.access.redhat.com/mon:latest...
/bin/podman: stderr Trying to pull registry.redhat.io/mon:latest...
/bin/podman: stderr Trying to pull docker.io/library/mon:latest...
/bin/podman: stderr Error: 3 errors occurred while pulling:
/bin/podman: stderr * initializing source docker://registry.access.redhat.com/mon:latest: reading manifest latest in registry.access.redhat.com/mon: name unknown: Repo not found
/bin/podman: stderr * initializing source docker://registry.redhat.io/mon:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
/bin/podman: stderr * initializing source docker://mon:latest: reading manifest latest in docker.io/library/mon: errors:
/bin/podman: stderr denied: requested access to the resource is denied
/bin/podman: stderr unauthorized: authentication required
/bin/podman: stderr ERROR: Failed to pull container image. Check that host(s) are logged into the registry
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1429, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1326, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image mon...
Non-zero exit code 125 from /bin/podman pull mon
/bin/podman: stderr Resolving "mon" using unqualified-search registries (/etc/containers/registries.conf)
/bin/podman: stderr Trying to pull registry.access.redhat.com/mon:latest...
/bin/podman: stderr Trying to pull registry.redhat.io/mon:latest...
/bin/podman: stderr Trying to pull docker.io/library/mon:latest...
/bin/podman: stderr Error: 3 errors occurred while pulling:
/bin/podman: stderr * initializing source docker://registry.access.redhat.com/mon:latest: reading manifest latest in registry.access.redhat.com/mon: name unknown: Repo not found
/bin/podman: stderr * initializing source docker://registry.redhat.io/mon:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
/bin/podman: stderr * initializing source docker://mon:latest: reading manifest latest in docker.io/library/mon: errors:
/bin/podman: stderr denied: requested access to the resource is denied
/bin/podman: stderr unauthorized: authentication required
/bin/podman: stderr ERROR: Failed to pull container image. Check that host(s) are logged into the registry
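For readers unfamiliar with why a bare "mon" turns into three different pull attempts, here is a rough Python sketch of the expansion the log shows. It is only an illustrative model, not podman's actual resolution logic: the function names `is_qualified` and `candidate_refs` are made up for this example, the registry list is copied from the log rather than read from registries.conf, and things like short-name aliases are ignored.

```python
def is_qualified(image: str) -> bool:
    """Heuristic: the part before the first '/' looks like a registry host."""
    first, sep, _ = image.partition("/")
    return bool(sep) and ("." in first or ":" in first or first == "localhost")


def candidate_refs(image: str, search_registries: list[str]) -> list[str]:
    """Expand an image name into the references a pull would try, in order.

    Simplified model (assumption): a qualified name is used as-is; an
    unqualified name is tried against each search registry in turn.
    """
    if is_qualified(image):
        return [image]
    # Default to the "latest" tag when neither a tag nor a digest is given.
    last_component = image.rsplit("/", 1)[-1]
    if "@" not in image and ":" not in last_component:
        image += ":latest"
    refs = []
    for registry in search_registries:
        repo = image
        # Docker Hub keeps single-component names under "library/".
        if registry == "docker.io" and "/" not in image:
            repo = "library/" + image
        refs.append(f"{registry}/{repo}")
    return refs


# Registries in the order they appear in the log above.
SEARCH = ["registry.access.redhat.com", "registry.redhat.io", "docker.io"]

if __name__ == "__main__":
    for ref in candidate_refs("mon", SEARCH):
        print(ref)
```

A fully qualified reference such as quay.io/ceph/ceph never goes through the search list, which is why the pull attempts above point to an image setting that is just "mon".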
Updated by Adam King over 1 year ago
Hmm, can I see what "ceph config dump" spits out (feel free to remove anything sensitive if necessary)? All the images used come from config options iirc, so it's likely something there even if the global container_image setting is set correctly.
Updated by Redouane Kachach Elhichou over 1 year ago
- Status changed from New to Need More Info
Updated by Shawn Iverson over 1 year ago
Here's the dump. There is definitely something fishy here; how do I remove it?
global                advanced  cluster_network                         10.0.0.0/24                                                                                *
global                basic     container_image                         quay.io/ceph/ceph@sha256:bdd00e177be2216fe36e605cbd1f32b70c9e3c8285bd66adba005c5e6f71de6b  *
mon                   advanced  auth_allow_insecure_global_id_reclaim   false
mon                   advanced  public_network                          10.102.12.0/23                                                                             *
mon.ceph01san01mon01  basic     container_image                         mon                                                                                        *
mgr                   advanced  mgr/cephadm/container_init              True                                                                                       *
mgr                   advanced  mgr/cephadm/migration_current           5                                                                                          *
mgr                   advanced  mgr/dashboard/ALERTMANAGER_API_HOST     http://host.containers.internal:9093                                                       *
mgr                   advanced  mgr/dashboard/FEATURE_TOGGLE_ISCSI      false                                                                                      *
mgr                   advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY    false                                                                                      *
mgr                   advanced  mgr/dashboard/GRAFANA_API_URL           https://ceph01san01.dev.ena.net:3100                                                       *
mgr                   advanced  mgr/dashboard/PROMETHEUS_API_HOST       http://host.containers.internal:9095                                                       *
mgr                   unknown   mgr/dashboard/redirect_resolve_ip_addr  True                                                                                       *
mgr                   advanced  mgr/dashboard/ssl                       false                                                                                      *
mgr                   advanced  mgr/dashboard/ssl_server_port           8443                                                                                       *
mgr                   advanced  mgr/dashboard/standby_behaviour         error                                                                                      *
mgr                   advanced  mgr/orchestrator/orchestrator           cephadm
osd                   advanced  osd_memory_target_autotune              true
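The fishy entry in the dump above is the container_image set on mon.ceph01san01mon01 with the value "mon". As a hedged illustration of how one might spot such entries in saved "ceph config dump" text, here is a small Python sketch. The helper names (`looks_qualified`, `suspicious_image_settings`) are hypothetical, the parsing assumes whitespace-separated rows with one setting per line, and the "qualified" check is only a heuristic.

```python
def looks_qualified(image: str) -> bool:
    """Heuristic: the part before the first '/' looks like a registry host."""
    first, sep, _ = image.partition("/")
    return bool(sep) and ("." in first or ":" in first or first == "localhost")


def suspicious_image_settings(dump_text: str) -> list[tuple[str, str]]:
    """Return (who, value) for container_image rows with an unqualified value.

    Assumes one setting per line: the first field is the target (global,
    mon, mon.<name>, ...) and the value follows the option name.
    """
    hits = []
    for line in dump_text.splitlines():
        fields = line.split()
        if "container_image" not in fields:
            continue
        idx = fields.index("container_image")
        if idx + 1 >= len(fields):
            continue  # option name with no value on this line
        who, value = fields[0], fields[idx + 1]
        if not looks_qualified(value):
            hits.append((who, value))
    return hits


# Three rows taken from the dump in this thread.
SAMPLE = """\
global basic container_image quay.io/ceph/ceph@sha256:bdd00e177be2216fe36e605cbd1f32b70c9e3c8285bd66adba005c5e6f71de6b *
mon advanced public_network 10.102.12.0/23 *
mon.ceph01san01mon01 basic container_image mon *
"""

if __name__ == "__main__":
    for who, value in suspicious_image_settings(SAMPLE):
        print(f"{who}: container_image = {value!r}")
```

On the sample rows this flags only the mon.ceph01san01mon01 entry; the global setting passes because it names quay.io explicitly.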
Updated by Shawn Iverson over 1 year ago
I executed the following:
ceph config rm mon.ceph01san01mon01 container_image
and now cephadm is working!
Updated by Adam King over 1 year ago
- Status changed from Need More Info to Resolved
Shawn Iverson wrote:
I executed the following:
[...]
and now cephadm is working!
I was away the last couple of weeks and didn't see your update, but glad it's working now. It was definitely that config option causing the issue. Not sure how it got set like that, but unless we see this more, I'm going to assume it was just a mistaken command somewhere in the past and consider this resolved.