Bug #57998

closed

cephadm stuck trying to download "mon"

Added by Shawn Iverson over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Cephadm management for the entire cluster is stuck and repeatedly tries to pull an unqualified "mon" image instead of the ceph base image. This happens even though the config shows mon mapped to quay.io/ceph/ceph.

cephadm exited with an error code: 1, stderr:Pulling container image mon...
Non-zero exit code 125 from /bin/podman pull mon
/bin/podman: stderr Resolving "mon" using unqualified-search registries (/etc/containers/registries.conf)
/bin/podman: stderr Trying to pull registry.access.redhat.com/mon:latest...
/bin/podman: stderr Trying to pull registry.redhat.io/mon:latest...
/bin/podman: stderr Trying to pull docker.io/library/mon:latest...
/bin/podman: stderr Error: 3 errors occurred while pulling:
/bin/podman: stderr * initializing source docker://registry.access.redhat.com/mon:latest: reading manifest latest in registry.access.redhat.com/mon: name unknown: Repo not found
/bin/podman: stderr * initializing source docker://registry.redhat.io/mon:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
/bin/podman: stderr * initializing source docker://mon:latest: reading manifest latest in docker.io/library/mon: errors:
/bin/podman: stderr denied: requested access to the resource is denied
/bin/podman: stderr unauthorized: authentication required
/bin/podman: stderr ERROR: Failed to pull container image. Check that host(s) are logged into the registry
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1429, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1326, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image mon...
Non-zero exit code 125 from /bin/podman pull mon
/bin/podman: stderr Resolving "mon" using unqualified-search registries (/etc/containers/registries.conf)
/bin/podman: stderr Trying to pull registry.access.redhat.com/mon:latest...
/bin/podman: stderr Trying to pull registry.redhat.io/mon:latest...
/bin/podman: stderr Trying to pull docker.io/library/mon:latest...
/bin/podman: stderr Error: 3 errors occurred while pulling:
/bin/podman: stderr * initializing source docker://registry.access.redhat.com/mon:latest: reading manifest latest in registry.access.redhat.com/mon: name unknown: Repo not found
/bin/podman: stderr * initializing source docker://registry.redhat.io/mon:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
/bin/podman: stderr * initializing source docker://mon:latest: reading manifest latest in docker.io/library/mon: errors:
/bin/podman: stderr denied: requested access to the resource is denied
/bin/podman: stderr unauthorized: authentication required
/bin/podman: stderr ERROR: Failed to pull container image. Check that host(s) are logged into the registry

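
For context on why podman falls back to the unqualified-search registries here, the following is a minimal sketch of the usual "does this reference name a registry?" check. The function name `is_fully_qualified` is hypothetical, and the heuristic (the part before the first "/" counts as a registry only if it contains "." or ":" or is "localhost") is an assumption based on common container tooling behaviour, not code taken from podman or cephadm.

```python
def is_fully_qualified(image: str) -> bool:
    """Return True if the image reference appears to name a registry.

    Assumed heuristic: the component before the first "/" is a registry
    only if it contains "." or ":" or equals "localhost".
    """
    if "/" not in image:
        # e.g. "mon" or "mon:latest": there is no registry component at all
        return False
    first = image.split("/", 1)[0]
    return "." in first or ":" in first or first == "localhost"


if __name__ == "__main__":
    for ref in ("mon", "quay.io/ceph/ceph"):
        print(ref, is_fully_qualified(ref))
```

Under this assumption, "mon" is a short name, which would explain why the pull is attempted against each registry listed in /etc/containers/registries.conf in turn.
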
Actions #1

Updated by Adam King over 1 year ago

Hmm, can I see what "ceph config dump" spits out (feel free to remove anything sensitive if necessary)? All the images used come from config options, IIRC, so it's likely something there even if the global container_image setting is set correctly.
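
To illustrate how a correct global setting can still be overridden, here is a rough sketch of "most specific section wins" lookup. It assumes a simplified model (daemon name, then daemon type, then global) and ignores masks and other config sources. The function `effective_value` and the daemon name `mon.host1` are hypothetical and only serve the example.

```python
from typing import Dict, Optional


def effective_value(settings: Dict[str, str], daemon: str) -> Optional[str]:
    """Pick the value from the most specific section that defines it.

    Simplified lookup order (an assumption for this sketch):
    "mon.host1" -> "mon" -> "global".
    """
    daemon_type = daemon.split(".", 1)[0]
    for section in (daemon, daemon_type, "global"):
        if section in settings:
            return settings[section]
    return None


if __name__ == "__main__":
    # Hypothetical container_image settings keyed by config section.
    container_image = {
        "global": "quay.io/ceph/ceph",
        "mon.host1": "mon",
    }
    print(effective_value(container_image, "mon.host1"))
    print(effective_value(container_image, "mon.host2"))
```

In this model a single per-daemon entry is enough to shadow the global image for that daemon, while every other daemon still resolves to the global value.
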

Actions #2

Updated by Redouane Kachach Elhichou over 1 year ago

  • Status changed from New to Need More Info
Actions #3

Updated by Shawn Iverson over 1 year ago

Here's the dump. There is definitely something fishy here. How do I remove it?

global                          advanced  cluster_network                         10.0.0.0/24                                                                                *
global                          basic     container_image                         quay.io/ceph/ceph@sha256:bdd00e177be2216fe36e605cbd1f32b70c9e3c8285bd66adba005c5e6f71de6b  *
  mon                           advanced  auth_allow_insecure_global_id_reclaim   false
  mon                           advanced  public_network                          10.102.12.0/23                                                                             *
    mon.ceph01san01mon01        basic     container_image                         mon                                                                                        *
  mgr                           advanced  mgr/cephadm/container_init              True                                                                                       *
  mgr                           advanced  mgr/cephadm/migration_current           5                                                                                          *
  mgr                           advanced  mgr/dashboard/ALERTMANAGER_API_HOST     http://host.containers.internal:9093                                                       *
  mgr                           advanced  mgr/dashboard/FEATURE_TOGGLE_ISCSI      false                                                                                      *
  mgr                           advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY    false                                                                                      *
  mgr                           advanced  mgr/dashboard/GRAFANA_API_URL           https://ceph01san01.dev.ena.net:3100                                                       *
  mgr                           advanced  mgr/dashboard/PROMETHEUS_API_HOST       http://host.containers.internal:9095                                                       *
  mgr                           unknown   mgr/dashboard/redirect_resolve_ip_addr  True                                                                                       *
  mgr                           advanced  mgr/dashboard/ssl                       false                                                                                      *
  mgr                           advanced  mgr/dashboard/ssl_server_port           8443                                                                                       *
  mgr                           advanced  mgr/dashboard/standby_behaviour         error                                                                                      *
  mgr                           advanced  mgr/orchestrator/orchestrator           cephadm
  osd                           advanced  osd_memory_target_autotune              true
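
A dump like the one above can also be scanned mechanically. The sketch below is not part of cephadm; the helper names are hypothetical, and it assumes the whitespace-separated column layout shown here (WHO, LEVEL, OPTION, VALUE, optional RO flag) with values that contain no spaces.

```python
from typing import List, Tuple


def container_image_rows(dump_text: str) -> List[Tuple[str, str]]:
    """Return (who, value) for every container_image row in the dump text."""
    rows = []
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed columns: WHO, LEVEL, OPTION, VALUE, optional RO flag
        if len(fields) >= 4 and fields[2] == "container_image":
            rows.append((fields[0], fields[3]))
    return rows


def non_global_overrides(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only rows that are set somewhere other than the global section."""
    return [(who, value) for who, value in rows if who != "global"]


SAMPLE = """\
global                    advanced  cluster_network   10.0.0.0/24  *
global                    basic     container_image   quay.io/ceph/ceph@sha256:bdd00e177be2216fe36e605cbd1f32b70c9e3c8285bd66adba005c5e6f71de6b  *
  mon                     advanced  public_network    10.102.12.0/23  *
    mon.ceph01san01mon01  basic     container_image   mon  *
  osd                     advanced  osd_memory_target_autotune  true
"""

if __name__ == "__main__":
    for who, value in non_global_overrides(container_image_rows(SAMPLE)):
        print(who, value)
```

Run against the excerpt in SAMPLE, the only non-global row it reports is the mon.ceph01san01mon01 entry with the value "mon".
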

Actions #4

Updated by Shawn Iverson over 1 year ago

I executed the following:

ceph config rm mon.ceph01san01mon01 container_image

and now cephadm is working!

Actions #5

Updated by Adam King over 1 year ago

  • Status changed from Need More Info to Resolved

Shawn Iverson wrote:

I executed the following:

[...]

and now cephadm is working!

I was away the last couple of weeks and didn't see your update, but glad it's working now. It was definitely that config option causing the issue. Not sure how it got set like that, but unless we see this more often, I'm going to assume it was just a mistaken command somewhere in the past and consider this resolved.
