Bug #63118

ceph orch upgrade from 17.2.5 to 18.2.0 on mixed arch (amd64, aarch64) will always try to pull aarch64 images

Added by Jayson Reis 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: orchestrator
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi there,
While trying to upgrade my cluster, I ran `ceph orch upgrade start quay.io/ceph/ceph:v18.2.0`. It seems that, because the mgr daemons are running on aarch64 nodes, the upgrade always tries to use the image for that architecture, even on amd64 nodes.
Eventually the upgrade status shows this:
```
{
    "target_image": "quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "rbd-mirror",
        "mgr"
    ],
    "progress": "6/36 daemons upgraded",
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image",
    "is_paused": true
}
```

In the logs I see a few of these lines before the upgrade fails:

```
stat: stderr ERROR (catatonit:2): failed to exec pid1: Exec format error
ERROR: Failed to extract uid/gid for path /var/lib/ceph: Failed command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba -e NODE_NAME=proliant-1 -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba -c %u %g /var/lib/ceph: ERROR (catatonit:2): failed to exec pid1: Exec format error
```
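For illustration only, here is a minimal sketch of why one digest cannot serve both node types. It assumes (not confirmed from the cephadm source) that the pinned `sha256:...` digest in `target_image` refers to the aarch64 manifest rather than to the multi-arch image index, which would explain the `Exec format error` on amd64 hosts. The function `digest_for_host`, the mapping table, and the placeholder digests in `EXAMPLE_INDEX` are hypothetical; only the `manifests` / `platform` layout follows the OCI image index format.

```python
# Map `uname -m` style machine names to the architecture names used in
# OCI / Docker image manifests.
MACHINE_TO_OCI_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def digest_for_host(image_index: dict, machine: str, os_name: str = "linux") -> str:
    """Return the per-platform manifest digest matching a host's architecture."""
    arch = MACHINE_TO_OCI_ARCH.get(machine)
    if arch is None:
        raise ValueError(f"unknown machine type: {machine}")
    for entry in image_index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == arch and platform.get("os") == os_name:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


# Hypothetical image index with placeholder digests (NOT the real ones).
EXAMPLE_INDEX = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbbb", "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}

if __name__ == "__main__":
    # Each host type resolves to a different digest, so a digest resolved
    # once on an aarch64 host is the wrong one for an amd64 host.
    for machine in ("x86_64", "aarch64"):
        print(machine, "->", digest_for_host(EXAMPLE_INDEX, machine))
```

If that assumption holds, the digest would need to be resolved per host (or the index digest used) for a mixed-arch cluster to pull a runnable image everywhere.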

I also tried `ceph orch redeploy DAEMON --image ...`, but it still keeps trying to use the wrong architecture.
I wonder if I should go the manual route of changing the unit files under /var/lib/ceph. Does anyone know if that would be safe?
I found a similar issue for version 16, but it had no conclusion. I will search again and link it here.
