Bug #63118
ceph orch upgrade from 17.2.5 to 18.2.0 on mixed arch (amd64, aarch64) will always try to pull aarch64 images
Description
Hi there,
While trying to upgrade my cluster, I ran `ceph orch upgrade start quay.io/ceph/ceph:v18.2.0`, and it seems that because the mgr daemons are running on aarch64 nodes, the upgrade always tries to use the image for that architecture, even on amd64 nodes.
Eventually the status shows this:
```
{
    "target_image": "quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "rbd-mirror",
        "mgr"
    ],
    "progress": "6/36 daemons upgraded",
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image",
    "is_paused": true
}
```
And in the logs I see a few of these lines before the upgrade fails:
```
stat: stderr ERROR (catatonit:2): failed to exec pid1: Exec format error
ERROR: Failed to extract uid/gid for path /var/lib/ceph: Failed command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba -e NODE_NAME=proliant-1 -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba -c %u %g /var/lib/ceph: ERROR (catatonit:2): failed to exec pid1: Exec format error
```
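The "Exec format error" suggests podman is executing a binary built for the wrong architecture. A minimal Python sketch of what appears to be happening (the manifest entries and digests below are made up for illustration; this is not the cephadm implementation): a multi-arch tag points at a manifest list, and each host's runtime normally selects the entry matching its own architecture, but once the mgr pins the digest it resolved on aarch64, every host is told to pull that same arm64 image.

```python
# Illustrative sketch of multi-arch image resolution (digests are made up;
# this is not the cephadm source). A multi-arch tag points at a manifest
# list; a runtime normally selects the entry matching the host's
# architecture. Pinning the digest resolved on the aarch64 mgr skips that
# per-host selection, so amd64 hosts end up pulling an arm64 binary.

MANIFEST_LIST = {
    "quay.io/ceph/ceph:v18.2.0": [
        {"architecture": "amd64", "digest": "sha256:amd64-image"},
        {"architecture": "arm64", "digest": "sha256:arm64-image"},
    ]
}

def resolve(tag: str, host_arch: str) -> str:
    """Pick the per-architecture digest the way a runtime resolves a tag."""
    for entry in MANIFEST_LIST[tag]:
        if entry["architecture"] == host_arch:
            return entry["digest"]
    raise LookupError(f"no {host_arch} image behind {tag}")

# The mgr, running on aarch64, resolves and pins its own arch's digest:
pinned = resolve("quay.io/ceph/ceph:v18.2.0", "arm64")

# amd64 hosts are then told to pull that exact digest instead of the one
# they would have resolved themselves -> "Exec format error".
assert pinned != resolve("quay.io/ceph/ceph:v18.2.0", "amd64")
print(pinned)
```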
I also tried `ceph orch redeploy DAEMON --image ...`, but it still keeps trying to use the wrong architecture.
I wonder if I should go the manual route of changing the unit files in /var/lib/ceph; does anyone know whether that would be safe?
I found a similar issue on version 16 but no conclusion; I will search again and link it here.
Updated by Jayson Reis 7 months ago
This is the closest issue I could find but not quite the same error https://tracker.ceph.com/issues/48442
Updated by Jayson Reis 7 months ago
This [1] is pretty much what I am experiencing, so this could be closed as a duplicate.
The solution that works is the same as in [2]: do not use digests on images.
ceph config set mgr mgr/cephadm/use_repo_digest false --force
[1] https://tracker.ceph.com/issues/53175
[2] https://tracker.ceph.com/issues/53175#note-8
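For context, a hedged sketch of what `use_repo_digest` appears to change (the function and values below are illustrative, not actual cephadm code): when enabled, the mgr replaces the tag with the digest it resolved on its own host and every node pulls that exact (here, aarch64) image; when disabled, the tag is passed through and each host's podman resolves its own architecture.

```python
# Hedged sketch (illustrative only, not the actual cephadm code) of the
# use_repo_digest behavior: the image reference handed to each host is
# either the digest the mgr resolved on its own architecture, or the tag.

TAG = "quay.io/ceph/ceph:v18.2.0"
# Digest the mgr resolved locally on aarch64 (placeholder value):
MGR_RESOLVED_DIGEST = "quay.io/ceph/ceph@sha256:aaaa"

def image_for_hosts(use_repo_digest: bool) -> str:
    """Return the image reference cephadm would hand to every host."""
    if use_repo_digest:
        # Same pinned digest everywhere: amd64 hosts pull the arm64 image.
        return MGR_RESOLVED_DIGEST
    # Tag passed through: each host's podman picks its own architecture.
    return TAG

print(image_for_hosts(True))   # the mgr's (aarch64) digest
print(image_for_hosts(False))  # the tag, resolved per host
```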