Bug #63118


ceph orch upgrade from 17.2.5 to 18.2.0 on mixed arch (amd64, aarch64) will always try to pull aarch64 images

Added by Jayson Reis 7 months ago. Updated 7 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
orchestrator
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi there,
While trying to upgrade my cluster, I ran `ceph orch upgrade start quay.io/ceph/ceph:v18.2.0` and it seems that as the mgr are runnin on aarch64 nodes, they will always try to use the image for that architecture, even on amd64 nodes.
Eventually status shows as this
```
{
    "target_image": "quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "rbd-mirror",
        "mgr"
    ],
    "progress": "6/36 daemons upgraded",
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image",
    "is_paused": true
}
```

In the logs I see several of these lines before the upgrade fails:

```
stat: stderr ERROR (catatonit:2): failed to exec pid1: Exec format error
ERROR: Failed to extract uid/gid for path /var/lib/ceph: Failed command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba -e NODE_NAME=proliant-1 -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:51b725a680b725f153741fcc28639b48b020e54131a88b35b35d5f730157b7ba -c %u %g /var/lib/ceph: ERROR (catatonit:2): failed to exec pid1: Exec format error
```
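The `Exec format error` from catatonit means the pulled image contains binaries for the wrong CPU architecture. As a sketch of how one might confirm this (my diagnostic suggestion, not from the original report): the tag points to a multi-arch manifest list, while a pinned digest resolves to exactly one architecture's image.

```shell
# List the per-architecture digests behind the tag (requires podman and jq).
# Each entry of the manifest list carries its own digest; cephadm pinning the
# aarch64 digest would explain amd64 hosts failing with "Exec format error".
podman manifest inspect quay.io/ceph/ceph:v18.2.0 \
  | jq '.manifests[] | {arch: .platform.architecture, digest: .digest}'
```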

I also tried `ceph orch redeploy DAEMON --image ...`, but it still keeps trying to use the wrong architecture.
I wonder if I should go the manual route of changing the unit files under /var/lib/ceph; does anyone know if that would be safe?
I found a similar issue on version 16 but no conclusion. I will search again and link it here.

Actions #1

Updated by Jayson Reis 7 months ago

This is the closest issue I could find but not quite the same error https://tracker.ceph.com/issues/48442

Actions #2

Updated by Jayson Reis 7 months ago

This [1] is pretty much what I am experiencing, so this could be closed as a duplicate.
The solution that works is the same as in [2]: do not use digests for images.

```
ceph config set mgr mgr/cephadm/use_repo_digest false --force
```

[1] https://tracker.ceph.com/issues/53175
[2] https://tracker.ceph.com/issues/53175#note-8
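For reference, a sketch of the full workaround sequence, assuming the upgrade is sitting in the paused/failed state shown above:

```shell
# Stop resolving tags to repo digests, so each host pulls the tag and
# gets the image matching its own architecture from the manifest list.
ceph config set mgr mgr/cephadm/use_repo_digest false --force

# Resume the paused upgrade; hosts will now pull by tag, not by digest.
ceph orch upgrade resume

# Watch progress until all daemons are upgraded.
ceph orch upgrade status
```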
