Bug #53175
openpodman: failed to exec pid1: Exec format error: wrongly using the amd64-only digest
Description
Introduction
I have an experimental cluster that mixes hosts with different instruction set architectures (x86_64, aarch64).
Basic Information
- Ceph Pacific == 16.2.6
- OpenSUSE Tumbleweed
- Podman 3.4.1
- Kernel 5.14.14-4.g2b5383f-default
Error description
Deploying daemons works on x86_64 hosts but fails on aarch64 hosts.
The hosts are configured essentially identically (same Tumbleweed snapshot), differing only in instruction set architecture.
This issue occurs on Ceph Pacific v16.2.6 but not on v16.2.5. When running 'ceph orch upgrade' from 16.2.5 to 16.2.6, the upgrade succeeds on x86_64 hosts but fails on aarch64 hosts.
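As the issue title suggests, the failure mode is consistent with pinning an architecture-specific image digest instead of the multi-arch manifest list digest. A multi-arch tag points at a manifest list that maps each architecture to its own image digest; if the orchestrator records the amd64 entry's digest, aarch64 hosts pull an amd64 image, and its init process dies with "Exec format error". The sketch below uses a made-up, abbreviated manifest list (not the real quay.io/ceph/ceph one) to show how the per-architecture digests are distinguished:

```shell
# Illustration only: a minimal, fake manifest list. Real lists can be
# fetched e.g. with 'skopeo inspect --raw docker://quay.io/ceph/ceph:v16.2.6'.
cat > /tmp/manifest-list.json <<'EOF'
{"manifests": [
  {"digest": "sha256:0000amd64", "platform": {"architecture": "amd64", "os": "linux"}},
  {"digest": "sha256:1111arm64", "platform": {"architecture": "arm64", "os": "linux"}}
]}
EOF
# Pick the digest matching a given architecture (arm64 here). A correct
# multi-arch pin would reference the list itself, not one of these entries:
python3 -c '
import json
mlist = json.load(open("/tmp/manifest-list.json"))
digests = {m["platform"]["architecture"]: m["digest"] for m in mlist["manifests"]}
print(digests["arm64"])
'
# -> sha256:1111arm64
```

If cephadm resolves the tag to the amd64 entry's digest and then passes that digest to every host, the behavior above (works on x86_64, Exec format error on aarch64) is exactly what one would expect.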
Error message
The basic problem appears to be that:

2021-11-04 17:30:29,925 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:30:32,328 DEBUG stat: ERROR (catatonit:7): failed to exec pid1: Exec format error
Some findings
If run directly from the CLI, the above command does indeed output an error message, regardless of which host it is run on:
# /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=mon2.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
stat: cannot stat '%g': No such file or directory
167
I fixed this particular issue by adding the necessary quotes around '%u %g'. When the command is run directly from the CLI, there is no more error. However, the 'ERROR (catatonit:7): failed to exec pid1' remains (on aarch64) when cephadm tries to deploy a crash daemon after 'ceph orch host add ...'.
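The quoting problem can be reproduced without podman at all; the demo directory below is just an illustrative stand-in for /var/lib/ceph:

```shell
mkdir -p /tmp/quoting-demo
# Unquoted, the shell splits the format into two words: stat receives
# '%u' as its -c format and '%g' as an extra file operand, so it tries
# to stat a file literally named '%g':
stat -c %u %g /tmp/quoting-demo 2>&1 || true
# -> stat: cannot stat '%g': No such file or directory  (plus the uid line)
# Quoted, '%u %g' stays one format argument and prints "uid gid":
stat -c '%u %g' /tmp/quoting-demo
```

This matches the CLI output above: stat still exits nonzero in the unquoted case even though it prints a uid for the valid path.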
The cephadm logs on the x86_64 host show that the command works under cephadm despite the quoting issue:
2021-11-04 23:55:25,667 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=mon2.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 23:55:26,832 DEBUG stat: 167 167
Full error message
On the aarch64 host, it shows:
2021-11-04 17:30:29,924 INFO Deploy daemon crash.node1b54 ...
2021-11-04 17:30:29,925 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:30:32,328 DEBUG stat: ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:30:33,625 INFO Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:30:33,625 INFO stat: stderr ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:31:34,140 DEBUG Running command: /usr/bin/podman version --format {{.Client.Version}}
2021-11-04 17:31:34,265 DEBUG /usr/bin/podman: 3.4.1
2021-11-04 17:31:34,339 DEBUG Running command: /usr/bin/podman stats --format {{.ID}},{{.MemUsage}} --no-stream
2021-11-04 17:31:34,536 DEBUG Using specified fsid: 81cd0e70-3d89-11ec-977f-0390d6773ed2
2021-11-04 17:31:34,537 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:31:36,729 DEBUG stat: ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:31:37,939 INFO Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:31:37,939 INFO stat: stderr ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:31:38,433 DEBUG Running command: /usr/bin/podman version --format {{.Client.Version}}
2021-11-04 17:31:38,555 DEBUG /usr/bin/podman: 3.4.1
2021-11-04 17:31:38,661 DEBUG Running command: sysctl -a
2021-11-04 17:31:38,725 DEBUG Running command: sysctl -a
2021-11-04 17:31:38,788 DEBUG Running command: sestatus
2021-11-04 17:31:38,795 DEBUG sestatus: SELinux status: enabled
2021-11-04 17:31:38,795 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2021-11-04 17:31:38,796 DEBUG sestatus: SELinux root directory: /etc/selinux
2021-11-04 17:31:38,796 DEBUG sestatus: Loaded policy name: targeted
2021-11-04 17:31:38,796 DEBUG sestatus: Current mode: enforcing
2021-11-04 17:31:38,796 DEBUG sestatus: Mode from config file: enforcing
2021-11-04 17:31:38,796 DEBUG sestatus: Policy MLS status: enabled
2021-11-04 17:31:38,796 DEBUG sestatus: Policy deny_unknown status: allowed
2021-11-04 17:31:38,796 DEBUG sestatus: Memory protection checking: requested (insecure)
2021-11-04 17:31:38,797 DEBUG sestatus: Max kernel policy version: 33
2021-11-04 17:31:38,798 DEBUG Running command: sestatus
2021-11-04 17:31:38,805 DEBUG sestatus: SELinux status: enabled
2021-11-04 17:31:38,805 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2021-11-04 17:31:38,805 DEBUG sestatus: SELinux root directory: /etc/selinux
2021-11-04 17:31:38,805 DEBUG sestatus: Loaded policy name: targeted
2021-11-04 17:31:38,806 DEBUG sestatus: Current mode: enforcing
2021-11-04 17:31:38,806 DEBUG sestatus: Mode from config file: enforcing
2021-11-04 17:31:38,806 DEBUG sestatus: Policy MLS status: enabled
2021-11-04 17:31:38,806 DEBUG sestatus: Policy deny_unknown status: allowed
2021-11-04 17:31:38,806 DEBUG sestatus: Memory protection checking: requested (insecure)
2021-11-04 17:31:38,806 DEBUG sestatus: Max kernel policy version: 33
2021-11-04 17:31:38,808 DEBUG Running command: sestatus
2021-11-04 17:31:38,815 DEBUG sestatus: SELinux status: enabled
2021-11-04 17:31:38,816 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2021-11-04 17:31:38,816 DEBUG sestatus: SELinux root directory: /etc/selinux
2021-11-04 17:31:38,816 DEBUG sestatus: Loaded policy name: targeted
2021-11-04 17:31:38,816 DEBUG sestatus: Current mode: enforcing
2021-11-04 17:31:38,816 DEBUG sestatus: Mode from config file: enforcing
2021-11-04 17:31:38,816 DEBUG sestatus: Policy MLS status: enabled
2021-11-04 17:31:38,816 DEBUG sestatus: Policy deny_unknown status: allowed
2021-11-04 17:31:38,817 DEBUG sestatus: Memory protection checking: requested (insecure)
2021-11-04 17:31:38,817 DEBUG sestatus: Max kernel policy version: 33
2021-11-04 17:31:38,818 DEBUG Running command: sestatus
2021-11-04 17:31:38,825 DEBUG sestatus: SELinux status: enabled
2021-11-04 17:31:38,825 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2021-11-04 17:31:38,825 DEBUG sestatus: SELinux root directory: /etc/selinux
2021-11-04 17:31:38,825 DEBUG sestatus: Loaded policy name: targeted
2021-11-04 17:31:38,825 DEBUG sestatus: Current mode: enforcing
2021-11-04 17:31:38,826 DEBUG sestatus: Mode from config file: enforcing
2021-11-04 17:31:38,826 DEBUG sestatus: Policy MLS status: enabled
2021-11-04 17:31:38,826 DEBUG sestatus: Policy deny_unknown status: allowed
2021-11-04 17:31:38,826 DEBUG sestatus: Memory protection checking: requested (insecure)
2021-11-04 17:31:38,826 DEBUG sestatus: Max kernel policy version: 33
2021-11-04 17:31:38,827 DEBUG Running command: sestatus
2021-11-04 17:31:38,834 DEBUG sestatus: SELinux status: enabled
2021-11-04 17:31:38,835 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2021-11-04 17:31:38,835 DEBUG sestatus: SELinux root directory: /etc/selinux
2021-11-04 17:31:38,835 DEBUG sestatus: Loaded policy name: targeted
2021-11-04 17:31:38,835 DEBUG sestatus: Current mode: enforcing
2021-11-04 17:31:38,835 DEBUG sestatus: Mode from config file: enforcing
2021-11-04 17:31:38,836 DEBUG sestatus: Policy MLS status: enabled
2021-11-04 17:31:38,836 DEBUG sestatus: Policy deny_unknown status: allowed
2021-11-04 17:31:38,836 DEBUG sestatus: Memory protection checking: requested (insecure)
2021-11-04 17:31:38,836 DEBUG sestatus: Max kernel policy version: 33
2021-11-04 17:31:38,837 DEBUG Running command: sestatus
2021-11-04 17:31:38,844 DEBUG sestatus: SELinux status: enabled
2021-11-04 17:31:38,844 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2021-11-04 17:31:38,845 DEBUG sestatus: SELinux root directory: /etc/selinux
2021-11-04 17:31:38,845 DEBUG sestatus: Loaded policy name: targeted
2021-11-04 17:31:38,845 DEBUG sestatus: Current mode: enforcing
2021-11-04 17:31:38,845 DEBUG sestatus: Mode from config file: enforcing
2021-11-04 17:31:38,845 DEBUG sestatus: Policy MLS status: enabled
2021-11-04 17:31:38,845 DEBUG sestatus: Policy deny_unknown status: allowed
2021-11-04 17:31:38,845 DEBUG sestatus: Memory protection checking: requested (insecure)
2021-11-04 17:31:38,846 DEBUG sestatus: Max kernel policy version: 33
2021-11-04 17:31:39,418 DEBUG Running command: /usr/bin/podman version --format {{.Client.Version}}
2021-11-04 17:31:39,543 DEBUG /usr/bin/podman: 3.4.1
2021-11-04 17:31:39,616 DEBUG Acquiring lock 281473443482160 on /run/cephadm/81cd0e70-3d89-11ec-977f-0390d6773ed2.lock
2021-11-04 17:31:39,616 DEBUG Lock 281473443482160 acquired on /run/cephadm/81cd0e70-3d89-11ec-977f-0390d6773ed2.lock
2021-11-04 17:31:39,616 DEBUG Running command: systemctl is-enabled ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2@crash.node1b54
2021-11-04 17:31:39,633 DEBUG systemctl: Failed to get unit file state for ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2@crash.node1b54.service: No such file or directory
2021-11-04 17:31:39,635 DEBUG Running command: systemctl is-active ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2@crash.node1b54
2021-11-04 17:31:39,651 DEBUG systemctl: inactive
2021-11-04 17:31:39,653 DEBUG Running command: /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,778 DEBUG /usr/bin/podman: Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,785 INFO Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,786 INFO /usr/bin/podman: stderr Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,786 DEBUG Running command: /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
2021-11-04 17:31:39,910 DEBUG /usr/bin/podman: Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
2021-11-04 17:31:39,917 INFO Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
2021-11-04 17:31:39,917 INFO /usr/bin/podman: stderr Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
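One way to confirm the wrong-architecture hypothesis on an affected host is to compare the host architecture with the architecture of the image that podman resolved from the pinned digest. This is a diagnostic sketch: the image reference is the one from the log above, and the inspect step only runs if podman is installed and the image has been pulled locally:

```shell
# Print the host architecture (x86_64 or aarch64 in this cluster):
echo "host: $(uname -m)"
# If podman is available, report the architecture recorded in the image
# config for the digest cephadm pinned:
if command -v podman >/dev/null 2>&1; then
  podman image inspect --format '{{.Architecture}}' \
    quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 \
    2>/dev/null || echo "image not available locally"
fi
```

On an aarch64 host hitting this bug, the inspect step would be expected to report amd64, i.e. an image that the aarch64 kernel cannot execute.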
Final thoughts
This is certainly worth investigating.
It is strange that identically configured hosts started to behave differently somewhere between Ceph v16.2.5 and v16.2.6.
Looking at git diff v16.2.5 v16.2.6 -- src/cephadm/cephadm,
I was unable to identify an obvious cause (yet).
If you have further questions or need me to test something in particular, please ask.