Bug #53175

podman: failed to exec pid1: Exec format error: wrongly using the amd64-only digest

Added by Lukas Mayer over 2 years ago. Updated 8 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
container tools
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Introduction

I have an experimental cluster in which I mix hosts with different instruction set architectures (x86_64, aarch64).

Basic Information

  • Ceph Pacific == 16.2.6
  • OpenSUSE Tumbleweed
  • Podman 3.4.1
  • Kernel 5.14.14-4.g2b5383f-default

Error description

Deploying daemons on an x86_64 host works, while on aarch64 hosts it fails.
The hosts are configured basically identically (same Tumbleweed snapshot), except for having different instruction set architectures.

This issue occurs on Ceph Pacific v16.2.6, but not on v16.2.5. When running 'ceph orch upgrade' from 16.2.5 to 16.2.6, the upgrade works on x86_64 hosts, but not on aarch64 hosts.
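For reference, the standard way to start such an upgrade (the exact invocation I used is assumed here) is:

# ceph orch upgrade start --ceph-version 16.2.6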

Error message

The basic problem appears to be that the following command fails:

2021-11-04 17:30:29,925 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:30:32,328 DEBUG stat: ERROR (catatonit:7): failed to exec pid1: Exec format error
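A quick way to confirm an architecture mismatch (a diagnostic sketch, not part of the original cephadm output) is to ask podman what architecture the pulled image actually is; on the failing aarch64 host this should report linux/amd64:

# podman image inspect --format '{{.Os}}/{{.Architecture}}' quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330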

Some findings

If called from the CLI, the above command does indeed output an error message regardless of which host it is run on:

# /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=mon2.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
stat: cannot stat '%g': No such file or directory
167

I have fixed this issue by adding the necessary quotes around '%u %g'. When the command is called directly from the CLI, there is no more error. However, the 'ERROR (catatonit:7): failed to exec pid1' remains (on aarch64) if cephadm tries to deploy a crash daemon when calling 'ceph orch host add ...'.
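With the quotes added, the check looks like this and prints the uid/gid pair, matching the x86_64 cephadm log below:

# /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=mon2.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c '%u %g' /var/lib/ceph
167 167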

The cephadm logs on the x86_64 host show that the command works with cephadm despite the unquoted '%u %g':

2021-11-04 23:55:25,667 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=mon2.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 23:55:26,832 DEBUG stat: 167 167

Full Error message

On the aarch64 host, it shows:

2021-11-04 17:30:29,924 INFO Deploy daemon crash.node1b54 ...
2021-11-04 17:30:29,925 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:30:32,328 DEBUG stat: ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:30:33,625 INFO Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:30:33,625 INFO stat: stderr ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:31:34,140 DEBUG Running command: /usr/bin/podman version --format {{.Client.Version}}
2021-11-04 17:31:34,265 DEBUG /usr/bin/podman: 3.4.1
2021-11-04 17:31:34,339 DEBUG Running command: /usr/bin/podman stats --format {{.ID}},{{.MemUsage}} --no-stream
2021-11-04 17:31:34,536 DEBUG Using specified fsid: 81cd0e70-3d89-11ec-977f-0390d6773ed2
2021-11-04 17:31:34,537 DEBUG Running command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:31:36,729 DEBUG stat: ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:31:37,939 INFO Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -e NODE_NAME=node1b54.*** -e CEPH_USE_RANDOM_NONCE=1 quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 -c %u %g /var/lib/ceph
2021-11-04 17:31:37,939 INFO stat: stderr ERROR (catatonit:7): failed to exec pid1: Exec format error
2021-11-04 17:31:38,433 DEBUG Running command: /usr/bin/podman version --format {{.Client.Version}}
2021-11-04 17:31:38,555 DEBUG /usr/bin/podman: 3.4.1
2021-11-04 17:31:38,661 DEBUG Running command: sysctl -a
2021-11-04 17:31:38,725 DEBUG Running command: sysctl -a
2021-11-04 17:31:38,788 DEBUG Running command: sestatus
2021-11-04 17:31:38,795 DEBUG sestatus: SELinux status:                 enabled
2021-11-04 17:31:38,795 DEBUG sestatus: SELinuxfs mount:                /sys/fs/selinux
2021-11-04 17:31:38,796 DEBUG sestatus: SELinux root directory:         /etc/selinux
2021-11-04 17:31:38,796 DEBUG sestatus: Loaded policy name:             targeted
2021-11-04 17:31:38,796 DEBUG sestatus: Current mode:                   enforcing
2021-11-04 17:31:38,796 DEBUG sestatus: Mode from config file:          enforcing
2021-11-04 17:31:38,796 DEBUG sestatus: Policy MLS status:              enabled
2021-11-04 17:31:38,796 DEBUG sestatus: Policy deny_unknown status:     allowed
2021-11-04 17:31:38,796 DEBUG sestatus: Memory protection checking:     requested (insecure)
2021-11-04 17:31:38,797 DEBUG sestatus: Max kernel policy version:      33
[... the sestatus run and its identical output are repeated five more times in the log ...]
2021-11-04 17:31:39,418 DEBUG Running command: /usr/bin/podman version --format {{.Client.Version}}
2021-11-04 17:31:39,543 DEBUG /usr/bin/podman: 3.4.1
2021-11-04 17:31:39,616 DEBUG Acquiring lock 281473443482160 on /run/cephadm/81cd0e70-3d89-11ec-977f-0390d6773ed2.lock
2021-11-04 17:31:39,616 DEBUG Lock 281473443482160 acquired on /run/cephadm/81cd0e70-3d89-11ec-977f-0390d6773ed2.lock
2021-11-04 17:31:39,616 DEBUG Running command: systemctl is-enabled ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2@crash.node1b54
2021-11-04 17:31:39,633 DEBUG systemctl: Failed to get unit file state for ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2@crash.node1b54.service: No such file or directory
2021-11-04 17:31:39,635 DEBUG Running command: systemctl is-active ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2@crash.node1b54
2021-11-04 17:31:39,651 DEBUG systemctl: inactive
2021-11-04 17:31:39,653 DEBUG Running command: /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,778 DEBUG /usr/bin/podman: Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,785 INFO Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,786 INFO /usr/bin/podman: stderr Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash-node1b54
2021-11-04 17:31:39,786 DEBUG Running command: /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
2021-11-04 17:31:39,910 DEBUG /usr/bin/podman: Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
2021-11-04 17:31:39,917 INFO Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54
2021-11-04 17:31:39,917 INFO /usr/bin/podman: stderr Error: error inspecting object: no such container ceph-81cd0e70-3d89-11ec-977f-0390d6773ed2-crash.node1b54

Final thoughts

This is most certainly worth investigating.

But it is odd that identically set up hosts started to behave differently somewhere between Ceph v16.2.5 and v16.2.6.

Looking at 'git diff v16.2.5 v16.2.6 -- src/cephadm/cephadm', I was unable to identify an obvious cause (yet).

If you have further questions or need me to test something in particular, please ask.


Related issues: 2 (0 open, 2 closed)

Related to Orchestrator - Bug #49232: standard_init_linux.go:211: exec user process caused "exec format error" (Can't reproduce)

Related to Orchestrator - Bug #44990: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory (Can't reproduce, Sebastian Wagner)

Actions #1

Updated by Sebastian Wagner over 2 years ago

  • Subject changed from Cephadm: podman run behaving differently across ( x86_64 | aarch64 ) to podman: failed to exec pid1: Exec format error
Actions #2

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #49232: standard_init_linux.go:211: exec user process caused "exec format error" added
Actions #3

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #44990: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory added
Actions #4

Updated by Sebastian Wagner over 2 years ago

The error related to stat: cannot stat '%g': No such file or directory seems to be a clue here. It would be great if you could look further into this. Does it have an impact if you change the podman version?

Actions #5

Updated by Lukas Mayer over 2 years ago

Because I have come a lot further, I will just leave the preliminary results of my investigation here:

The cluster used the image quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 when bootstrapping, which seems to work on amd64 but not on aarch64.

When visiting the Quay.io web interface, I found out that this is the digest of the amd64 version of Ceph v16.2.6.
When bootstrapping a cluster with cephadm --image quay.io/ceph/ceph@sha256:5755c3a5c197ef186b8186212e023565f15b799f1ed411207f2c3fcd4a80ab45 bootstrap [...] instead, hosts of either instruction set architecture can be added to the cluster.

This explains why containers were properly deployed on amd64 but not on aarch64: amd64 was the architecture I ran 'cephadm bootstrap' on.
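One way to tell the two kinds of digests apart (a sketch, assuming skopeo and jq are available; expected output shown) is to look at the media type of the raw manifest. A single-architecture digest resolves to a plain image manifest, while the platform-independent digest resolves to a manifest list:

# skopeo inspect --raw docker://quay.io/ceph/ceph@sha256:285d2abb6e74bdc6e15e1af585aa19f132045651b7a80eb77b9cec8a785ff330 | jq -r .mediaType
application/vnd.docker.distribution.manifest.v2+json
# skopeo inspect --raw docker://quay.io/ceph/ceph@sha256:5755c3a5c197ef186b8186212e023565f15b799f1ed411207f2c3fcd4a80ab45 | jq -r .mediaType
application/vnd.docker.distribution.manifest.list.v2+json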

cephadm appears to use quay.io/ceph/ceph:v16, but puts the amd64-only digest into the container_image configuration value.
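The pinned value can be checked in the cluster configuration (a sketch; the digest shown should be the amd64-only one from above):

# ceph config dump | grep container_image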

The solution would be to make sure that platform-independent digests are used when deploying daemons.

Actions #6

Updated by Sebastian Wagner over 2 years ago

  • Subject changed from podman: failed to exec pid1: Exec format error to podman: failed to exec pid1: Exec format error: wrongly using the amd64-only digest
Actions #7

Updated by K Jarrett 8 months ago

This issue is still present on v17.2.6. When passing in an image tag, the orchestrator appears to resolve it to a sha256 digest (presumably because the images are periodically re-published under the same tag?). However, it resolves the tag to the digest for the orchestrator's own architecture.

Attempting to work around this by starting an upgrade for a specific host and passing in the digest (e.g. ceph orch upgrade start --image quay.io/ceph/ceph@sha256:38c99b32650ad1d7503fa67895d911a9992e777509137ad99d4cb69d691ad58e --hosts arm_host) does not work and instead results in the Exec format error being logged on the orchestrator host.

It feels like the orchestrator needs to do one of the following for multi-arch Ceph clusters:

  1. Allow the use of tags rather than digests, so that the container runtime pulls the correct architecture-specific image. However, this may result in slight version differences, given that images appear to be periodically re-published using the same tag.
  2. Add logic to the orchestrator's tag -> digest lookup to consider the target host's architecture and look up the relevant digest (see the sketch below).
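A minimal sketch of option 2 (hypothetical, not the actual cephadm code), assuming skopeo and jq are available: query the registry's manifest list and select the digest matching the target host's architecture instead of the orchestrator's own:

# skopeo inspect --raw docker://quay.io/ceph/ceph:v17.2.6 | jq -r '.manifests[] | select(.platform.architecture == "arm64") | .digest'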
Actions #8

Updated by K Jarrett 8 months ago

Apologies, option 1 is possible - there's an option within the cephadm manager module to not use digests: ceph config set mgr mgr/cephadm/use_repo_digest false.

Having set this, I'm now able to deploy daemons onto a new arm64 node in an existing amd64 cluster.
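The current value can be verified with:

# ceph config get mgr mgr/cephadm/use_repo_digest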

