Bug #46429
Status: Closed
cephadm fails bootstrap with new Podman versions 2.0.1 and 2.0.2
Description
Podman had a major new release recently and it seems that cephadm cannot bootstrap a new cluster because of it.
I installed the latest version of Podman from their repositories, followed the basic instructions to create a new Ceph cluster, and got the error below. Since a new version of Podman was released one day ago I tried again with that version but got the same result.
INFO:cephadm:Verifying podman|docker is present...
INFO:cephadm:Verifying lvm2 is present...
INFO:cephadm:Verifying time synchronization is in place...
INFO:cephadm:Unit ntp.service is enabled and running
INFO:cephadm:Repeating the final host check...
INFO:cephadm:podman|docker (/usr/bin/podman) is present
INFO:cephadm:systemctl is present
INFO:cephadm:lvcreate is present
INFO:cephadm:Unit ntp.service is enabled and running
INFO:cephadm:Host looks OK
INFO:root:Cluster fsid: 943951ee-c0db-11ea-b5d7-0dda3f918e2e
INFO:cephadm:Verifying IP 192.168.56.10 port 3300 ...
INFO:cephadm:Verifying IP 192.168.56.10 port 6789 ...
INFO:cephadm:Mon IP 192.168.56.10 is in CIDR network 192.168.56.0/25
INFO:cephadm:Pulling latest docker.io/ceph/ceph:v15 container...
INFO:cephadm:Extracting ceph user uid/gid from container image...
INFO:cephadm:Creating initial keys...
INFO:cephadm:Creating initial monmap...
INFO:cephadm:Creating mon...
INFO:cephadm:Waiting for mon to start...
INFO:cephadm:Waiting for mon...
INFO:cephadm:/usr/bin/ceph:timeout after 30 seconds
INFO:cephadm:Non-zero exit code -9 from /usr/bin/podman run --rm --net=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=cephtest -v /var/lib/ceph/943951ee-c0db-11ea-b5d7-0dda3f918e2e/mon.cephtest:/var/lib/ceph/mon/ceph-cephtest:z -v /tmp/ceph-tmpsnxe3x02:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnfxlyp0o:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
INFO:cephadm:mon not available, waiting (1/10)...
INFO:cephadm:/usr/bin/ceph:timeout after 30 seconds
INFO:cephadm:Non-zero exit code -9 from /usr/bin/podman run --rm --net=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=cephtest -v /var/lib/ceph/943951ee-c0db-11ea-b5d7-0dda3f918e2e/mon.cephtest:/var/lib/ceph/mon/ceph-cephtest:z -v /tmp/ceph-tmpsnxe3x02:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnfxlyp0o:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
INFO:cephadm:mon not available, waiting (2/10)...
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 4282, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 972, in _default_image
    return func()
  File "/usr/sbin/cephadm", line 2266, in command_bootstrap
    is_available('mon', is_mon_available)
  File "/usr/sbin/cephadm", line 757, in is_available
    if func():
  File "/usr/sbin/cephadm", line 2262, in is_mon_available
    out, err, ret = call(c.run_cmd(),
  File "/usr/sbin/cephadm", line 630, in call
    reads, _, _ = select.select(
Updated by Gunther Heinrich almost 4 years ago
Update:
I did some additional tests and after some digging I found something in /var/log/syslog:
cephtest systemd[1]: Starting Ceph mon.cephtest for cc7626d4-c597-11ea-8574-2172e5035b09...
cephtest podman[4372]: Error: no container with name or ID ceph-cc7626d4-c597-11ea-8574-2172e5035b09-mon.cephtest found: no such container
cephtest systemd[1]: Started Ceph mon.cephtest for cc7626d4-c597-11ea-8574-2172e5035b09.
cephtest systemd[1]: Started libpod-conmon-893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff.scope.
cephtest systemd[707]: run-runc-893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff-runc.Xgkjiy.mount: Succeeded.
cephtest systemd[1]: run-runc-893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff-runc.Xgkjiy.mount: Succeeded.
cephtest systemd[1]: Started libcontainer container 893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff.
cephtest bash[4425]: Error: invalid config provided: AppArmorProfile and privileged are mutually exclusive options
It seems that the options provided during bootstrapping conflict with each other since Podman v2. The Podman developers are already aware of this: https://github.com/containers/podman/issues/6933
Although I didn't test it further, it seems that updating Podman to its latest version on an already running cluster might break it, because the nodes on which I updated Podman didn't fully rejoin the cluster after a reboot.
All tests were done using Ubuntu Server 20.04
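Since cephadm already shells out to `podman --version` during its host checks, one mitigation would be to gate flag combinations on the detected version. The sketch below is illustrative only (the function names and the exact affected version range are assumptions based on this thread, not cephadm's actual code):

```python
import re


def parse_podman_version(version_output: str) -> tuple:
    """Extract a comparable tuple from `podman --version` output,
    e.g. 'podman version 2.0.2' -> (2, 0, 2)."""
    match = re.search(r'(\d+)\.(\d+)\.(\d+)', version_output)
    if not match:
        raise ValueError(f'unrecognized version output: {version_output!r}')
    return tuple(int(part) for part in match.groups())


def privileged_options_conflict(version: tuple) -> bool:
    """Per this thread, Podman 2.0.0-2.0.2 rejects --privileged combined
    with an AppArmor profile; 2.0.3 resolved that particular conflict."""
    return (2, 0, 0) <= version <= (2, 0, 2)


# Usage: warn (or adjust flags) before bootstrapping on an affected version.
assert privileged_options_conflict(parse_podman_version('podman version 2.0.1'))
assert not privileged_options_conflict(parse_podman_version('podman version 2.0.3'))
```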
Updated by David Orman almost 4 years ago
G. Heinrich wrote:
Update:
I did some additional tests and after some digging I found something in /var/log/syslog:
[...]
It seems that the options provided during bootstrapping conflict with each other since Podman v2. The Podman developers are already aware of this: https://github.com/containers/podman/issues/6933
Although I didn't test it further, it seems that updating Podman to its latest version on an already running cluster might break it, because the nodes on which I updated Podman didn't fully rejoin the cluster after a reboot.
All tests were done using Ubuntu Server 20.04
While the Podman developers are aware of this issue, I am not sure what their intended fix is - they may not revert to the old behavior. There are suggestions in that bug report that could also be implemented on the Ceph side to resolve this issue. I would argue that this is relatively high priority: packaged builds of Podman v1.9.3 can be harder to come by, new users will likely be on the latest version from the repos, and the resulting trouble is not obvious unless you have experience troubleshooting containers.
Updated by Sebastian Wagner almost 4 years ago
Let's wait until https://github.com/containers/podman/issues/6933 is resolved.
Updated by Gunther Heinrich almost 4 years ago
A few days ago a new version of Podman was released (2.0.3). I installed the latest version in a virtual machine and, after some tests, it seems that the bug was fixed.
I updated Podman on all physical cluster nodes three days ago and the cluster (15.2.4) continued to work normally.
If someone else can confirm this, we could close this issue.
Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #46206: cephadm: podman 2.0 added
Updated by Sebastian Wagner almost 4 years ago
- Status changed from New to Closed
Updated by Joshua Schmid over 3 years ago
Tested
dev-box-1:/var/lib/ceph/662190fd-30cc-4f4f-945d-dd429e32b0c0 # podman --version
podman version 2.0.6
and can confirm that this very issue has been resolved.
However, I'm now seeing:
Error: invalid config provided: CapAdd and privileged are mutually exclusive options
which is caused by:
https://github.com/ceph/ceph/blob/75d9369a021458589a3a73c2c141a721ba439aff/src/cephadm/cephadm#L2342
I confirmed this by commenting out that line.
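The remaining error comes from cephadm passing both `--privileged` and `--cap-add=SYS_PTRACE` on the same `podman run` invocation. A hedged sketch of one way to avoid the combination (illustrative only, not the actual cephadm patch): since `--privileged` already grants all capabilities, the explicit `--cap-add` is redundant and can simply be dropped in that case.

```python
def build_security_args(privileged: bool, ptrace: bool) -> list:
    """Assemble podman security flags without ever combining --cap-add
    and --privileged, which Podman 2.x rejects as mutually exclusive."""
    args = []
    if privileged:
        # --privileged already implies all capabilities, including
        # CAP_SYS_PTRACE, so no --cap-add is needed here.
        args.append('--privileged')
    elif ptrace:
        args.append('--cap-add=SYS_PTRACE')
    return args
```

For example, `build_security_args(privileged=True, ptrace=True)` yields only `['--privileged']`, which Podman 2.0.6 accepts.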
Updated by Sage Weil over 3 years ago
Current cephadm avoids combining --cap-add and --privileged, but older cephadm does not, and some distros still ship the old podman. Work around this by leaving off allow_ptrace=true for upgrade tests. See https://github.com/ceph/ceph/pull/38974