Bug #46429
Status: Closed
cephadm fails bootstrap with new Podman versions 2.0.1 and 2.0.2
Description
Podman had a major new release recently and it seems that cephadm cannot bootstrap a new cluster because of it.
I installed the latest version of Podman from their repositories, followed the basic instructions to create a new Ceph cluster, and got the error below. Since a new version of Podman was released one day ago I tried again with that version but got the same result.
INFO:cephadm:Verifying podman|docker is present...
INFO:cephadm:Verifying lvm2 is present...
INFO:cephadm:Verifying time synchronization is in place...
INFO:cephadm:Unit ntp.service is enabled and running
INFO:cephadm:Repeating the final host check...
INFO:cephadm:podman|docker (/usr/bin/podman) is present
INFO:cephadm:systemctl is present
INFO:cephadm:lvcreate is present
INFO:cephadm:Unit ntp.service is enabled and running
INFO:cephadm:Host looks OK
INFO:root:Cluster fsid: 943951ee-c0db-11ea-b5d7-0dda3f918e2e
INFO:cephadm:Verifying IP 192.168.56.10 port 3300 ...
INFO:cephadm:Verifying IP 192.168.56.10 port 6789 ...
INFO:cephadm:Mon IP 192.168.56.10 is in CIDR network 192.168.56.0/25
INFO:cephadm:Pulling latest docker.io/ceph/ceph:v15 container...
INFO:cephadm:Extracting ceph user uid/gid from container image...
INFO:cephadm:Creating initial keys...
INFO:cephadm:Creating initial monmap...
INFO:cephadm:Creating mon...
INFO:cephadm:Waiting for mon to start...
INFO:cephadm:Waiting for mon...
INFO:cephadm:/usr/bin/ceph:timeout after 30 seconds
INFO:cephadm:Non-zero exit code -9 from /usr/bin/podman run --rm --net=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=cephtest -v /var/lib/ceph/943951ee-c0db-11ea-b5d7-0dda3f918e2e/mon.cephtest:/var/lib/ceph/mon/ceph-cephtest:z -v /tmp/ceph-tmpsnxe3x02:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnfxlyp0o:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
INFO:cephadm:mon not available, waiting (1/10)...
INFO:cephadm:/usr/bin/ceph:timeout after 30 seconds
INFO:cephadm:Non-zero exit code -9 from /usr/bin/podman run --rm --net=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=cephtest -v /var/lib/ceph/943951ee-c0db-11ea-b5d7-0dda3f918e2e/mon.cephtest:/var/lib/ceph/mon/ceph-cephtest:z -v /tmp/ceph-tmpsnxe3x02:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnfxlyp0o:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
INFO:cephadm:mon not available, waiting (2/10)...
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 4282, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 972, in _default_image
    return func()
  File "/usr/sbin/cephadm", line 2266, in command_bootstrap
    is_available('mon', is_mon_available)
  File "/usr/sbin/cephadm", line 757, in is_available
    if func():
  File "/usr/sbin/cephadm", line 2262, in is_mon_available
    out, err, ret = call(c.run_cmd(),
  File "/usr/sbin/cephadm", line 630, in call
    reads, _, _ = select.select(
Updated by Gunther Heinrich almost 4 years ago
Update:
I did some additional tests and after some digging I found something in /var/log/syslog:
cephtest systemd[1]: Starting Ceph mon.cephtest for cc7626d4-c597-11ea-8574-2172e5035b09...
cephtest podman[4372]: Error: no container with name or ID ceph-cc7626d4-c597-11ea-8574-2172e5035b09-mon.cephtest found: no such container
cephtest systemd[1]: Started Ceph mon.cephtest for cc7626d4-c597-11ea-8574-2172e5035b09.
cephtest systemd[1]: Started libpod-conmon-893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff.scope.
cephtest systemd[707]: run-runc-893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff-runc.Xgkjiy.mount: Succeeded.
cephtest systemd[1]: run-runc-893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff-runc.Xgkjiy.mount: Succeeded.
cephtest systemd[1]: Started libcontainer container 893fd12312645a59087aa96cfa419bc7b10cdfc8da0ea1cd7c13099c46313dff.
cephtest bash[4425]: Error: invalid config provided: AppArmorProfile and privileged are mutually exclusive options
It seems that the options provided during bootstrapping conflict with each other since Podman v2. The Podman developers are already aware of this: https://github.com/containers/podman/issues/6933
Although I didn't test it further, it seems that updating Podman to its latest version on an already running cluster might break it, because the nodes on which I updated Podman didn't fully rejoin the cluster after a reboot.
All tests were done using Ubuntu Server 20.04
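Since cephadm already shells out to `podman --version` during its host checks, one mitigation would be to gate flag combinations on the detected version. The sketch below is illustrative only (the function names and the exact affected version range are assumptions based on this thread, not cephadm's actual code):

```python
import re


def parse_podman_version(version_output: str) -> tuple:
    """Extract a comparable tuple from `podman --version` output,
    e.g. 'podman version 2.0.2' -> (2, 0, 2)."""
    match = re.search(r'(\d+)\.(\d+)\.(\d+)', version_output)
    if not match:
        raise ValueError(f'unrecognized version output: {version_output!r}')
    return tuple(int(part) for part in match.groups())


def privileged_options_conflict(version: tuple) -> bool:
    """Per this thread, Podman 2.0.0-2.0.2 rejects --privileged combined
    with an AppArmor profile; 2.0.3 resolved that particular conflict."""
    return (2, 0, 0) <= version <= (2, 0, 2)


# Usage: warn (or adjust flags) before bootstrapping on an affected version.
assert privileged_options_conflict(parse_podman_version('podman version 2.0.1'))
assert not privileged_options_conflict(parse_podman_version('podman version 2.0.3'))
```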
Updated by David Orman almost 4 years ago
G. Heinrich wrote:
Update:
I did some additional tests and after some digging I found something in /var/log/syslog:
[...]
It seems that the options provided during bootstrapping conflict with each other since Podman v2. The Podman developers are already aware of this: https://github.com/containers/podman/issues/6933
Although I didn't test it further, it seems that updating Podman to its latest version on an already running cluster might break it, because the nodes on which I updated Podman didn't fully rejoin the cluster after a reboot.
All tests were done using Ubuntu Server 20.04
While the Podman developers are aware of this issue, I am not sure what their intended fix is - they may not revert to the old behavior. There are suggestions in that bug report that could also be implemented on the Ceph side to resolve this issue. I would argue that this is relatively high priority: packaged builds of Podman v1.9.3 can be harder to come by, new users will likely be on the latest version from the repos, and the resulting trouble is not obvious unless you have experience troubleshooting containers.
Updated by Sebastian Wagner almost 4 years ago
Let's wait until https://github.com/containers/podman/issues/6933 is resolved.
Updated by Gunther Heinrich almost 4 years ago
A few days ago a new version of Podman was released (2.0.3). I installed the latest version in a virtual machine and, after some tests, it seems that the bug was fixed.
I updated Podman on all physical cluster nodes three days ago and the cluster (15.2.4) continued to work normally.
If someone else can confirm this, we could close this issue.
Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #46206: cephadm: podman 2.0 added
Updated by Sebastian Wagner almost 4 years ago
- Status changed from New to Closed
Updated by Joshua Schmid over 3 years ago
Tested
dev-box-1:/var/lib/ceph/662190fd-30cc-4f4f-945d-dd429e32b0c0 # podman --version
podman version 2.0.6
and can confirm that this very issue has been resolved.
However, I'm now seeing:
Error: invalid config provided: CapAdd and privileged are mutually exclusive options
which is caused by:
https://github.com/ceph/ceph/blob/75d9369a021458589a3a73c2c141a721ba439aff/src/cephadm/cephadm#L2342
I confirmed this by commenting out that line.
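The remaining error comes from cephadm passing both `--privileged` and `--cap-add=SYS_PTRACE` on the same `podman run` invocation. A hedged sketch of one way to avoid the combination (illustrative only, not the actual cephadm patch): since `--privileged` already grants all capabilities, the explicit `--cap-add` is redundant and can simply be dropped in that case.

```python
def build_security_args(privileged: bool, ptrace: bool) -> list:
    """Assemble podman security flags without ever combining --cap-add
    and --privileged, which Podman 2.x rejects as mutually exclusive."""
    args = []
    if privileged:
        # --privileged already implies all capabilities, including
        # CAP_SYS_PTRACE, so no --cap-add is needed here.
        args.append('--privileged')
    elif ptrace:
        args.append('--cap-add=SYS_PTRACE')
    return args
```

For example, `build_security_args(privileged=True, ptrace=True)` yields only `['--privileged']`, which Podman 2.0.6 accepts.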
Updated by Sage Weil over 3 years ago
Current cephadm avoids combining --cap-add and --privileged, but older cephadm does not, and some distros still ship the old podman. Work around this by leaving off allow_ptrace=true for upgrade tests. See https://github.com/ceph/ceph/pull/38974