Bug #50295

closed

cephadm bootstrap mon container fails to start with podman 3.1 in CentOS 8 Stream

Added by Daniel Reader-Powell about 3 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When attempting a cephadm bootstrap on CentOS Stream after AppStream moved podman from version 3.0.0-0.33rc2.module_el8.4.0+673+eabfc99d to 3.1.0-0.13.module_el8.5.0+733+9bb5dffa, the initial mon container fails to start under systemd with the error 'unknown capability "CAP_PERFMON"'.

Tested with pacific and octopus builds on a minimal install of CentOS Stream running in KVM VMs.

The workaround was to start from an older release of CentOS Stream and, before applying updates, lock podman at 3.0 using python3-dnf-plugin-versionlock.
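
A minimal sketch of that workaround follows. It assumes the host still has podman 3.0 installed, and that "dnf versionlock add podman" pins the installed build. The function names podman_needs_lock and lock_installed_podman are hypothetical. The 3.1 threshold comes only from the versions in this report and is not a confirmed boundary.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given podman version string is
# 3.1 or newer, the range in which this report saw the CAP_PERFMON failure.
podman_needs_lock() {
    # sort -V orders version strings; if "3.1" sorts first (or ties),
    # the supplied version is at least 3.1.
    _lowest=$(printf '%s\n%s\n' "3.1" "$1" | sort -V | head -n 1)
    [ "$_lowest" = "3.1" ]
}

# Hypothetical helper: pin the currently installed podman build so a later
# "dnf update" does not replace it. Must be run as root.
lock_installed_podman() {
    dnf install -y python3-dnf-plugin-versionlock &&
        dnf versionlock add podman &&
        dnf versionlock list
}

# Example use (not executed here):
#   installed=$(rpm -q --queryformat '%{VERSION}' podman)
#   if podman_needs_lock "$installed"; then
#       echo "podman $installed is already 3.1 or newer; locking it would not help"
#   else
#       lock_installed_podman
#   fi
```

The lock has to be in place before updates are applied. Once podman 3.1 is installed, pinning the installed build would only keep the affected version.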

[root@ceph01 danrp]# uname -a
Linux ceph01 4.18.0-294.el8.x86_64 #1 SMP Mon Mar 15 22:38:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@ceph01 danrp]# podman version
Version:      3.1.0-dev
API Version:  3.1.0-dev
Go Version:   go1.16.1
Built:        Fri Mar 26 18:32:03 2021
OS/Arch:      linux/amd64
[root@ceph01 danrp]# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
[root@ceph01 danrp]# chmod +x cephadm
[root@ceph01 danrp]# ./cephadm bootstrap --mon-ip 10.10.4.14
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 41cdf9dc-9876-11eb-a66f-5254000fb543
Verifying IP 10.10.4.14 port 3300 ...
Verifying IP 10.10.4.14 port 6789 ...
Mon IP 10.10.4.14 is in CIDR network 10.10.4.0/24
- internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image docker.io/ceph/ceph:v16...
Ceph version: ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Non-zero exit code 1 from systemctl start ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01
systemctl: stderr Job for ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service failed because the control process exited with error code.
systemctl: stderr See "systemctl status ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "./cephadm", line 7924, in <module>
    main()
  File "./cephadm", line 7912, in main
    r = ctx.func(ctx)
  File "./cephadm", line 1717, in _default_image
    return func(ctx)
  File "./cephadm", line 3909, in command_bootstrap
    create_mon(ctx, uid, gid, fsid, mon_id)
  File "./cephadm", line 3536, in create_mon
    config=None, keyring=None)
  File "./cephadm", line 2561, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "./cephadm", line 2757, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "./cephadm", line 1411, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: systemctl start ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01
[root@ceph01 danrp]# journalctl -lef -u ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01
-- Logs begin at Thu 2021-04-08 15:21:42 BST. --
Apr 08 15:26:15 ceph01 systemd[1]: Starting Ceph mon.ceph01 for 41cdf9dc-9876-11eb-a66f-5254000fb543...
Apr 08 15:26:16 ceph01 bash[2432]: Error: OCI runtime error: container_linux.go:370: starting container process caused: unknown capability "CAP_PERFMON" 
Apr 08 15:26:16 ceph01 systemd[1]: ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service: Control process exited, code=exited status=126
Apr 08 15:26:16 ceph01 systemd[1]: ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service: Failed with result 'exit-code'.
Apr 08 15:26:16 ceph01 systemd[1]: Failed to start Ceph mon.ceph01 for 41cdf9dc-9876-11eb-a66f-5254000fb543.
