Bug #50295

cephadm bootstrap mon container fails to start with podman 3.1 in CentOS 8 Stream

Added by Daniel Reader-Powell about 3 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When attempting to bootstrap a cluster on CentOS Stream after AppStream updated podman from version 3.0.0-0.33rc2.module_el8.4.0+673+eabfc99d to 3.1.0-0.13.module_el8.5.0+733+9bb5dffa, the initial mon container fails to start under systemd due to 'unknown capability "CAP_PERFMON"'.
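
The failure can be reproduced outside of cephadm by requesting the capability explicitly (a minimal sketch; docker.io/library/alpine is just an arbitrary small test image, and this assumes an affected podman 3.1 host):

# Ask podman to grant CAP_PERFMON directly; on an affected host the OCI
# runtime rejects it with the same error the mon unit hits below.
podman run --rm --cap-add=CAP_PERFMON docker.io/library/alpine true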

Tested with pacific and octopus builds on a minimal install of CentOS Stream running in KVM VMs.

The workaround was to version-lock podman to 3.0 using python3-dnf-plugin-versionlock on a CentOS Stream host still running the older release, prior to applying updates.
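
A sketch of that workaround, assuming the podman 3.0 build is still the installed version when the lock is added:

# Install the versionlock plugin, pin podman at the currently installed
# 3.0 build, then apply the remaining updates; the locked package is
# skipped when the 3.1 module build lands in AppStream.
dnf install -y python3-dnf-plugin-versionlock
dnf versionlock add podman
dnf update -y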

[root@ceph01 danrp]# uname -a
Linux ceph01 4.18.0-294.el8.x86_64 #1 SMP Mon Mar 15 22:38:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@ceph01 danrp]# podman version
Version:      3.1.0-dev
API Version:  3.1.0-dev
Go Version:   go1.16.1
Built:        Fri Mar 26 18:32:03 2021
OS/Arch:      linux/amd64
[root@ceph01 danrp]# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
[root@ceph01 danrp]# chmod +x cephadm
[root@ceph01 danrp]# ./cephadm bootstrap --mon-ip 10.10.4.14
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 41cdf9dc-9876-11eb-a66f-5254000fb543
Verifying IP 10.10.4.14 port 3300 ...
Verifying IP 10.10.4.14 port 6789 ...
Mon IP 10.10.4.14 is in CIDR network 10.10.4.0/24
- internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image docker.io/ceph/ceph:v16...
Ceph version: ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Non-zero exit code 1 from systemctl start ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01
systemctl: stderr Job for ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service failed because the control process exited with error code.
systemctl: stderr See "systemctl status ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "./cephadm", line 7924, in <module>
    main()
  File "./cephadm", line 7912, in main
    r = ctx.func(ctx)
  File "./cephadm", line 1717, in _default_image
    return func(ctx)
  File "./cephadm", line 3909, in command_bootstrap
    create_mon(ctx, uid, gid, fsid, mon_id)
  File "./cephadm", line 3536, in create_mon
    config=None, keyring=None)
  File "./cephadm", line 2561, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "./cephadm", line 2757, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "./cephadm", line 1411, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: systemctl start ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01
[root@ceph01 danrp]# journalctl -lef -u ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01
-- Logs begin at Thu 2021-04-08 15:21:42 BST. --
Apr 08 15:26:15 ceph01 systemd[1]: Starting Ceph mon.ceph01 for 41cdf9dc-9876-11eb-a66f-5254000fb543...
Apr 08 15:26:16 ceph01 bash[2432]: Error: OCI runtime error: container_linux.go:370: starting container process caused: unknown capability "CAP_PERFMON" 
Apr 08 15:26:16 ceph01 systemd[1]: ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service: Control process exited, code=exited status=126
Apr 08 15:26:16 ceph01 systemd[1]: ceph-41cdf9dc-9876-11eb-a66f-5254000fb543@mon.ceph01.service: Failed with result 'exit-code'.
Apr 08 15:26:16 ceph01 systemd[1]: Failed to start Ceph mon.ceph01 for 41cdf9dc-9876-11eb-a66f-5254000fb543.

#1

Updated by Dan van der Ster almost 3 years ago

Same here. Did you find a solution?

I see there is a related report: https://bugzilla.redhat.com/show_bug.cgi?id=1946982

#2

Updated by Redouane Kachach Elhichou about 2 years ago

  • Assignee set to Redouane Kachach Elhichou

#3

Updated by Redouane Kachach Elhichou about 2 years ago

I installed a fresh CentOS 8 Stream and then podman (the version currently installed is 4.0). Then I launched the bootstrap process, which worked without issues:

[root@centos8stream ~]# podman version
Client:       Podman Engine
Version:      4.0.2
API Version:  4.0.2
Go Version:   go1.17.7

Built:      Tue Mar 15 19:15:06 2022
OS/Arch:    linux/amd64

[root@centos8stream ~]# cat /etc/centos-release
CentOS Stream release 8
[root@centos8stream ~]#
[root@centos8stream ~]# ./cephadm shell
Inferring fsid 23e35a50-ae89-11ec-a78f-52540034e3cf
Using recent ceph image quay.io/ceph/ceph@sha256:0d927ccbd8892180ee09894c2b2c26d07c938bf96a56eaee9b80fc9f26083ddb
[ceph: root@centos8stream /]#

[ceph: root@centos8stream /]# ceph orch ps 
NAME                         HOST           PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
alertmanager.centos8stream   centos8stream  *:9093,9094  running (5m)     4m ago   6m    15.6M        -  0.20.0   0881eb8f169f  309bbf3f4e78  
crash.centos8stream          centos8stream               running (6m)     4m ago   6m    8292k        -  16.2.7   c92aec2cd894  a63c5a299c1e  
grafana.centos8stream        centos8stream  *:3000       running (5m)     4m ago   6m    29.6M        -  6.7.4    557c83e11646  8dcb1b0d5acb  
mgr.centos8stream.vdkcio     centos8stream  *:9283       running (7m)     4m ago   7m     435M        -  16.2.7   c92aec2cd894  d90383111c8f  
mon.centos8stream            centos8stream               running (7m)     4m ago   7m    42.7M    2048M  16.2.7   c92aec2cd894  25e5eaf974b8  
node-exporter.centos8stream  centos8stream  *:9100       running (6m)     4m ago   6m    13.4M        -  0.18.1   e5a616e4b9cf  8469332d2f9f  
prometheus.centos8stream     centos8stream  *:9095       running (5m)     4m ago   5m    27.5M        -  2.18.1   de242295e225  8f496a48cb3d 

I'm closing this issue since the fix consists of updating the podman version.

#4

Updated by Redouane Kachach Elhichou about 2 years ago

  • Severity changed from 2 - major to 3 - minor

#5

Updated by Redouane Kachach Elhichou about 2 years ago

  • Status changed from New to Closed

The fix consists of updating podman to a newer version (bootstrap worked with podman 4.0.2).
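
If the version-lock workaround from the description is still in place, it has to be dropped before podman can move to a fixed build (a sketch, assuming the lock was added with dnf versionlock as shown above):

# Remove the pin, then update podman to the newer version.
dnf versionlock delete podman
dnf update -y podman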
