Bug #53397
make cephadm pass CEPH_VOLUME_SKIP_RESTORECON when running ceph-volume
Status: Closed
Description
In containerized deployments, ceph-volume shouldn't make any call to the restorecon binary.
Since ceph-volume can't reliably detect whether it's running inside a container, but it does offer a way to skip this call by setting CEPH_VOLUME_SKIP_RESTORECON=1,
let's make cephadm pass this variable when running a ceph-volume container.
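As a minimal sketch of the proposed fix (function and parameter names here are illustrative, not cephadm's actual API), cephadm would inject the variable into the environment it builds for the ceph-volume container:

```python
import os

def build_ceph_volume_env(base_env=None):
    """Return the environment for a ceph-volume container run.

    Hypothetical helper: the real cephadm code assembles container
    arguments differently, but the idea is the same -- always set
    CEPH_VOLUME_SKIP_RESTORECON so restorecon is never attempted
    inside the container.
    """
    env = dict(base_env if base_env is not None else os.environ)
    env["CEPH_VOLUME_SKIP_RESTORECON"] = "1"
    return env

print(build_ceph_volume_env({})["CEPH_VOLUME_SKIP_RESTORECON"])  # → 1
```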
Updated by Guillaume Abrioux over 2 years ago
- Status changed from New to Fix Under Review
Updated by Sebastian Wagner over 2 years ago
copied from downstream:
It turns out ceph-volume fails to create OSDs here (although this behavior hasn't been reported upstream) because a call to `restorecon` fails during the creation workflow [1].
As a consequence, ceph-volume leaves an OSD not fully prepared; the cluster won't even see it, which means the OSD ID that was going to be used will be picked up by another OSD later.
That being said, when running from a container, ceph-volume shouldn't see SELinux as enabled [2], so it shouldn't make this call to `restorecon` [3].
The fix I would suggest here is to make cephadm pass an environment variable `CEPH_VOLUME_SKIP_RESTORECON` [4] to skip this operation, since it doesn't make sense to run it in a containerized context; see PR [5].
[1] restorecon failure:
[2021-11-24 09:37:54,441][ceph_volume.process][INFO ] Running command: /usr/sbin/selinuxenabled
[2021-11-24 09:37:54,444][ceph_volume.process][INFO ] Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1
[2021-11-24 09:37:54,462][ceph_volume.process][INFO ] stderr No such file or directory
[2021-11-24 09:37:54,462][ceph_volume.devices.lvm.prepare][ERROR ] lvm prepare was unable to complete
[2] expected behavior:
[2021-11-24 09:39:48,605][ceph_volume.process][INFO ] Running command: /usr/sbin/selinuxenabled
[2021-11-24 09:39:48,608][ceph_volume.util.system][INFO ] SELinux is not enabled, will not call restorecon
[2021-11-24 09:39:48,609][ceph_volume.process][INFO ] Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-fsid/osd-block-osdid
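The guard that makes the env var effective can be sketched as follows (this is an illustrative reconstruction, not ceph-volume's actual code; only the `CEPH_VOLUME_SKIP_RESTORECON` name comes from the report):

```python
import os

def should_call_restorecon(selinux_enabled):
    """Decide whether restorecon should run.

    Illustrative sketch: skip unconditionally when the env var is set,
    otherwise fall back to the SELinux check (ceph-volume shells out to
    /usr/sbin/selinuxenabled for that; here it's just a boolean).
    """
    if os.environ.get("CEPH_VOLUME_SKIP_RESTORECON") == "1":
        return False  # explicitly skipped via the environment variable
    return selinux_enabled

os.environ["CEPH_VOLUME_SKIP_RESTORECON"] = "1"
print(should_call_restorecon(True))  # → False
```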
Updated by Ken Dreyer over 2 years ago
It's good to skip restorecon for performance.
However, there's a bigger problem if SELinux is broken in the container. Even if we never execute restorecon, DNF (and probably other things) will not work either.
In #49239, we made /sys/fs/selinux bind-mount to an empty directory. Is that feature still working?
Updated by Ken Dreyer over 2 years ago
(Tangent: the restorecon ENOENT error message is a confusing message, and I opened https://bugzilla.redhat.com/show_bug.cgi?id=1926511 for improving that in SELinux.)
Updated by Ken Dreyer over 2 years ago
I cannot reproduce this in a minimal container.
podman run -it -v /usr/share/empty:/sys/fs/selinux:ro ubi8
[root@58fe65c8e17e /]# yum -y install policycoreutils
[root@58fe65c8e17e /]# sestatus
SELinux status: disabled
[root@58fe65c8e17e /]# selinuxenabled
[root@58fe65c8e17e /]# echo $?
1
`selinuxenabled` should always return "1" inside the container when we're mounting /sys/fs/selinux properly.
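The exit-status convention shown in that session can be encoded as a small helper (illustrative only; the real check shells out to `selinuxenabled` itself):

```python
def selinux_enabled_from_rc(returncode):
    """Interpret the exit status of /usr/sbin/selinuxenabled.

    The binary exits 0 when SELinux is enabled and 1 when it is
    disabled -- so inside a container with an empty /sys/fs/selinux
    bind-mount, callers should see a nonzero status and skip restorecon.
    """
    return returncode == 0

print(selinux_enabled_from_rc(1))  # → False
```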
Updated by Guillaume Abrioux over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Guillaume Abrioux about 2 years ago
- Status changed from Pending Backport to Resolved
- Backport changed from pacific,octopus to pacific