Bug #53397 (closed): make cephadm pass CEPH_VOLUME_SKIP_RESTORECON when running ceph-volume

Added by Guillaume Abrioux over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
% Done: 0%
Backport: pacific
Regression: No
Severity: 3 - minor

Description

In containerized deployments, ceph-volume shouldn't make any call to the restorecon binary.

Given that ceph-volume can't reliably detect whether it's running inside a container, and that it already offers a way to skip this call by setting CEPH_VOLUME_SKIP_RESTORECON=1,
let's make cephadm pass this variable when running a ceph-volume container.
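To illustrate the intent, here is a minimal, hypothetical sketch (not the actual cephadm code) of injecting the variable into the container invocation that runs ceph-volume; the helper name, image name, and podman flags shown are assumptions for illustration only:

# Hypothetical sketch, not cephadm's real implementation: the only point is
# that the environment variable is passed to the ceph-volume container via -e.
import subprocess

def run_ceph_volume(image, cv_args):
    cmd = [
        'podman', 'run', '--rm', '--privileged', '--net=host',
        # The fix proposed in this ticket: skip restorecon inside containers.
        '-e', 'CEPH_VOLUME_SKIP_RESTORECON=1',
        image,
        'ceph-volume', *cv_args,
    ]
    subprocess.run(cmd, check=True)

# Example usage (image name is a placeholder):
run_ceph_volume('quay.io/ceph/ceph:v16.2', ['lvm', 'list'])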

Actions #1

Updated by Guillaume Abrioux over 2 years ago

  • Pull request ID set to 44104
Actions #2

Updated by Guillaume Abrioux over 2 years ago

  • Status changed from New to Fix Under Review
Actions #3

Updated by Sebastian Wagner over 2 years ago

copied from downstream:

It turns out ceph-volume fails to create OSDs here (although this behavior hasn't been reported upstream) because a call to `restorecon` fails during the creation workflow [1].
As a consequence, ceph-volume leaves an OSD only partially prepared; the cluster won't even see it, which means the OSD ID that was going to be used will be picked up by another OSD later.

That being said, when running from a container, ceph-volume shouldn't see SELinux as enabled [2], so it shouldn't make this call to `restorecon` [3].
The fix I would suggest here is to make cephadm pass an environment variable `CEPH_VOLUME_SKIP_RESTORECON` [4] to skip this operation, since it doesn't make sense to run it in a containerized context; see PR [5].


[1] restorecon failure:
[2021-11-24 09:37:54,441][ceph_volume.process][INFO  ] Running command: /usr/sbin/selinuxenabled
[2021-11-24 09:37:54,444][ceph_volume.process][INFO  ] Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1
[2021-11-24 09:37:54,462][ceph_volume.process][INFO  ] stderr No such file or directory
[2021-11-24 09:37:54,462][ceph_volume.devices.lvm.prepare][ERROR ] lvm prepare was unable to complete

[2] expected behavior:
[2021-11-24 09:39:48,605][ceph_volume.process][INFO  ] Running command: /usr/sbin/selinuxenabled
[2021-11-24 09:39:48,608][ceph_volume.util.system][INFO  ] SELinux is not enabled, will not call restorecon
[2021-11-24 09:39:48,609][ceph_volume.process][INFO  ] Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-fsid/osd-block-osdid
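For clarity, the skip behavior described above amounts to an environment-variable guard in front of the SELinux handling. The sketch below is illustrative only; the helper name and the exact accepted values are assumptions, not ceph-volume's actual internals.

# Illustrative sketch of the skip logic, assuming a hypothetical helper name;
# not ceph-volume's real code.
import os
import subprocess

def maybe_restorecon(path):
    # CEPH_VOLUME_SKIP_RESTORECON short-circuits the SELinux handling entirely,
    # which is what this ticket asks cephadm to trigger in containers.
    if os.environ.get('CEPH_VOLUME_SKIP_RESTORECON', '') in ('1', 'true', 'yes'):
        return
    # Otherwise only call restorecon when SELinux reports itself as enabled,
    # mirroring the selinuxenabled check visible in the logs above.
    if subprocess.run(['selinuxenabled']).returncode == 0:
        subprocess.run(['restorecon', path], check=True)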
Actions #4

Updated by Ken Dreyer over 2 years ago

It's good to skip restorecon for performance.

However, there's a bigger problem if SELinux is broken in the container. Even if we never execute restorecon, DNF (and probably other things) will not work either.

In #49239, we made /sys/fs/selinux a bind mount to an empty directory. Is that feature still working?

Actions #5

Updated by Ken Dreyer over 2 years ago

(Tangent: the restorecon ENOENT error message is confusing; I opened https://bugzilla.redhat.com/show_bug.cgi?id=1926511 to improve that in SELinux.)

Actions #6

Updated by Ken Dreyer over 2 years ago

I cannot reproduce this in a minimal container.

podman run -it -v /usr/share/empty:/sys/fs/selinux:ro ubi8
[root@58fe65c8e17e /]# yum -y install policycoreutils
[root@58fe65c8e17e /]# sestatus 
SELinux status:                 disabled
[root@58fe65c8e17e /]# selinuxenabled 
[root@58fe65c8e17e /]# echo $?
1

selinuxenabled should always return "1" inside the container when we're mounting /sys/fs/selinux properly.

Actions #7

Updated by Guillaume Abrioux over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Guillaume Abrioux about 2 years ago

  • Status changed from Pending Backport to Resolved
  • Backport changed from pacific,octopus to pacific