Bug #21672

Upgrading the ceph-selinux package unconditionally restarts all running daemons if selinux is enabled

Added by Brad Hubbard over 6 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
build
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
octopus, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This behaviour is due to the postinstall script for the ceph-selinux package.

%post selinux
# backup file_contexts before update
. /etc/selinux/config
FILE_CONTEXT=/etc/selinux/${SELINUXTYPE}/contexts/files/file_contexts
cp ${FILE_CONTEXT} ${FILE_CONTEXT}.pre

# Install the policy
/usr/sbin/semodule -i %{_datadir}/selinux/packages/ceph.pp

# Load the policy if SELinux is enabled
if ! /usr/sbin/selinuxenabled; then
    # Do not relabel if selinux is not enabled
    exit 0
fi

if diff ${FILE_CONTEXT} ${FILE_CONTEXT}.pre > /dev/null 2>&1; then
   # Do not relabel if file contexts did not change
   exit 0
fi

# Check whether the daemons are running
/usr/bin/systemctl status ceph.target > /dev/null 2>&1
STATUS=$?

# Stop the daemons if they were running
if test $STATUS -eq 0; then
    /usr/bin/systemctl stop ceph.target > /dev/null 2>&1
fi

# Now, relabel the files
/usr/sbin/fixfiles -C ${FILE_CONTEXT}.pre restore 2> /dev/null
rm -f ${FILE_CONTEXT}.pre
# The fixfiles command won't fix label for /var/run/ceph
/usr/sbin/restorecon -R /var/run/ceph > /dev/null 2>&1

# Start the daemons iff they were running before
if test $STATUS -eq 0; then
    /usr/bin/systemctl start ceph.target > /dev/null 2>&1 || :
fi
exit 0

So the prerequisites for this to occur are that selinux is enabled and the file context is changed by the upgraded ceph-selinux package.

Note that the postuninstall script also has the potential to restart the daemons.

See https://www.spinics.net/lists/ceph-users/msg38852.html for more information.


Related issues

Related to Ceph - Bug #13061: systemd: daemons restart when package is upgraded Resolved 09/11/2015
Copied to Ceph - Backport #51838: octopus: Upgrading the ceph-selinux package unconditionally restarts all running daemons if selinux is enabled Resolved
Copied to Ceph - Backport #51839: pacific: Upgrading the ceph-selinux package unconditionally restarts all running daemons if selinux is enabled Resolved

History

#1 Updated by Brad Hubbard over 6 years ago

  • Description updated (diff)

#2 Updated by Dan Mick over 6 years ago

...and this is a problem because daemon restart ordering may matter, especially at daemon-upgrade time, which is exactly when ceph-selinux is likely to be reinstalled.

#3 Updated by Dan van der Ster over 6 years ago

Have there been any more discussions about this? AFAICT it still leads to daemons restarting during upgrade.
And I can't find a way to fully disable selinux at runtime (permissive is not sufficient here).

At least ceph-selinux should be taught to respect the CEPH_AUTO_RESTART_ON_UPGRADE config.
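A minimal sketch of what "respecting" that setting could look like in the %post script — not the actual spec-file code; the `decide_restart` helper and its parameterised sysconfig path are illustrative:

```shell
# Hypothetical sketch: have the ceph-selinux %post script consult
# CEPH_AUTO_RESTART_ON_UPGRADE before touching ceph.target.
# decide_restart and its argument are illustrative, not real packaging code.
decide_restart() {
    # $1: sysconfig file to source (normally /etc/sysconfig/ceph)
    CEPH_AUTO_RESTART_ON_UPGRADE=no
    [ -r "$1" ] && . "$1"
    if [ "$CEPH_AUTO_RESTART_ON_UPGRADE" = "yes" ]; then
        echo restart            # proceed with stop/relabel/start
    else
        echo skip               # leave running daemons alone
    fi
}
```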

#4 Updated by Boris Ranto over 5 years ago

You have a couple of options here:

- you can stop the daemons before the (major) upgrade that requires the daemons to be started in a specific way; the script won't start the daemons if they were stopped before
- you can disable SELinux by editing /etc/selinux/config, changing SELINUX to disabled and rebooting the machine
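The second option above can be sketched as a small helper (the function name is illustrative, and it takes the config path as an argument so it can be exercised on a copy rather than the real /etc/selinux/config; a reboot is still required afterwards):

```shell
# Workaround sketch: stop daemons yourself before the upgrade
# (systemctl stop ceph.target), and/or disable SELinux persistently.
# disable_selinux_config is a hypothetical helper for illustration.
disable_selinux_config() {
    # $1: path to an SELinux config file (normally /etc/selinux/config)
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
    # The change only takes effect after a reboot.
}
```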

The CEPH_AUTO_RESTART_ON_UPGRADE option is different in that it covers automated restarts on upgrades. The ceph-selinux post script restarts (well, stops and starts) the daemons only if it needs to do so -- it needs to relabel the files and the running daemons would conflict with this operation.

#5 Updated by Dan van der Ster over 5 years ago

Just had this hit us again on a 12.2.8->12.2.10 upgrade on the mon/mgr nodes.
Since ceph-selinux is updated before ceph-mon, ceph-mgr, ..., the daemons are restarted when the packages are only partially updated.
Here is the upgrade ordering:

Nov 28 10:45:37 Updated: 2:librados2-12.2.10-0.el7.x86_64
Nov 28 10:45:37 Updated: 2:librbd1-12.2.10-0.el7.x86_64
Nov 28 10:45:37 Updated: 2:python-rados-12.2.10-0.el7.x86_64
Nov 28 10:45:37 Updated: 2:libcephfs2-12.2.10-0.el7.x86_64
Nov 28 10:45:37 Updated: 2:librgw2-12.2.10-0.el7.x86_64
Nov 28 10:45:37 Updated: 2:python-rgw-12.2.10-0.el7.x86_64
Nov 28 10:45:37 Updated: 2:python-cephfs-12.2.10-0.el7.x86_64
Nov 28 10:45:38 Updated: 2:python-rbd-12.2.10-0.el7.x86_64
Nov 28 10:45:38 Updated: 2:libradosstriper1-12.2.10-0.el7.x86_64
Nov 28 10:45:40 Updated: 2:ceph-common-12.2.10-0.el7.x86_64
Nov 28 10:45:41 Updated: 2:ceph-base-12.2.10-0.el7.x86_64
Nov 28 10:47:29 Updated: 2:ceph-selinux-12.2.10-0.el7.x86_64
Nov 28 10:47:29 Updated: 2:ceph-mgr-12.2.10-0.el7.x86_64
Nov 28 10:47:31 Updated: 2:ceph-osd-12.2.10-0.el7.x86_64
Nov 28 10:47:32 Updated: 2:ceph-mon-12.2.10-0.el7.x86_64
Nov 28 10:47:32 Updated: 2:ceph-mds-12.2.10-0.el7.x86_64
Nov 28 10:47:32 Updated: 2:ceph-12.2.10-0.el7.x86_64

Here is the ceph restart (when ceph-selinux was upgraded, before the other pkgs):

Nov 28 10:47:28 cephkelly-mon-93f9b3b24b systemd: ceph-mgr@cephkelly-mon-93f9b3b24b.service stop-sigterm timed out. Killing.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: ceph-mgr@cephkelly-mon-93f9b3b24b.service: main process exited, code=killed, status=9/KILL
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Unit ceph-mgr@cephkelly-mon-93f9b3b24b.service entered failed state.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: ceph-mgr@cephkelly-mon-93f9b3b24b.service failed.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Started Ceph cluster manager daemon.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Starting Ceph cluster manager daemon...
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Reached target ceph target allowing to start/stop all ceph-mgr@.service instances at once.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Starting ceph target allowing to start/stop all ceph-mgr@.service instances at once.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b systemd: Starting ceph target allowing to start/stop all ceph*@.service instances at once.
Nov 28 10:47:29 cephkelly-mon-93f9b3b24b yum[829348]: Updated: 2:ceph-selinux-12.2.10-0.el7.x86_64

And here is the ceph-mon failing to start because of this unfinished upgrade:

2018-11-28 10:46:40.538353 7fa3e6a12000  0 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable), process ceph-mon, pid 832421
2018-11-28 10:46:40.538417 7fa3e6a12000  0 pidfile_write: ignore empty --pid-file
2018-11-28 10:46:40.563869 7fa3e6a12000 -1 expected plugin /usr/lib64/ceph/erasure-code/libec_jerasure.so version 12.2.8 but it claims to be 12.2.10 instead

We have a similar example of ceph-mgr not restarting at the correct moment.

Fundamentally, I believe it is wrong for ceph-selinux to be restarting daemons -- ceph-selinux should set a condition and then the daemon rpms should do the restart post upgrade.
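The handoff proposed here could be sketched roughly as follows — ceph-selinux only records that a relabel is needed, and the daemon packages act on it after all binaries are updated. The function names and flag-file convention are hypothetical, not taken from the actual packaging:

```shell
# Hypothetical sketch of the proposed split: ceph-selinux sets a
# condition, daemon packages relabel/restart in their own %posttrans.
# Both helpers take the flag-file path as an argument for illustration;
# a real implementation might use a fixed path under /run.
mark_relabel_needed() {           # would run in ceph-selinux %post
    touch "$1"
}

relabel_and_restart_if_needed() { # would run in daemon %posttrans
    if [ -e "$1" ]; then
        rm -f "$1"
        echo "relabel-and-restart"   # placeholder for fixfiles + systemctl
    else
        echo "nothing-to-do"
    fi
}
```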

#6 Updated by Dieter Roels about 5 years ago

I got hit by this today during upgrade from 13.2.2 to 13.2.4.

Because of this, there is no way to follow the recommended upgrade scenario of mons first, etc. Additionally, we much prefer to restart our OSDs one by one, instead of all at once, during a controlled upgrade.

For us it would already be nice to be able to opt out of this restart with a variable in /etc/sysconfig/ceph, and let us reboot with relabel at a later time (during patch update or similar).

#7 Updated by Wido den Hollander about 5 years ago

I just noticed this during a 12.2.8 to 13.2.4 upgrade. This was not what I expected, as I was only thinking about /etc/sysconfig/ceph.

SELinux is set to Permissive mode on this host. Shouldn't the scripts check whether SELinux is in Permissive mode? In that case a restart isn't required.
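The check being suggested could look roughly like this: only relabel (and therefore stop/start daemons) when SELinux is actually Enforcing. `getenforce` prints Enforcing, Permissive or Disabled; the mode is passed in as an argument here so the logic is shown without needing SELinux, and the function name is illustrative:

```shell
# Sketch of a permissive-mode guard for the %post script.
# needs_relabel is hypothetical; in the script itself one would use
# "$(getenforce)" as the argument.
needs_relabel() {
    case "$1" in
        Enforcing) echo yes ;;   # labels are enforced: relabel now
        *)         echo no  ;;   # Permissive/Disabled: skip the relabel
    esac
}
```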

#8 Updated by Wido den Hollander about 5 years ago

I have a PR which might fix this partially when running SELinux in Permissive mode: https://github.com/ceph/ceph/pull/25960

#9 Updated by Brad Hubbard about 5 years ago

  • Related to Bug #13061: systemd: daemons restart when package is upgraded added

#10 Updated by Brad Hubbard about 5 years ago

See https://tracker.ceph.com/issues/13061#note-8 for a better explanation of the issue. When I created this tracker I did so with reservations. I personally do not believe daemons should ever continue to run once the on-disk binaries and associated files have changed, as I've seen this cause problems in the past. The potential SELinux labelling race is just another reason why the daemons should be restarted after being upgraded. Our documentation could definitely be better in this area, though.

#11 Updated by Dan van der Ster about 5 years ago

Brad Hubbard wrote:

See https://tracker.ceph.com/issues/13061#note-8 for a better explanation of the issue. When I created this tracker I did so with reservations. I personally do not believe daemons should ever continue to run once the on-disk binaries and associated files have changed, as I've seen this cause problems in the past. The potential SELinux labelling race is just another reason why the daemons should be restarted after being upgraded. Our documentation could definitely be better in this area, though.

But that relabeling should at least be done in %posttrans, not before ceph-osd, etc. are updated. Currently we restart when e.g. librados2 has been updated but not ceph-osd.

And btw, I wanted to remind the practical context here. This is the diff in the selinux policy:

~/g/c/selinux (luminous|✔) $ git diff v12.2.8 v13.2.4 . 
diff --git a/selinux/ceph.fc b/selinux/ceph.fc
index df47fe10b4..b942dd704d 100644
--- a/selinux/ceph.fc
+++ b/selinux/ceph.fc
@@ -4,6 +4,7 @@
 /usr/bin/ceph-mgr              --      gen_context(system_u:object_r:ceph_exec_t,s0)
 /usr/bin/ceph-mon              --      gen_context(system_u:object_r:ceph_exec_t,s0)
 /usr/bin/ceph-mds              --      gen_context(system_u:object_r:ceph_exec_t,s0)
+/usr/bin/ceph-fuse             --      gen_context(system_u:object_r:ceph_exec_t,s0)
 /usr/bin/ceph-osd              --      gen_context(system_u:object_r:ceph_exec_t,s0)
 /usr/bin/radosgw               --      gen_context(system_u:object_r:ceph_exec_t,s0)

On BlueStore OSDs the fixfiles run is quick, but on FileStore OSDs fixfiles can take hours to get through /var/lib/ceph/osd/...

#12 Updated by Dieter Roels about 5 years ago

I worked around this in our playbooks by creating a wrapper for /usr/sbin/selinuxenabled before the upgrade so the restart does not occur, and afterwards rebooting the OSD nodes with autorelabel.
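The wrapper trick described above can be sketched like this: temporarily shadow selinuxenabled with a stub that exits non-zero, so the ceph-selinux %post script takes its "not enabled" early exit. The directory is parameterised here for illustration; in practice it would be /usr/sbin, and the helper names are hypothetical:

```shell
# Sketch of shadowing selinuxenabled during an upgrade window.
# $1: directory containing the binary (normally /usr/sbin).
install_wrapper() {
    mv "$1/selinuxenabled" "$1/selinuxenabled.orig"
    printf '#!/bin/sh\nexit 1\n' > "$1/selinuxenabled"   # report "disabled"
    chmod +x "$1/selinuxenabled"
}

remove_wrapper() {
    mv -f "$1/selinuxenabled.orig" "$1/selinuxenabled"   # restore original
}
```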

I understand that everything needs to restart when the labels change while running in enforcing mode, and I also understand that it is still unsafe not to restart daemons when in permissive mode. However, it would be nice to be able to opt out of the restart when running permissive. By explicitly setting a variable in /etc/sysconfig/ceph, users would acknowledge that they understand they need to fix the labels themselves afterwards. And we can add the necessary warnings in the comments.

#13 Updated by Dan van der Ster over 2 years ago

After another unexpected daemon restart today I thought maybe we should just go ahead with a PR and debate it there:
https://github.com/ceph/ceph/pull/42286

#14 Updated by Kefu Chai over 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 42286

#15 Updated by Kefu Chai over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to octopus, pacific

#16 Updated by Backport Bot over 2 years ago

  • Copied to Backport #51838: octopus: Upgrading the ceph-selinux package unconditionally restarts all running daemons if selinux is enabled added

#17 Updated by Backport Bot over 2 years ago

  • Copied to Backport #51839: pacific: Upgrading the ceph-selinux package unconditionally restarts all running daemons if selinux is enabled added

#18 Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
