Bug #21461

SELinux file_context update causes OSDs to restart when upgrading to Luminous from Kraken or Jewel.

Added by Dan Williams almost 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: rpm
Target version:
Start date: 09/19/2017
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

I've got a small cluster of 4 nodes each of which has 10 disks.
Nodes 1-3 run ceph-mon.

The release notes upgrade procedure for Luminous (http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken) states that the monitors have to be upgraded first. When upgrading the ceph-mon package on the first node, the rest of the ceph-* packages also get upgraded because of the expected dependencies.

The SELinux policy has a new entry for ceph-mgr (https://github.com/ceph/ceph/commit/8f6a526f9a36ff847755cba68b6b78b37e8e99cb), which triggers (via the %post section, lines 1671-1716 of https://github.com/ceph/ceph/blob/master/ceph.spec.in) an immediate restart of the ceph daemons, specifically the OSDs. However, because not all of the mons have been upgraded, the luminous feature string isn't in the monmap, which means the restarted OSDs won't connect.
This leaves me with 10/40 OSDs down. These will stay in the down state until all of the mons have been upgraded, and because of the above, upgrading the mons causes 30/40 OSDs to be marked down.
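
To make the failure path concrete, here is a rough model of the decision the %post scriptlet appears to make, based on the behaviour described above. It is a simplification and not the actual scriptlet from ceph.spec.in: the function name post_would_restart and its arguments are hypothetical, and the third condition (daemons only restarted if they were already running) is an assumption.

```shell
#!/bin/sh
# Simplified model of the restart decision described above.
# NOT the real scriptlet: post_would_restart and its arguments are hypothetical.
#   $1: exit status of selinuxenabled (0 = SELinux enabled)
#   $2: "changed" if file_contexts differs after the policy update, else "same"
#   $3: "running" if the ceph daemons were running before the upgrade (assumption)
post_would_restart() {
    selinux_status=$1
    contexts=$2
    daemons=$3

    # SELinux reported as disabled: nothing to relabel, so no restart.
    [ "$selinux_status" -eq 0 ] || { echo "no"; return 0; }
    # file_contexts unchanged: no relabel needed, so no restart.
    [ "$contexts" = "changed" ] || { echo "no"; return 0; }
    # Daemons are only stopped and started again if they were running.
    [ "$daemons" = "running" ] || { echo "no"; return 0; }
    echo "yes"
}

post_would_restart 0 changed running   # Kraken/Jewel -> Luminous, SELinux on
post_would_restart 1 changed running   # selinuxenabled reports "disabled"
```

Under these assumptions the first call prints "yes" (the case hit in this report) and the second prints "no".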

With a small cluster this causes serious issues.

Previously this was acceptable behaviour because the OSDs didn't require the luminous feature string to be in the monmap before coming up.

If your cluster is big enough that you can have all of the OSDs on all of the mon nodes down without issue then it's less of a problem, but still painful.

At the very least I think this needs documenting in the release notes.

Perhaps the SELinux file_context change could be backported to Kraken. The SELinux policy would then be updated, and the OSDs restarted while they are not yet blocking on the luminous feature, before doing the upgrade to Luminous.

I've got a horrible short-term workaround:

mv /usr/sbin/selinuxenabled /usr/sbin/selinuxenabled.old
echo "exit 1" > /usr/sbin/selinuxenabled
# (do the package upgrade here)
mv /usr/sbin/selinuxenabled.old /usr/sbin/selinuxenabled

This tricks the %post script into not restarting the OSDs. This should be OK, as the only change when going from Kraken to Luminous is to the mgr context, and the next step in the upgrade is to restart it anyway.
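
The same workaround can be wrapped so that the real binary is put back even if the upgrade command fails. This is only a sketch: the function name with_selinuxenabled_stub and the SELINUXENABLED_BIN override are hypothetical, and it does not handle being interrupted by a signal. The stub is given a shebang and the execute bit so that it exits with status 1 rather than failing to execute.

```shell
#!/bin/sh
# Sketch of the workaround with the restore step always executed.
# with_selinuxenabled_stub and SELINUXENABLED_BIN are hypothetical names.
# Usage: with_selinuxenabled_stub <upgrade command> [args...]
with_selinuxenabled_stub() {
    bin=${SELINUXENABLED_BIN:-/usr/sbin/selinuxenabled}

    # Move the real binary out of the way; give up if that fails.
    mv "$bin" "$bin.old" || return 1

    # Stub that always reports "SELinux disabled" (non-zero exit status).
    printf '#!/bin/sh\nexit 1\n' > "$bin"
    chmod +x "$bin"

    # Run the caller's upgrade command and remember its exit status.
    status=0
    "$@" || status=$?

    # Put the real binary back whether or not the upgrade succeeded.
    mv -f "$bin.old" "$bin"
    return "$status"
}
```

For example, `with_selinuxenabled_stub yum -y update ceph-mon` would run the upgrade with the stub in place; the yum invocation is only an illustration, so substitute whatever upgrade command you actually use.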

History

#1 Updated by Dan Williams almost 2 years ago

I'm happy to provide further details in case my random waffle above doesn't make sense.
