Bug #13061
closed
systemd: daemons restart when package is upgraded
Added by Dan van der Ster over 8 years ago.
Updated over 8 years ago.
Description
I just updated some ceph-mon and ceph-osd hosts from ceph-9.0.3-1460.g4290d68.el7.x86_64 to ceph-9.0.3-1572.g90cce11.el7.x86_64 and all the daemons restarted at the time of the yum updates.
From yum.log:
Sep 11 16:27:43 Updated: 1:ceph-9.0.3-1572.g90cce11.el7.x86_64
From journalctl -u ceph:
Sep 11 16:27:43 lxfsrd37a01.cern.ch systemd[1]: Stopping Ceph object storage daemon...
Sep 11 16:27:43 lxfsrd37a01.cern.ch ceph-osd[131530]: 2015-09-11 16:27:43.492935 7f01699c8700 -1 osd.0 248 *** Got signal Terminated ***
Sep 11 16:27:43 lxfsrd37a01.cern.ch ceph-osd[131530]: 2015-09-11 16:27:43.843512 7f01699c8700 -1 osd.0 248 shutdown
Sep 11 16:27:46 lxfsrd37a01.cern.ch systemd[1]: Stopped Ceph object storage daemon.
Sep 11 16:28:04 lxfsrd37a01.cern.ch systemd[1]: Starting Ceph object storage daemon...
Sep 11 16:28:04 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: getopt: unrecognized option '--setuser'
Sep 11 16:28:04 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: getopt: unrecognized option '--setgroup'
Sep 11 16:28:05 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: create-or-move updated item name 'osd.0' weight 1.6816 at location {host=lxfsrd37a01,rack=R
Sep 11 16:28:05 lxfsrd37a01.cern.ch systemd[1]: Started Ceph object storage daemon.
Sep 11 16:28:05 lxfsrd37a01.cern.ch ceph-osd[251987]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
This differs from the current (hammer) behaviour, which IIRC was agreed upon so that automatic package upgrades don't trigger daemon restarts.
- Status changed from New to In Progress
- Assignee set to Loïc Dachary
- Priority changed from Normal to Urgent
This may explain part of the problem with #13000; I'll try to reproduce it.
It turns out to be unrelated to #13000 :-) I'm running an upgrade suite to reproduce the problem, though maybe it only happens on v9.0.x -> v9.0.x+1 upgrades.
teuthology-suite --priority 101 --machine-type vps --suite upgrade/hammer-x/parallel --suite-branch master --ceph infernalis --filter=centos_7 --email loic@dachary.org
The behavior was changed after v9.0.3 by https://github.com/ceph/ceph/pull/5674: when selinux is involved, the daemons must be restarted. I'm not sure exactly why this is needed, but it seems to be a selinux requirement, and I'm assuming the running daemons would stop functioning otherwise.
%post selinux
%if 0%{?_with_systemd}
/usr/bin/systemctl status ceph.target > /dev/null 2>&1
%else
/sbin/service ceph status >/dev/null 2>&1
%endif
STATUS=$?
if test $STATUS -eq 0; then
%if 0%{?_with_systemd}
/usr/bin/systemctl stop ceph.target > /dev/null 2>&1
%else
/sbin/service ceph stop >/dev/null 2>&1
%endif
fi
OLD_POLVER=$(%{_sbindir}/semodule -l | grep -P '^ceph[\t ]' | awk '{print $2}')
%{_sbindir}/semodule -n -i %{_datadir}/selinux/packages/ceph.pp
NEW_POLVER=$(%{_sbindir}/semodule -l | grep -P '^ceph[\t ]' | awk '{print $2}')
if %{_sbindir}/selinuxenabled; then
%{_sbindir}/load_policy
if test "$OLD_POLVER" != "$NEW_POLVER"; then
%relabel_files
fi
fi
# Start iff it was started before
if test $STATUS -eq 0; then
%if 0%{?_with_systemd}
/usr/bin/systemctl start ceph.target > /dev/null 2>&1 || :
%else
/sbin/service ceph start >/dev/null 2>&1 || :
%endif
fi
exit 0
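To make the behaviour change concrete, here is a minimal sketch in plain sh. It models the %post selinux scriptlet above as a function that prints the actions the scriptlet would take instead of running them. The function name, the action labels (stop, install_module, load_policy, relabel, start), the yes/no arguments and the version strings in the example are all hypothetical. The model shows that the stop/start pair depends only on whether the daemons were running. It does not depend on whether SELinux is enabled or whether the policy version changed.

```shell
#!/bin/sh
# Hypothetical model of the %post selinux scriptlet quoted above.
# It prints the actions the scriptlet would perform; nothing is executed.
# Arguments: OLD_POLVER NEW_POLVER SELINUX_ENABLED(yes/no) WAS_RUNNING(yes/no)
plan_current_post_selinux() {
    old_polver=$1
    new_polver=$2
    selinux_enabled=$3
    was_running=$4
    actions=""
    # The daemons are stopped first whenever the status check succeeds,
    # independently of SELinux state or policy version.
    if [ "$was_running" = yes ]; then
        actions="stop"
    fi
    # semodule -n -i: install the module without reloading the policy.
    actions="$actions install_module"
    if [ "$selinux_enabled" = yes ]; then
        actions="$actions load_policy"
        # Files are relabelled only when the policy version changed.
        if [ "$old_polver" != "$new_polver" ]; then
            actions="$actions relabel"
        fi
    fi
    # Daemons are started again iff they were running before.
    if [ "$was_running" = yes ]; then
        actions="$actions start"
    fi
    # Drop the leading space left when the daemons were not running.
    echo "${actions# }"
}
```

For example, `plan_current_post_selinux 1.1.1 1.1.1 no yes` prints `stop install_module start`. In this model, a running cluster is restarted every time the selinux subpackage's %post runs, even with SELinux disabled and an unchanged policy.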
- Status changed from In Progress to Need More Info
No idea, but you're right, this introduces a behavior change. @Boris Ranto, how can we avoid this restart?
- Assignee changed from Loïc Dachary to Boris Ranto
@Boris, could you please assign the issue back to me once you have collected more information regarding the need for a restart?
<loicd> branto: do you know why the restart of the daemon is necessary when selinux is involved ?
<loicd> branto: I should have asked you first but only now found the original commit introducing this change ( https://github.com/ceph/ceph/commit/c6d6c783f46154186566d340f5f026fe78f5f961)
<sage> branto: what happens if you wait to restart (say, another day or two)? does it break?
<sage> or does it just mean that you aren't protected during that interval?
<sage> e.g., what happens if you relabel, the daemon creates some new files, and is then restarted? is that a race we have to worry about?
<branto> sage: I'm not completely sure, there might be some files that will be mislabelled afterward, I suppose I'd have to test that
<branto> it is a common practice to restart the daemons when installing the policy, though
<branto> even the fedora packaging guidelines suggest to do that
<branto> the patch that loicd linked got improved to (re)start the service iff it was running before
<loicd> branto: I noticed it got a lot better indeed :-)
<loicd> branto: it would help a great deal to have a strong rationale justifying the need to restart the daemons because it's going to be a significant and unexpected change for Ceph sysadmins.
<ktdreyer> agreed loicd
<ktdreyer> since selinux is shipping in a major release boundary, maybe it's ok to simply document "restart your daemons when upgrading to infernalis" or something?
<branto> I'll check tomorrow whether there will be any mislabelled files if we do not restart the services
<branto> but for starters, the SELinux policy will not be in effect until you restart the daemons
- Assignee changed from Boris Ranto to Loïc Dachary
OK, some more info:
(18:40:09) branto: ktdreyer, sage, loicd: I thought about it a bit and I believe it really could make some mislabelled files, e.g. you have a mislabelled directory, restorecon will list the directory to know its contents, then you create a new file, that file will be incorrectly labelled and the restorecon call won't fix that, then you get the denials when the ceph daemons will try to access the file
(18:40:40) branto: s/then you create/then a daemon creates/
(18:42:19) branto: i.e.: we need to restart the daemons in order to avoid (racy) mislabelled files
@Ken: Unfortunately, there is no way to change the context of a running process. I believe this behaviour is by design, to avoid the security problems such a method would introduce.
- Status changed from Need More Info to In Progress
- Assignee changed from Loïc Dachary to Boris Ranto
@Boris: back to you. Although I don't have the knowledge to make the proposed change, I'll be available if you need a rubber duck :-)
<branto> loicd, sage: we could probably move the daemon stop/start inside the condition that checks whether the policy changed to minimize the impact, i.e.: restart the daemons only if the policy version changed
<sage> branto: yeah, let's do that
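The approach agreed above can be sketched the same way. This is a hypothetical model, not the patch that resolved the issue. It keeps the scriptlet's order: install the module, load the policy when SELinux is enabled, and relabel when the version changed. It then moves the stop/start pair inside the version-changed branch, as branto proposed. The function name, action labels and arguments are illustrative only. Placing stop before relabel is an assumption meant to avoid the race branto describes, in which a daemon creates files while restorecon is walking a directory.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed %post selinux logic: restart the
# daemons only when the ceph policy version actually changed.
# It prints the planned actions; nothing is executed.
# Arguments: OLD_POLVER NEW_POLVER SELINUX_ENABLED(yes/no) WAS_RUNNING(yes/no)
plan_proposed_post_selinux() {
    old_polver=$1
    new_polver=$2
    selinux_enabled=$3
    was_running=$4
    # Install the module without reloading the policy (semodule -n -i).
    actions="install_module"
    if [ "$selinux_enabled" = yes ]; then
        actions="$actions load_policy"
        if [ "$old_polver" != "$new_polver" ]; then
            # Stop first so no daemon creates files during the relabel,
            # then start again so the daemons run under the new policy.
            if [ "$was_running" = yes ]; then
                actions="$actions stop"
            fi
            actions="$actions relabel"
            if [ "$was_running" = yes ]; then
                actions="$actions start"
            fi
        fi
    fi
    echo "$actions"
}
```

With an unchanged policy version, `plan_proposed_post_selinux 1.1.1 1.1.1 yes yes` prints only `install_module load_policy`, so a routine package upgrade leaves the running daemons alone.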
- Status changed from In Progress to Fix Under Review
- Assignee changed from Boris Ranto to Loïc Dachary
- Assignee changed from Loïc Dachary to Boris Ranto
- Status changed from Fix Under Review to Resolved
- Related to Bug #21672: Upgrading the ceph-selinux package unconditionally restarts all running daemons if selinux is enabled added