Bug #13061: systemd: daemons restart when package is upgraded - Ceph - Ceph

Bug #13061

closed

systemd: daemons restart when package is upgraded

Added by Dan van der Ster over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: Boris Ranto
Source: other
Regression: Yes
Severity: 2 - major

Description

I just updated some ceph-mon and ceph-osd hosts from ceph-9.0.3-1460.g4290d68.el7.x86_64 to ceph-9.0.3-1572.g90cce11.el7.x86_64 and all the daemons restarted at the time of the yum updates.

From yum.log:

Sep 11 16:27:43 Updated: 1:ceph-9.0.3-1572.g90cce11.el7.x86_64

From journalctl -u ceph:

Sep 11 16:27:43 lxfsrd37a01.cern.ch systemd[1]: Stopping Ceph object storage daemon...
Sep 11 16:27:43 lxfsrd37a01.cern.ch ceph-osd[131530]: 2015-09-11 16:27:43.492935 7f01699c8700 -1 osd.0 248 *** Got signal Terminated ***
Sep 11 16:27:43 lxfsrd37a01.cern.ch ceph-osd[131530]: 2015-09-11 16:27:43.843512 7f01699c8700 -1 osd.0 248 shutdown
Sep 11 16:27:46 lxfsrd37a01.cern.ch systemd[1]: Stopped Ceph object storage daemon.
Sep 11 16:28:04 lxfsrd37a01.cern.ch systemd[1]: Starting Ceph object storage daemon...
Sep 11 16:28:04 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: getopt: unrecognized option '--setuser'
Sep 11 16:28:04 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: getopt: unrecognized option '--setgroup'
Sep 11 16:28:05 lxfsrd37a01.cern.ch ceph-osd-prestart.sh[250880]: create-or-move updated item name 'osd.0' weight 1.6816 at location {host=lxfsrd37a01,rack=R
Sep 11 16:28:05 lxfsrd37a01.cern.ch systemd[1]: Started Ceph object storage daemon.
Sep 11 16:28:05 lxfsrd37a01.cern.ch ceph-osd[251987]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal

This differs from the current (hammer) behaviour, which IIRC was agreed upon so that package auto-upgrades don't trigger daemon restarts.

Related issues 2 (0 open — 2 closed)

Updated by Loïc Dachary over 8 years ago

Status changed from New to In Progress
Assignee set to Loïc Dachary

Updated by Loïc Dachary over 8 years ago

Priority changed from Normal to Urgent

This may explain part of the problem with #13000; I'll try to reproduce it.

Updated by Loïc Dachary over 8 years ago

It turns out to be unrelated to #13000 :-) Running an upgrade suite to repeat the problem. Maybe it only happens with v9.0.x -> v9.0.x+1 upgrade though.

teuthology-suite --priority 101 --machine-type vps --suite upgrade/hammer-x/parallel --suite-branch master --ceph infernalis --filter=centos_7 --email loic@dachary.org

fail http://pulpito.ceph.com/loic-2015-10-01_17:07:01-upgrade:hammer-x:parallel-infernalis---basic-vps

Updated by Loïc Dachary over 8 years ago

The behavior was changed after v9.0.3 by https://github.com/ceph/ceph/pull/5674: when SELinux is involved, the daemons must be restarted. I'm not sure exactly why this is needed, but it seems to be an SELinux requirement, and I'm assuming the running daemons would stop functioning otherwise.

%post selinux
%if 0%{?_with_systemd}
    /usr/bin/systemctl status ceph.target > /dev/null 2>&1
%else
    /sbin/service ceph status >/dev/null 2>&1
%endif
STATUS=$?

if test $STATUS -eq 0; then
%if 0%{?_with_systemd}
    /usr/bin/systemctl stop ceph.target > /dev/null 2>&1
%else
    /sbin/service ceph stop >/dev/null 2>&1
%endif
fi

OLD_POLVER=$(%{_sbindir}/semodule -l | grep -P '^ceph[\t ]' | awk '{print $2}')
%{_sbindir}/semodule -n -i %{_datadir}/selinux/packages/ceph.pp
NEW_POLVER=$(%{_sbindir}/semodule -l | grep -P '^ceph[\t ]' | awk '{print $2}')
if %{_sbindir}/selinuxenabled; then
    %{_sbindir}/load_policy
    if test "$OLD_POLVER" != "$NEW_POLVER"; then
        %relabel_files
    fi
fi

# Start iff it was started before
if test $STATUS -eq 0; then
%if 0%{?_with_systemd}
    /usr/bin/systemctl start ceph.target > /dev/null 2>&1 || :
%else
    /sbin/service ceph start >/dev/null 2>&1 || :
%endif
fi

exit 0
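
The scriptlet above decides whether to relabel by comparing the policy version reported before and after installing the module. As a minimal sketch of that version lookup, assuming `semodule -l` prints one "name version" pair per line as the scriptlet expects, the `grep -P ... | awk` pipeline can be approximated with a single awk call. The helper name `policy_version` and the sample module list are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the version column for one module name.
# Reads "name<whitespace>version" lines on stdin, like the listing the
# scriptlet pipes through grep and awk.
policy_version() {
    awk -v mod="$1" '$1 == mod { print $2 }'
}

# Made-up sample listing (not real semodule output).
sample_listing() {
    printf 'ceph\t1.0.0\n'
    printf 'cephfs_example\t2.0.0\n'
}

# Look up the version of the "ceph" module in the sample listing.
sample_listing | policy_version ceph
```

Matching on the whole first field keeps a module whose name merely starts with "ceph" from being picked up, which is what the `^ceph[\t ]` pattern guards against in the original pipeline.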

Updated by Loïc Dachary over 8 years ago

Status changed from In Progress to Need More Info

@ken, do you know why restarting the daemon is necessary when SELinux is involved? The initial addition of this behavior (which is perceived as a regression) was in https://github.com/ceph/ceph/pull/5421 (https://github.com/ceph/ceph/commit/c6d6c783f46154186566d340f5f026fe78f5f961 specifically).

Updated by Ken Dreyer over 8 years ago

No idea, but you're right, this introduces a behavior change. @Boris Ranto, how can we avoid this restart?

Updated by Loïc Dachary over 8 years ago

Assignee changed from Loïc Dachary to Boris Ranto

@Boris, could you please assign the issue back to me once you have collected more information regarding the need for a restart?

<loicd> branto: do you know why the restart of the daemon is necessary when selinux is involved ? 
<loicd> branto: I should have asked you first but only now found the original commit introducing this change ( https://github.com/ceph/ceph/commit/c6d6c783f46154186566d340f5f026fe78f5f961)
<sage> branto: what happens if you wait to restart (say, another day or two)?  does it break?
<sage> or does it just mean that you aren't protected during that interval?
<sage> e.g., what happens if you relabel, the daemon creates some new files, and is then restarted?  is that a race we have to worry about?
<branto> sage: I'm not completely sure, there might be some files that will be mislabelled afterward, I suppose I'd have to test that
<branto> it is a common practice to restart the daemons when installing the policy, though
<branto> even the fedora packaging guidelines suggest to do that
<branto> the patch that loicd linked got improved to (re)start the service iff it was running before
<loicd> branto: I noticed it got a lot better indeed :-)
<loicd> branto: it would help a great deal to have a strong rationale justifying the need to restart the daemons because it's going to be a significant and unexpected change for Ceph sysadmins. 
<ktdreyer> agreed loicd 
<ktdreyer> since selinux is shipping in a major release boundary, maybe it's ok to simply document "restart your daemons when upgrading to infernalis" or something?
<branto> I'll check tomorrow whether there will be any mislabelled files if we do not restart the services
<branto> but for starters, the SELinux policy will not be in effect until you restart the daemons

Updated by Boris Ranto over 8 years ago

Assignee changed from Boris Ranto to Loïc Dachary

OK, some more info:

(18:40:09) branto: ktdreyer, sage, loicd: I thought about it a bit and I believe it really could make some mislabelled files, e.g. you have a mislabelled directory, restorecon will list the directory to know its contents, then you create a new file, that file will be incorrectly labelled and the restorecon call won't fix that, then you get the denials when the ceph daemons will try to access the file
(18:40:40) branto: s/then you create/then a daemon creates/
(18:42:19) branto: i.e.: we need to restart the daemons in order to avoid (racy) mislabelled files

@Ken: Unfortunately, there is no way to change the context of a running process. I believe this behaviour is by design, to avoid the security problems such a method would create if it existed.
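
The race described in the chat excerpt above can be illustrated with a dry run that involves no SELinux at all. This is only a sketch of the ordering problem as Boris describes it: "relabelling" is simulated by remembering which file names were seen in a one-time directory listing, and the function name `demo_relabel_race` and the file names are hypothetical.

```shell
#!/bin/sh
# Dry-run illustration of the listing-then-create race.
# No labels are touched; visited names are tracked in a shell variable.
demo_relabel_race() {
    dir=$(mktemp -d) || return 1
    touch "$dir/a" "$dir/b"

    # Step 1: the relabelling tool lists the directory once.
    snapshot=$(ls "$dir")

    # Step 2: a daemon that is still running creates a new file
    # after the listing was taken.
    touch "$dir/created-by-daemon"

    # Step 3: the tool processes only the names it saw in step 1.
    relabelled=""
    for f in $snapshot; do
        relabelled="$relabelled $f"
    done

    # Step 4: report every file that was never visited.
    for f in $(ls "$dir"); do
        case " $relabelled " in
            *" $f "*) ;;
            *) echo "missed: $f" ;;
        esac
    done

    rm -rf "$dir"
}

demo_relabel_race
```

Stopping the daemons before step 1 removes step 2 from the picture, which is the argument made for restarting them around the relabel.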

Updated by Loïc Dachary over 8 years ago

Status changed from Need More Info to In Progress
Assignee changed from Loïc Dachary to Boris Ranto

@Boris: back to you. Although I don't have the knowledge to do the proposed change, I'll be available if you need a rubber duck :-)

<branto> loicd, sage: we could probably move the daemon stop/start inside the condition that checks whether the policy changed to minimize the impact, i.e.: restart the daemons only if the policy version changed
<sage> branto: yeah, let's do that
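
The proposal in the exchange above (restart only if the policy version changed) could be sketched as follows. This is a dry run that prints the steps rather than executing them, and it is not the change that was actually committed: `plan_selinux_post` is a hypothetical helper, the version strings are made up, and the `selinuxenabled` check and `load_policy` call from the original scriptlet are left out for brevity.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned steps instead of running them.
# plan_selinux_post is a hypothetical helper, not part of ceph.spec.in.
#   $1 = policy version before installing the new module
#   $2 = policy version after installing it
#   $3 = exit status of the "status ceph.target" check (0 means running)
plan_selinux_post() {
    old_polver=$1
    new_polver=$2
    status=$3

    # The module is installed unconditionally; only the disruptive
    # steps below depend on whether the version changed.
    echo "install policy module"

    if test "$old_polver" = "$new_polver"; then
        echo "policy unchanged: leave daemons running"
        return 0
    fi

    # Policy changed: stop, relabel, then start iff it was running before.
    if test "$status" -eq 0; then
        echo "stop ceph.target"
    fi
    echo "relabel files"
    if test "$status" -eq 0; then
        echo "start ceph.target"
    fi
}

# Same version before and after: no stop/start is planned.
plan_selinux_post 1.0.0 1.0.0 0
```

With this ordering, an ordinary package upgrade that ships an unchanged policy leaves the daemons untouched, while a policy bump still gets the stop, relabel, start sequence.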
