Bug #51361

open

KillMode=none is deprecated

Added by Sebastian Wagner almost 3 years ago. Updated about 2 months ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We changed the systemd unit file KillMode to none in https://github.com/ceph/ceph/pull/33162#issuecomment-584183316

Now we're getting a new warning:

Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
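To find which units would trigger this warning, the unit file text can be checked for a KillMode=none assignment. The following is a minimal sketch, not part of cephadm: the function name and the sample unit are hypothetical, and it only handles plain `key=value` lines inside `[Service]`, ignoring drop-ins and line continuations.

```python
from typing import Optional


def effective_kill_mode(unit_text: str) -> Optional[str]:
    """Return the last KillMode= value set in [Service], or None if unset."""
    section = None
    kill_mode = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "KillMode":
                # A later assignment overrides an earlier one;
                # an empty value resets to the default (reported as None).
                kill_mode = value.strip() or None
    return kill_mode


# Hypothetical unit file used only for illustration.
SAMPLE_UNIT = """\
[Unit]
Description=Example container service (hypothetical)

[Service]
ExecStart=/bin/bash /path/to/unit.run
KillMode=none
TimeoutStopSec=120
"""

if __name__ == "__main__":
    if effective_kill_mode(SAMPLE_UNIT) == "none":
        print("KillMode=none is set; systemd will log the deprecation warning")
```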

Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #58242: cephadm doesn't communicate with containers that failed initial start but were successfully restarted later by systemd (Resolved, Redouane Kachach Elhichou)

Actions #1

Updated by Sebastian Wagner almost 3 years ago

Answer by Valentin:

Hi Sebastian, feel free to ignore this warning. Systemd still supports KillMode=none, but they decided to deprecate it very slowly. The reason was that some third-party vendors abused that setting, which in turn caused problems during shutdown; some units just didn't want to die.
Starting with Podman v3.2 the units use Type=notify, which changed many things, but we found an alternative solution. Let me look it up quickly.

An alternative to KillMode=none is to remove it and add a high timeout, TimeoutStopSec=70. The reasoning is that we want to avoid Podman and systemd both trying to kill the container. We want Podman to do that, so that services have the chance to stop gracefully and Podman can perform the necessary clean-up tasks.
Type=notify is more systemd-idiomatic and is what the systemd folks asked us to do in the future.

Hope that helps.

Bottom line is: the warning is nothing to worry about for now. The systemd folks wanted to make it scary for the aforementioned reasons.
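The alternative described in this answer (remove KillMode=none, add a high TimeoutStopSec) can be sketched as a small text transformation. This is only an illustration under stated assumptions: the function name is hypothetical, the default of 70 is the value quoted above, and an existing TimeoutStopSec= line is left untouched rather than overwritten.

```python
def drop_killmode_none(unit_text: str, timeout_stop_sec: int = 70) -> str:
    """Remove KillMode=none from [Service] and add TimeoutStopSec= if missing."""
    out = []
    section = None
    has_timeout = False
    service_body_idx = None  # index in `out` just after the [Service] header
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            out.append(raw)
            if section == "Service" and service_body_idx is None:
                service_body_idx = len(out)
            continue
        if section == "Service":
            key, sep, value = line.partition("=")
            if sep and key.strip() == "KillMode" and value.strip() == "none":
                continue  # drop the deprecated setting
            if sep and key.strip() == "TimeoutStopSec":
                has_timeout = True  # keep whatever timeout is already configured
        out.append(raw)
    if service_body_idx is not None and not has_timeout:
        out.insert(service_body_idx, f"TimeoutStopSec={timeout_stop_sec}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "[Service]\nExecStart=/bin/true\nKillMode=none\n"
    print(drop_killmode_none(before), end="")
```

With KillMode= removed, systemd falls back to its default kill mode, so the timeout is what gives Podman room to stop the container first.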

Actions #2

Updated by Redouane Kachach Elhichou almost 2 years ago

  • Priority changed from Normal to Low
Actions #3

Updated by Zack Cerza over 1 year ago

  • Status changed from New to Fix Under Review
  • Assignee set to Zack Cerza
  • Pull request ID set to 48317

I filed a fix for this because it was preventing me from setting up clusters in nested rootless podman containers on CentOS 8.Stream. It definitely needs to be run through tests to make sure it doesn't regress anything. I omitted TimeoutStopSec because the default is 120s.

Actions #4

Updated by Zack Cerza over 1 year ago

  • Status changed from Fix Under Review to New
  • Assignee deleted (Zack Cerza)
  • Pull request ID deleted (48317)

I was wrong about KillMode=none for my use case

Actions #5

Updated by Redouane Kachach Elhichou over 1 year ago

  • Related to Bug #58242: cephadm doesn't communicate with containers that failed initial start but were successfully restarted later by systemd added
Actions #6

Updated by Redouane Kachach Elhichou over 1 year ago

  • Assignee set to Redouane Kachach Elhichou
Actions #7

Updated by Redouane Kachach Elhichou about 1 year ago

  • Assignee deleted (Redouane Kachach Elhichou)

This doesn't seem to be causing any issues so far. In addition, cephadm already uses a very large TimeoutStopSec=120s, higher than the value recommended by the Podman devs. I'll keep this open for now in case the systemd folks decide at some point to remove support for KillMode=none, in which case we would have to update our code.

Actions #8

Updated by David Heap about 1 year ago

Quick question on this: we're seeing services with KillMode=none being killed for a restart after 10s (RestartSec=10s) during ceph orch upgrades.

It's taking iscsi containers more than 10 seconds to shut down, so the new container being brought up is unable to access the resources it needs:

16:45:11 Upgrade: Updating iscsi.iscsi.<hostname>.nrtwwn (1/2)
16:45:11 Deploying daemon iscsi.iscsi.<hostname>.nrtwwn on <hostname>
16:45:11 Stopping Ceph iscsi.iscsi.<hostname>.nrtwwn for 4ade5f05-4ad7-4440-a0f6-00d833b6730c...
16:45:12 debug Shutdown received
16:45:23 Stopped Ceph iscsi.iscsi.<hostname>.nrtwwn for 4ade5f05-4ad7-4440-a0f6-00d833b6730c.
16:45:23 ceph-4ade5f05-4ad7-4440-a0f6-00d833b6730c@iscsi.iscsi.<hostname>.nrtwwn.service: Consumed 8h 43min 45.587s CPU time.
16:45:23,689 7f0107b81740 WARNING Failed to trim old cgroups /sys/fs/cgroup/system.slice/system-ceph\x2d4ade5f05\x2d4ad7\x2d4440\x2da0f6\x2d00d833b6730c.slice/ceph-4ade5f05-4ad7-4440-a0f6-00d833b6730c@iscsi.iscsi.<hostname>.nrtwwn.service
16:45:23 Starting Ceph iscsi.iscsi.<hostname>.nrtwwn for 4ade5f05-4ad7-4440-a0f6-00d833b6730c...
16:45:23 ceph-4ade5f05-4ad7-4440-a0f6-00d833b6730c@iscsi.iscsi.<hostname>.nrtwwn.service: Failed to attach to cgroup /system.slice/system-ceph\x2d4ade5f05\x2d4ad7\x2d4440\x2da0f6\x2d00d833b6730c.slice/ceph-4ade5f05-4ad7-4440-a0f6-00d833b6730c@iscsi.iscsi.<hostname>.nrtwwn.service: Device or resource busy
16:45:23 ceph-4ade5f05-4ad7-4440-a0f6-00d833b6730c@iscsi.iscsi.<hostname>.nrtwwn.service: Failed at step CGROUP spawning /bin/bash: Device or resource busy
16:45:23 ceph-4ade5f05-4ad7-4440-a0f6-00d833b6730c@iscsi.iscsi.<hostname>.nrtwwn.service: Control process exited, code=exited, status=219/CGROUP

...causing the upgrade to pause with error:

UPGRADE_REDEPLOY_DAEMON: Upgrading daemon iscsi.iscsi.<hostname>.lofqpp on host <hostname> failed.

Systemd keeps trying to start the container and eventually brings it up once the processes in the cgroup release the resources, and resuming the upgrade allows it to continue.

If KillMode is staying as none, should the RestartSec timeout be longer too, as it looks like systemd ignores TimeoutStopSec during a service restart?
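The timing in the log excerpt above can be checked with a few lines of arithmetic. This sketch only compares the "Stopping" and "Stopped" timestamps from the log against the quoted RestartSec=10s; whether RestartSec is the setting that actually triggers the early start is the question being asked here, not something the code establishes.

```python
from datetime import datetime

RESTART_SEC = 10  # RestartSec=10s, as quoted in this comment


def seconds_between(start: str, end: str) -> int:
    """Seconds from start to end, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# "Stopping Ceph iscsi..." at 16:45:11, "Stopped Ceph iscsi..." at 16:45:23
stop_duration = seconds_between("16:45:11", "16:45:23")

if __name__ == "__main__":
    print(f"stop took {stop_duration}s, RestartSec is {RESTART_SEC}s")
    print("stop exceeded RestartSec:", stop_duration > RESTART_SEC)
```

The stop took 12 seconds, which is longer than the 10-second RestartSec and consistent with the new container starting while the old cgroup was still busy.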

Actions #9

Updated by Christian Rohmann 3 months ago

I just saw the warning about KillMode=none for unit `ceph-volume@.service` on Reef (18.2.1) on Ubuntu 22.04 LTS (running systemd 249). Apparently the deprecation warning was already introduced with 246, see https://github.com/systemd/systemd/blob/8a9bf03bd7e301037e32277d58adbd2eb7ce5534/NEWS#L5887.

1. FWIW, this is a non-cephadm installation using packages (https://docs.ceph.com/en/latest/install/get-packages/#debian-packages), not containers via podman.
2. Podman changed its systemd generator / recommendations away from `KillMode=none` a long time ago: https://github.com/containers/podman/pull/8889
3. And they then switched to sdnotify to fully integrate with systemd, see https://github.com/containers/podman/pull/10557
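Whether a given host will log the warning depends on its systemd version. As a rough sketch, the version number can be read from the first line of `systemctl --version` output and compared against 246, the release named above. The function names are hypothetical and the sample output string is illustrative, not captured from a real host.

```python
import re
from typing import Optional

WARNING_SINCE = 246  # release that introduced the warning, per the NEWS link above


def parse_systemd_version(version_output: str) -> Optional[int]:
    """Extract the major version from `systemctl --version` output."""
    match = re.match(r"systemd (\d+)", version_output)
    return int(match.group(1)) if match else None


def warns_on_killmode_none(version_output: str) -> bool:
    """True if this systemd version is expected to log the deprecation warning."""
    version = parse_systemd_version(version_output)
    return version is not None and version >= WARNING_SINCE


if __name__ == "__main__":
    sample = "systemd 249 (249.11-0ubuntu3)\n+PAM +AUDIT"  # illustrative output
    print(warns_on_killmode_none(sample))
```

On a real host the input string could come from `subprocess.run(["systemctl", "--version"], capture_output=True, text=True).stdout`.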

Actions #11

Updated by Sebastian Wagner 3 months ago

We moved over to KillMode=none to avoid https://tracker.ceph.com/issues/43883 . Might be that this is no longer a problem?

Actions #12

Updated by Christian Rohmann 3 months ago

Sebastian Wagner wrote:

We moved over to KillMode=none to avoid https://tracker.ceph.com/issues/43883 . Might be that this is no longer a problem?

That's totally my assumption, looking at the changes made to podman itself and the recommended unit Type=notify.
The side effect was due to systemd trying its best to track the process(es) launched by the unit (which was podman, which then spawned and maintained yet another process).
But this is all obsolete now and deserves an overhaul.

Actions #13

Updated by Christian Rohmann 3 months ago

I also opened issue https://tracker.ceph.com/issues/64146 for ceph-volume still using KillMode=none.
That is totally unrelated to cephadm so I thought it might make sense to track and deal with this independently.

Actions #14

Updated by John Mulligan about 2 months ago

FWIW, the cephadm team is aware of the issue, it's just that it has been a lower priority as it's "just" a warning - an annoying one for sure, though.

A few months back I did a fairly large refactor of cephadm including adding support for a templating library. Until that was done, the "templating" of systemd unit files was rather restricted. I think that with the updated library code it would be easier to switch to using better, more recent, unit file idioms when using podman, but we also have to be careful not to break docker support.
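To illustrate the kind of engine-dependent templating mentioned here, the sketch below renders a unit file differently for podman and docker. Everything in it is an assumption for illustration: it uses Python's standard `string.Template` rather than whatever templating library cephadm adopted, the template text and function name are hypothetical, and keeping KillMode=none for docker is only one possible way of not changing docker behaviour.

```python
from string import Template

# Hypothetical template; not cephadm's actual unit file.
UNIT_TEMPLATE = Template("""\
[Unit]
Description=Ceph $daemon (illustrative)

[Service]
ExecStart=/bin/bash $run_script
$kill_settings
TimeoutStopSec=$timeout_stop_sec
""")


def render_unit(daemon: str, run_script: str, engine: str,
                timeout_stop_sec: int = 120) -> str:
    """Render a unit file, choosing kill settings per container engine."""
    if engine == "docker":
        # Assumption: leave the existing behaviour untouched for docker.
        kill_settings = "KillMode=none"
    else:
        # Assumption: for podman, rely on the systemd default kill mode.
        kill_settings = "# KillMode left at the systemd default"
    return UNIT_TEMPLATE.substitute(
        daemon=daemon,
        run_script=run_script,
        kill_settings=kill_settings,
        timeout_stop_sec=timeout_stop_sec,
    )


if __name__ == "__main__":
    print(render_unit("osd.0", "/path/to/unit.run", engine="podman"), end="")
```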

While I don't have the bandwidth to do it myself at the moment I'd be happy to coach anyone who wants to volunteer for the task.
