Bug #48405

open

systemd-units ordering cycle since 14.2.12 (ordering units before remote-fs-pre.target)

Added by Stoiko Ivanov over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With the changes from commits:
  • d88c834ea44bd67cfde0bd11ec4ded079b76d11a (master)
  • ed34cf29cfe8b42c17cf334455c1491700f60504 (nautilus, contained in v14.2.12 - v14.2.15)
  • cdc0fbca0f562def82a963aab509e02b71535321 (octopus-saved)
A systemd ordering cycle was introduced in certain environments (here on Proxmox VE):
  • The services ceph-mgr@.service(.in), ceph-mon@.service(.in) and ceph-mds@.service(.in) are now ordered Before=remote-fs-pre.target. As a result, they create a loop if they depend (After=) on a service that itself needs to start after remote-fs-pre.target.

One such situation is when the Ceph configuration file resides on a network filesystem.

In this case, the loop was created because ceph.conf resides on the cluster filesystem (pve-cluster.service), which itself starts after rrdcached.
On Debian, rrdcached is still started through a unit that systemd generates from the shipped init script. Because the init script depends on $remote_fs, which translates to an ordering after remote-fs.target (see systemd.special(7)), the generated unit is ordered after remote-fs.target.
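The loop can be checked mechanically by treating every ordering as a directed edge and looking for a cycle. The following Python sketch is a simplified model, not systemd's own algorithm. It includes only the orderings named in this report. The edge from ceph-mon@.service to pve-cluster.service is an assumption inferred from the pve-cluster.service entry in the journal. The names ORDERING and find_cycle are hypothetical.

```python
from collections import defaultdict

# Each pair (a, b) means "a is ordered after b" (a has After=b, or b has Before=a).
# Only the orderings discussed in this report are modelled.
ORDERING = [
    # Assumed from the journal: Ceph daemons start after the cluster filesystem.
    ("ceph-mon@.service", "pve-cluster.service"),
    ("pve-cluster.service", "rrdcached.service"),
    # Generated unit: the init script declares $remote_fs.
    ("rrdcached.service", "remote-fs.target"),
    ("remote-fs.target", "remote-fs-pre.target"),
    # Before=remote-fs-pre.target in ceph-mon@.service is equivalent to
    # remote-fs-pre.target being ordered after ceph-mon@.service.
    ("remote-fs-pre.target", "ceph-mon@.service"),
]


def find_cycle(edges):
    """Return one ordering cycle as a list of units, or None if there is none."""
    graph = defaultdict(list)
    for after, before in edges:
        graph[after].append(before)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: the cycle is the part of the path starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):  # snapshot: visit() may add keys to graph
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    print(" -> ".join(find_cycle(ORDERING)))
    # Dropping the Before=remote-fs-pre.target ordering (last edge) breaks the loop.
    print(find_cycle(ORDERING[:-1]))  # None
```

Removing the last edge corresponds to the change proposed at the end of this report. Stop jobs are ordered in the reverse direction of start jobs, so the same loop also appears at shutdown, which matches the /stop jobs in the journal.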

This is explained in detail in a post to the pve-user mailing list [0], where the issue was first brought up because it led to broken Ceph clusters without a deterministic pattern (systemd does not break ordering cycles deterministically).

Journal from a boot exposing the issue:

Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found ordering cycle on ceph-mon@buster-ceph-02.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on pve-cluster.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rrdcached.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on remote-fs.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on remote-fs-pre.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rbdmap.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph-mgr.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph-mon.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Job ceph-mon@buster-ceph-02.service/stop deleted to break ordering cycle starting with ceph-mon.target/stop
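On a longer journal, the units that form a reported cycle can be pulled out of these messages automatically. The following is a minimal Python sketch. It assumes the message wording shown above, which may differ between systemd versions. extract_cycle and the regular expressions are hypothetical helpers, not part of systemd.

```python
import re

# Journal excerpt copied from this report.
JOURNAL = """\
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found ordering cycle on ceph-mon@buster-ceph-02.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on pve-cluster.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rrdcached.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on remote-fs.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on remote-fs-pre.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rbdmap.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph-mgr.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph-mon.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Job ceph-mon@buster-ceph-02.service/stop deleted to break ordering cycle starting with ceph-mon.target/stop
"""

# Patterns match the message wording above; "/\w+" skips the job type (e.g. /stop).
CYCLE_START = re.compile(r"Found ordering cycle on (\S+)/\w+")
DEPENDENCY = re.compile(r"Found dependency on (\S+)/\w+")
DELETED = re.compile(r"Job (\S+)/\w+ deleted to break ordering cycle")


def extract_cycle(text):
    """Return (units in the reported cycle, unit whose job systemd deleted)."""
    chain, deleted = [], None
    for line in text.splitlines():
        if m := CYCLE_START.search(line):
            chain = [m.group(1)]  # a new cycle report starts here
        elif m := DEPENDENCY.search(line):
            chain.append(m.group(1))
        elif m := DELETED.search(line):
            deleted = m.group(1)
    return chain, deleted


if __name__ == "__main__":
    chain, deleted = extract_cycle(JOURNAL)
    print(" -> ".join(chain))
    print("job deleted for:", deleted)
```

The deleted job is the one systemd chose to drop to break the cycle. Because that choice is not deterministic, different boots may lose different jobs.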

Given that the issue can occur in many environments (any system that still starts a service via systemd-sysv-generator whose init script depends on $remote_fs), I propose dropping the ordering before remote-fs-pre.target (and can provide an appropriate PR).

[0] https://lists.proxmox.com/pipermail/pve-user/2020-November/172124.html
