Bug #48405
systemd-units ordering cycle since 14.2.12 (ordering units before remote-fs-pre.target)
Description
The following commits introduced the ordering Before=remote-fs-pre.target in the service units ceph-mgr@.service(.in), ceph-mon@.service(.in) and ceph-mds@.service(.in):
- d88c834ea44bd67cfde0bd11ec4ded079b76d11a (master)
- ed34cf29cfe8b42c17cf334455c1491700f60504 (nautilus, contained in v14.2.12 - v14.2.15)
- cdc0fbca0f562def82a963aab509e02b71535321 (octopus-saved)
With this ordering in place, the services create an ordering cycle (loop) if they are themselves ordered After a service that in turn needs to start After remote-fs-pre.target.
One situation would be if the ceph config file is on a networked filesystem.
In this case the loop was created by ceph.conf residing in the cluster filesystem, which itself starts 'After' rrdcached.
On Debian, rrdcached is still started by a generated unit (sysv-generator interpreting the shipped init script). The init script depends on $remote_fs, which translates to an ordering after remote-fs.target; see systemd.special(7).
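To make the loop easier to see, here is a minimal sketch that models the ordering described above as a small graph and searches it for a cycle. The edge list is a simplification assembled from this description (real units carry more dependencies), and `AFTER` and `find_cycle` are hypothetical names used only for illustration; this is not how systemd itself detects cycles.

```python
# Simplified ordering graph: each key must start AFTER every unit in its list.
# Edges are assumptions taken from the bug description, not a dump of real units.
AFTER = {
    "remote-fs-pre.target": ["ceph-mon@.service"],  # Before=remote-fs-pre.target in ceph-mon@.service
    "remote-fs.target": ["remote-fs-pre.target"],   # standard systemd ordering
    "rrdcached.service": ["remote-fs.target"],      # init script depends on $remote_fs
    "pve-cluster.service": ["rrdcached.service"],   # cluster filesystem starts after rrdcached
    "ceph-mon@.service": ["pve-cluster.service"],   # ceph.conf lives on the cluster filesystem
}


def find_cycle(after):
    """Return one ordering cycle as a list of units, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    state = {}
    stack = []

    def visit(unit):
        state[unit] = GREY
        stack.append(unit)
        for dep in after.get(unit, []):
            dep_state = state.get(dep, WHITE)
            if dep_state == GREY:
                # dep is already on the current path: the path from dep back to dep is a cycle
                return stack[stack.index(dep):] + [dep]
            if dep_state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        state[unit] = BLACK
        return None

    for unit in after:
        if state.get(unit, WHITE) == WHITE:
            found = visit(unit)
            if found:
                return found
    return None


if __name__ == "__main__":
    cycle = find_cycle(AFTER)
    print(" -> ".join(cycle) if cycle else "no ordering cycle")
```

Removing the `remote-fs-pre.target` entry from `AFTER` (that is, dropping the Before=remote-fs-pre.target ordering) leaves the sketch without a cycle, which matches the fix proposed in this report.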
This is explained quite well in a post to the pve-user mailing list, see [0], where the issue was first brought up because it led to broken Ceph clusters without a deterministic pattern (cycle breaking in systemd is not deterministic).
Journal from a boot exposing the issue:
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found ordering cycle on ceph-mon@buster-ceph-02.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on pve-cluster.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rrdcached.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on remote-fs.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on remote-fs-pre.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rbdmap.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph-mgr.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on ceph-mon.target/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Job ceph-mon@buster-ceph-02.service/stop deleted to break ordering cycle starting with ceph-mon.target/stop
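For anyone checking whether their own hosts are affected, the chain of units can be pulled out of such journal lines with a few lines of Python. This is only a sketch: the regular expression is an assumption based on the message format seen in the excerpt above, and `cycle_units` is a hypothetical helper name, not part of any systemd tooling.

```python
import re

# Three sample lines copied from the journal excerpt above.
JOURNAL = """\
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found ordering cycle on ceph-mon@buster-ceph-02.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on pve-cluster.service/stop
Nov 26 17:35:29 buster-ceph-02 systemd[1]: ceph-mon.target: Found dependency on rrdcached.service/stop
"""

# Assumed message shape: "Found ordering cycle on UNIT/JOB" or "Found dependency on UNIT/JOB".
PATTERN = re.compile(r"Found (?:ordering cycle|dependency) on (\S+)/([\w-]+)$")


def cycle_units(journal_text):
    """Return the unit names reported in cycle messages, in the order they appear."""
    units = []
    for line in journal_text.splitlines():
        match = PATTERN.search(line.strip())
        if match:
            units.append(match.group(1))
    return units


if __name__ == "__main__":
    for unit in cycle_units(JOURNAL):
        print(unit)
```

Feeding it the full excerpt instead of the three sample lines would list every unit systemd reported as part of the cycle.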
Given the potential for the issue to occur in multiple environments (any system that still starts a service via sysv-generator where the init script depends on $remote_fs), I would propose to drop the ordering before remote-fs-pre.target, and could provide an appropriate PR.
[0] https://lists.proxmox.com/pipermail/pve-user/2020-November/172124.html