Fix #15419
ceph-{mds,mon,osd,radosgw} systemd unit files need "wants=time-sync.target" (closed)
Description
It sometimes happens, when starting up an entire cluster at once, that a MON or OSD starts before ntp (or systemd-timesyncd or chrony) has had a chance to synchronize the clock. When this happens to a MON, the cluster comes up in HEALTH_WARN due to clock skew. Joao added some code to the MON in #14175 to make the MON cluster recover from this more quickly, but the quickest fix is to restart the offending MONs.
I have been spinning up clusters in Amazon Web Services (AWS), and I've found that this race between ntpd.service and the Ceph services is not limited to ceph-mon. If an OSD starts before the clock is synced, the cluster starts in HEALTH_WARN and all the PGs the offending OSD participates in get stuck in the "Peering" state. The problem disappears when the OSD is restarted.
The suggested fix is to add:
Wants=time-sync.target
After=time-sync.target
to the ceph-{mds,mon,osd,radosgw} systemd unit files. This will ensure that the ntpd/chrony/systemd-timesyncd service is started before the respective Ceph daemon starts.
Updated by Nathan Cutler about 8 years ago
master PR: https://github.com/ceph/ceph/pull/8489
Updated by Nathan Cutler about 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil about 8 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to jewel
Updated by Nathan Cutler about 8 years ago
- Copied to Backport #15606: jewel: ceph-{mds,mon,osd,radosgw} systemd unit files need "wants=time-sync.target" added
Updated by Fabian Grünbichler almost 8 years ago
The "Wants=time-sync.target" is wrong here according to 'man systemd.special':
"A number of special system targets are defined that can be used to properly order boot-up of optional services. These targets are generally not part of the initial boot transaction, unless they are explicitly pulled in by one of the implementing services. Note specifically that these passive target units are generally not pulled in by the consumer of a service, but by the provider of the service. This means: a consuming service should order itself after these targets (as appropriate), but not pull it in. A providing service should order itself before these targets (as appropriate) and pull it in (via a Wants= type dependency)."
and
"time-sync.target
Services responsible for synchronizing the system clock from a remote source (such as NTP client implementations) should pull in this target and order themselves before it. All services where correct time is essential should be ordered after this unit, but not pull it in. systemd automatically adds dependencies of type After= for this target unit to all SysV init script service units with an LSB header referring to the "$time" facility."
What you want is probably only "After=time-sync.target".
Updated by Nathan Cutler almost 8 years ago
Before this change we had:
Wants=network-online.target local-fs.target
After=network-online.target local-fs.target
After the change we have:
Wants=network-online.target local-fs.target time-sync.target
After=network-online.target local-fs.target time-sync.target
If the "Wants=" is only for targets provided by us, we should not have the "Wants=" line at all. Correct?
Updated by Nathan Cutler almost 8 years ago
Hm, should have read the manpage. Thanks for pointing me to it. I will modify it so it looks like this:
Wants=network-online.target local-fs.target
After=network-online.target local-fs.target time-sync.target
I tested it pretty thoroughly, though, so I wonder if the "Wants=time-sync.target" is actively harmful or just superfluous.
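The consumer/provider rule from systemd.special(7) quoted above can be checked mechanically. What follows is a minimal sketch under stated assumptions, not a Ceph or systemd tool. The parser is deliberately simplified: it accumulates repeated directives but ignores line continuations, empty-assignment resets, and drop-in files. The helper names `unit_directives` and `lint_consumer` and the `PASSIVE_TARGETS` set are hypothetical. The set holds only a few passive targets and is not exhaustive. The sketch runs the rule against the [Unit] lines from before and after the revision in this thread.

```python
from __future__ import annotations

# Hypothetical, non-exhaustive subset of the passive targets documented in
# systemd.special(7). A consuming service should order itself After= these
# targets but not pull them in with Wants=.
PASSIVE_TARGETS = {"time-sync.target", "network.target", "nss-lookup.target"}


def unit_directives(text: str, section: str = "Unit") -> dict[str, list[str]]:
    """Collect the space-separated values of each directive in one section.

    Simplified: repeated directives accumulate, but line continuations,
    empty-assignment resets and drop-in files are not handled.
    """
    result: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # entering a new [Section]
            continue
        if current != section or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result.setdefault(key.strip(), []).extend(value.split())
    return result


def lint_consumer(text: str,
                  required_after: tuple[str, ...] = ("time-sync.target",)) -> list[str]:
    """Flag violations of the consumer rule for passive targets."""
    directives = unit_directives(text)
    wants = set(directives.get("Wants", []))
    after = set(directives.get("After", []))
    problems = []
    # A consumer must not pull in a passive target.
    for target in sorted(PASSIVE_TARGETS & wants):
        problems.append(f"{target}: consumer should not pull it in via Wants=")
    # A consumer that needs correct time must still be ordered after it.
    for target in required_after:
        if target not in after:
            problems.append(f"{target}: missing After= ordering")
    return problems


BEFORE_FIX = """[Unit]
Wants=network-online.target local-fs.target time-sync.target
After=network-online.target local-fs.target time-sync.target
"""

AFTER_FIX = """[Unit]
Wants=network-online.target local-fs.target
After=network-online.target local-fs.target time-sync.target
"""

print(lint_consumer(BEFORE_FIX))  # ['time-sync.target: consumer should not pull it in via Wants=']
print(lint_consumer(AFTER_FIX))   # []
```

The first version is flagged only for pulling in time-sync.target. The revised version passes because it keeps the After= ordering. network-online.target is left out of the passive set on purpose: systemd.special(7) tells services that need a configured network to pull it in with Wants= and order themselves after it.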
Updated by Fabian Grünbichler almost 8 years ago
Not sure if this is just "cosmetic", or if it might cause problems (dependency cycles?).
Updated by Loïc Dachary almost 8 years ago
- Status changed from Pending Backport to Resolved