Fix #15419

closed

ceph-{mds,mon,osd,radosgw} systemd unit files need "wants=time-sync.target"

Added by Nathan Cutler about 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
jewel
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It sometimes happens, when starting up an entire cluster at once, that a MON or OSD starts before ntp (or systemd-timesyncd or chrony) has had a chance to synchronize the clock. When this happens to a MON, the cluster comes up in HEALTH_WARN due to clock skew. Joao added some code to the MON in #14175 to make the MON cluster recover from this more quickly, but the quickest fix is to restart the offending MONs.

I have been spinning up clusters in Amazon Web Services (AWS) and I've found that this race between ntpd.service and the Ceph services is not limited to ceph-mon. If an OSD starts before the clock is synced, the cluster starts in HEALTH_WARN and all the PGs the offending OSD participates in get stuck in the "Peering" state. The problem disappears when the OSD is restarted.

The suggested fix is to add:

Wants=time-sync.target
After=time-sync.target

to the ceph-{mds,mon,osd,radosgw} systemd unit files. This will ensure that the ntpd/chrony/systemd-timesyncd service is started before the respective Ceph daemon starts.
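
Until the packaged unit files carry these lines, the same ordering can be tried locally with a systemd drop-in. The following is a minimal sketch, not the change made in the Ceph source tree: it assumes the OSD template unit is named ceph-osd@.service, and the drop-in file name time-sync.conf is hypothetical. Equivalent drop-ins would be needed for the mds, mon and radosgw units. Note that time-sync.target only orders the daemon after the target; whether the clock is actually synchronized by then depends on the time service in use.

```ini
# /etc/systemd/system/ceph-osd@.service.d/time-sync.conf
# (file name is an assumption; any *.conf in this directory is read)
[Unit]
# Pull time-sync.target into the transaction when the OSD is started
Wants=time-sync.target
# Do not start the OSD until time-sync.target has been reached
After=time-sync.target
```

After adding the file, run "systemctl daemon-reload" so that systemd picks up the drop-in.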


Related issues: 1 (0 open, 1 closed)

Copied to devops - Backport #15606: jewel: ceph-{mds,mon,osd,radosgw} systemd unit files need "wants=time-sync.target" (Resolved, Nathan Cutler)