Bug #14175
closed
clock skew report is incorrect by "ceph health detail" command
Added by wei qiaomiao over 8 years ago.
Updated almost 8 years ago.
Description
I used the "ceph health detail" command to check my cluster's health and found the warning below:
mon.c7 addr 10.118.202.97:6789/0 clock skew 239.478s > max 0.05s (latency 0.0355416s)
So I changed the system time on mon.c7 to match the leader monitor, but the warning still persists:
mon.c7 addr 10.118.202.97:6789/0 clock skew 191.582s > max 0.05s (latency 0.0286543s)
- Assignee set to Joao Eduardo Luis
Hopefully Joao will chime in with a deeper explanation, but until then I can say that I have run into a similar issue (you don't mention which Ceph version you are using - I was using 0.94.3).
Here is what I remember of Joao's explanation:
As of Hammer there is new clock-skew handling logic in the monitor code that is designed to make the cluster more tolerant of time discrepancies (situations where the clock on one monitor node is slightly ahead of, or behind, the other nodes). This, however, comes with an unintended side effect: it now takes longer for clusters to recover from large time differences.
Whether or not this is a bug is still an open question.
It would be interesting to know how long it takes the cluster to recover from the clock skew reported here. In my case, the time discrepancy arose at boot time and the cluster took 15-60 minutes to recover (i.e. for the "clock skew" warning to disappear).
I was using version 0.94.5.
How long the cluster takes to recover from the reported clock skew depends on how large the clock drift is. In my environment, the cluster took 3-4 hours to recover when the clock drift was 2-3 minutes. That is too long for users.
Maybe we can improve the clock-skew handling for the case where the cluster's clock drift is large. For example, when the absolute difference between the current clock skew value and the last value is larger than 5s (or some other value we can discuss), we drop the last value and report only the current value.
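To make the proposal concrete, here is a minimal Python sketch of the idea. It is not the monitor's actual code: the function name `report_skew`, the `weight` parameter, and the simple weighted blend of the last and current values are all assumptions made for illustration. Only the 5s threshold and the "drop the last value" rule come from the suggestion above.

```python
def report_skew(last, current, weight=0.5, reset_threshold=5.0):
    """Return the clock skew (in seconds) to report for a monitor.

    last            -- previously reported skew, or None if there is none
    current         -- newly measured skew
    weight          -- share given to the last value when blending
                       (hypothetical; the real smoothing may differ)
    reset_threshold -- jump size above which history is discarded
    """
    if last is None or abs(current - last) > reset_threshold:
        # Large jump, e.g. the admin just corrected the clock:
        # drop the last value and report only the current one.
        return current
    # Small change: blend old and new values (assumed smoothing).
    return weight * last + (1 - weight) * current
```

With this rule, a last value of 239.478s followed by a hypothetical measurement of 0.01s after the clock is corrected would report 0.01s right away, instead of decaying slowly over many rounds.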
- Category set to Monitor
- Status changed from New to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
Joao, shall we backport this change to hammer?
- Copied to Backport #15024: hammer: clock skew report is incorrect by "ceph health detail" command added
- Status changed from Pending Backport to Resolved