Bug #14175
clock skew report is incorrect by "ceph health detail" command
Description
I used the "ceph health detail" command to check my cluster's health and found the following warning:
mon.c7 addr 10.118.202.97:6789/0 clock skew 239.478s > max 0.05s (latency 0.0355416s)
So I changed the system time on mon.c7 to match the leader monitor, but the warning still appears:
mon.c7 addr 10.118.202.97:6789/0 clock skew 191.582s > max 0.05s (latency 0.0286543s)
Updated by Nathan Cutler over 8 years ago
Hopefully Joao will chime in with a deeper explanation, but until then I can say that I have run into a similar issue (you don't mention which Ceph version you are using - I was using 0.94.3).
Here is what I remember of Joao's explanation:
As of Hammer there is new clock-skew handling logic in the monitor code that is designed to make the cluster more tolerant of time discrepancies (situations where the clock on one monitor node is slightly ahead of, or behind, the other nodes). This, however, comes with an unintended side effect: it now takes longer for clusters to recover from large time differences.
Whether or not this is a bug is still an open question.
It would be interesting to know how long it takes the cluster to recover from clock skew reported here. In my case, the time discrepancy was arising at boot time and the cluster took 15-60 minutes to recover (i.e. for the "clock skew" warning to disappear).
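Below is a minimal Python sketch of why a corrected clock can keep reporting skew for a long time. It assumes the monitor reports an exponentially weighted moving average of skew samples, with weight 0.8 on the previously reported value (`WEIGHT_OLD`) and one new sample per 300-second timecheck round (`TIMECHECK_INTERVAL`). Both numbers, and the function names, are assumptions for illustration and are not taken from this thread. The weighting is at least consistent with the two readings in the description: 239.478 s × 0.8 = 191.582 s, which is what one round with a near-zero new sample would produce.

```python
WEIGHT_OLD = 0.8          # assumed weight on the previously reported skew
TIMECHECK_INTERVAL = 300  # assumed seconds between timecheck rounds


def ema_update(old_skew, sample, weight_old=WEIGHT_OLD):
    """Blend a new skew sample into the previously reported value (hypothetical model)."""
    return weight_old * old_skew + (1 - weight_old) * sample


def rounds_until_healthy(initial_skew, max_skew, weight_old=WEIGHT_OLD):
    """Count rounds until the reported skew is at or below max_skew,
    assuming every new sample is 0 because the clock has already been fixed."""
    skew, rounds = initial_skew, 0
    while abs(skew) > max_skew:
        skew = ema_update(skew, 0.0, weight_old)
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(round(ema_update(239.478, 0.0), 3))  # 191.582
    n = rounds_until_healthy(239.478, 0.05)
    print(f"{n} rounds, about {n * TIMECHECK_INTERVAL / 3600:.1f} h")  # 38 rounds, about 3.2 h
```

Under these assumptions, a 239 s skew needs 38 rounds, a little over three hours, to fall below the 0.05 s limit, which is in the same range as the 3-4 hours reported later in this thread. The larger the initial drift, the more rounds the decay takes.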
Updated by wei qiaomiao over 8 years ago
I was using version 0.94.5.
How long the cluster takes to recover from a reported clock skew depends on how large the clock drift is. In my environment, the cluster took 3-4 hours to recover when the clock drift was 2-3 minutes. That is too long for users.
Maybe we can improve the clock-skew handling mechanism for cases where the cluster's clock drift is large. For example, when the absolute difference between the current clock skew value and the last value is larger than 5s (or another threshold we can discuss), we could drop the last value and report only the current value.
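The proposed rule can be sketched in a few lines of Python. This is a hypothetical illustration of the suggestion, not the change that was eventually merged: `update_skew`, the 0.8 smoothing weight, and the 5-second `RESET_THRESHOLD` are assumptions (the threshold is the value floated above as open for discussion).

```python
RESET_THRESHOLD = 5.0  # seconds; assumed, "or another threshold we can discuss"


def update_skew(last, sample, weight_old=0.8, reset_threshold=RESET_THRESHOLD):
    """Return the skew to report (hypothetical sketch of the proposal).

    If there is no history, or the new sample differs from the last reported
    value by more than reset_threshold, discard the history and report the
    new sample as-is. Otherwise keep smoothing with a weighted average.
    """
    if last is None or abs(sample - last) > reset_threshold:
        return sample
    return weight_old * last + (1 - weight_old) * sample


if __name__ == "__main__":
    # Clock just fixed: the jump from 239.478 s to 0 s exceeds 5 s, so history is dropped.
    print(update_skew(239.478, 0.0))  # 0.0
    # Small jitter stays smoothed.
    print(update_skew(0.03, 0.04))
```

With this rule, a large correction clears the warning after a single round instead of decaying over hours. The trade-off is that a single noisy sample more than 5 s away from the history also bypasses the smoothing, which is exactly the tolerance the Hammer logic was designed to provide.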
Updated by Joao Eduardo Luis over 8 years ago
- Category set to Monitor
- Status changed from New to Fix Under Review
Updated by Kefu Chai about 8 years ago
- Status changed from Fix Under Review to Pending Backport
Joao, shall we backport this change to hammer?
Updated by Nathan Cutler about 8 years ago
- Copied to Backport #15024: hammer: clock skew report is incorrect by "ceph health detail" command added
Updated by Loïc Dachary almost 8 years ago
- Status changed from Pending Backport to Resolved