Bug #10675
failures related to clock skew
Status: Closed
Description
I suspect NTP is acting up in sepia. There are quite a few job failures related to clock skew. Clock skew could also explain why teuthology decided to use stale .pyc files in issue #10651.
Look at the failures on this run for an example:
"2015-01-28 08:04:30.278152 mon.0 10.214.132.17:6789/0 3 : cluster [WRN] message from mon.1 was stamped 1.988812s in the future, clocks not synchronized" in cluster log
Updated by Zack Cerza about 9 years ago
- Project changed from teuthology to sepia
Updated by Zack Cerza about 9 years ago
I guess DreamHost's NTP server went down. Sage's fix is here:
https://github.com/ceph/ceph-qa-chef/pull/7
I'm still nervous that we're using a single server, though...
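For what it's worth, one way to sanity-check a set of servers instead of trusting a single one is to compare their reported offsets. A rough sketch, assuming the third-party ntplib package is available (pip install ntplib); the server list and timeout are just examples:

```python
import ntplib

# Example pool servers; in practice, whichever servers the lab trusts.
SERVERS = ["0.pool.ntp.org", "1.pool.ntp.org", "2.pool.ntp.org"]

def collect_offsets(servers=SERVERS, timeout=2):
    """Query each server; return its estimated clock offset in seconds."""
    client = ntplib.NTPClient()
    results = {}
    for host in servers:
        try:
            response = client.request(host, version=3, timeout=timeout)
            results[host] = response.offset
        except Exception as exc:  # unreachable, rate-limited, etc.
            results[host] = None
            print(f"{host}: query failed ({exc})")
    return results

if __name__ == "__main__":
    for host, offset in collect_offsets().items():
        if offset is None:
            print(f"{host}: no usable response")
        else:
            print(f"{host}: offset {offset:+.6f}s")
```

If the servers disagree with each other by more than the allowed drift, the problem is upstream rather than on the test nodes.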
Updated by Yuri Weinstein about 9 years ago
Still seeing this in runs.
http://pulpito.front.sepia.ceph.com/teuthology-2015-02-06_11:06:02-upgrade:giant-x:parallel-hammer-distro-basic-vps/742324/
"2015-02-06 19:26:01.125925 mon.0 10.214.130.124:6789/0 8 : cluster [WRN] mon.1 10.214.130.171:6789/0 clock skew 0.817504s > max 0.5s" in cluster log
Updated by Yuri Weinstein about 9 years ago
One more:
Run: http://pulpito.ceph.com/teuthology-2015-02-06_11:06:02-upgrade:giant-x:parallel-hammer-distro-basic-vps/
Job: 742324
"2015-02-06 19:26:01.125925 mon.0 10.214.130.124:6789/0 8 : cluster [WRN] mon.1 10.214.130.171:6789/0 clock skew 0.817504s > max 0.5s" in cluster log
Updated by David Zafman about 9 years ago
Seen again:
dzafman-2015-02-19_14:55:37-rados:thrash-wip-10883---basic-multi/771074
On burnupi19 and burnupi25:
2015-02-20 12:52:52.636017 mon.1 10.214.134.14:6789/0 177 : cluster [WRN] message from mon.0 was stamped 0.501458s in the future, clocks not synchronized
dzafman-2015-02-19_14:55:37-rados:thrash-wip-10883---basic-multi/770916
On plana62 and plana64:
2015-02-20 10:00:56.842533 mon.0 10.214.132.14:6789/0 3 : cluster [WRN] message from mon.1 was stamped 0.855106s in the future, clocks not synchronized
Updated by Yuri Weinstein about 9 years ago
- Priority changed from Normal to Urgent
This issue comes up in many places, so raising the priority.
Updated by Zack Cerza about 8 years ago
- Status changed from New to Resolved
We are using more reliable NTP servers now and believe this to be resolved.
Updated by Nathan Cutler over 7 years ago
Updated by David Galloway over 7 years ago
- Category set to Infrastructure Service
- Status changed from Resolved to In Progress
- Assignee set to David Galloway
Found this in the latest example:
2016-11-19T16:57:44.391 INFO:teuthology.orchestra.run.smithi116.stderr:19 Nov 16:57:44 ntpdate[7837]: 45.79.10.228 rate limit response from server.
I know we're due for an NTP server in the community cage. It may be a service I need to set up before then, though.
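Until a local server exists, one stopgap for the rate-limit responses is simply spacing out queries. A rough sketch (the retry count and delays are made up) that shells out to ntpdate -q and backs off when the server complains:

```python
import subprocess
import time

def query_ntp(server, attempts=3, delay=8.0):
    """Query a server with ntpdate -q, backing off if it rate-limits us."""
    for attempt in range(attempts):
        proc = subprocess.run(["ntpdate", "-q", server],
                              capture_output=True, text=True)
        if proc.returncode == 0 and "rate limit" not in proc.stderr:
            print(proc.stdout.strip())
            return True
        # Rate-limit or failure: wait progressively longer before retrying.
        time.sleep(delay * (attempt + 1))
    return False

query_ntp("45.79.10.228")  # the server from the log line above
```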
Updated by David Galloway about 6 years ago
- Status changed from In Progress to Resolved