Bug #10675: failures related to clock skew - sepia - Ceph

Bug #10675

closed

failures related to clock skew

Added by Andrew Schoen about 9 years ago. Updated about 6 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

David Galloway

Category:

Infrastructure Service

Target version:

% Done:

Source:

other

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Crash signature (v1):

Crash signature (v2):

Description

I suspect NTP is acting up in sepia. There are quite a few job failures related to clock skew. Clock skew could also explain why teuthology decided to use stale pycs in issue #10651.

Look at the failures on this run for an example:

http://pulpito.front.sepia.ceph.com/sage-2015-01-28_07:07:27-rados-wip-sage-testing-distro-basic-multi/

"2015-01-28 08:04:30.278152 mon.0 10.214.132.17:6789/0 3 : cluster [WRN] message from mon.1 was stamped 1.988812s in the future, clocks not synchronized" in cluster log

Updated by Zack Cerza about 9 years ago

Project changed from teuthology to sepia

Updated by Zack Cerza about 9 years ago

I guess DreamHost's ntp server went down. Sage's fix is here:
https://github.com/ceph/ceph-qa-chef/pull/7

I'm still nervous that we're using a single server, though...

Updated by Yuri Weinstein about 9 years ago

Still seeing this in runs:
http://pulpito.front.sepia.ceph.com/teuthology-2015-02-06_11:06:02-upgrade:giant-x:parallel-hammer-distro-basic-vps/742324/

"2015-02-06 19:26:01.125925 mon.0 10.214.130.124:6789/0 8 : cluster [WRN] mon.1 10.214.130.171:6789/0 clock skew 0.817504s > max 0.5s" in cluster log
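
To triage these warnings across many jobs, the two message formats quoted in this thread can be pulled out of a cluster log with a pair of regular expressions. This is only a sketch: the function name `parse_skew_warning` and both patterns are hypothetical helpers written against the sample lines above, not part of teuthology or Ceph, and other Ceph versions may word the warnings differently.

```python
import re

# Hypothetical patterns, derived only from the log lines quoted in this issue.
STAMPED_RE = re.compile(
    r"message from (?P<peer>mon\.\d+) was stamped (?P<skew>\d+\.\d+)s in the future"
)
SKEW_RE = re.compile(
    r"(?P<peer>mon\.\d+) \S+ clock skew (?P<skew>\d+\.\d+)s > max (?P<max>\d+\.\d+)s"
)


def parse_skew_warning(line):
    """Return (peer, skew_seconds, max_seconds or None), or None if no match."""
    m = SKEW_RE.search(line)
    if m:
        return m.group("peer"), float(m.group("skew")), float(m.group("max"))
    m = STAMPED_RE.search(line)
    if m:
        # This format does not report the allowed maximum.
        return m.group("peer"), float(m.group("skew")), None
    return None


# Sample lines copied from the comments above.
SKEW_LINE = (
    "2015-02-06 19:26:01.125925 mon.0 10.214.130.124:6789/0 8 : cluster [WRN] "
    "mon.1 10.214.130.171:6789/0 clock skew 0.817504s > max 0.5s"
)
STAMPED_LINE = (
    "2015-01-28 08:04:30.278152 mon.0 10.214.132.17:6789/0 3 : cluster [WRN] "
    "message from mon.1 was stamped 1.988812s in the future, clocks not synchronized"
)

if __name__ == "__main__":
    print(parse_skew_warning(SKEW_LINE))
    print(parse_skew_warning(STAMPED_LINE))
```

Collecting the peer and skew per job would make it easier to see whether the same hosts keep drifting or whether the skew is spread across the lab.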

Updated by Yuri Weinstein about 9 years ago

One more:
Run: http://pulpito.ceph.com/teuthology-2015-02-06_11:06:02-upgrade:giant-x:parallel-hammer-distro-basic-vps/
Job: 742324

"2015-02-06 19:26:01.125925 mon.0 10.214.130.124:6789/0 8 : cluster [WRN] mon.1 10.214.130.171:6789/0 clock skew 0.817504s > max 0.5s" in cluster log

Updated by David Zafman about 9 years ago

Seen again

dzafman-2015-02-19_14:55:37-rados:thrash-wip-10883---basic-multi/771074
On burnupi19 and burnupi25:
2015-02-20 12:52:52.636017 mon.1 10.214.134.14:6789/0 177 : cluster [WRN] message from mon.0 was stamped 0.501458s in the future, clocks not synchronized

dzafman-2015-02-19_14:55:37-rados:thrash-wip-10883---basic-multi/770916
On plana62 and plana64:
2015-02-20 10:00:56.842533 mon.0 10.214.132.14:6789/0 3 : cluster [WRN] message from mon.1 was stamped 0.855106s in the future, clocks not synchronized

Updated by Yuri Weinstein about 9 years ago

Priority changed from Normal to Urgent

This issue comes up in many places, so I am increasing the priority.

Updated by Zack Cerza about 9 years ago

http://pulpito.ceph.com/teuthology-2015-04-08_17:05:01-upgrade:giant-x-hammer-distro-basic-vps/841280

Updated by Sage Weil almost 9 years ago

Priority changed from Urgent to Normal

Updated by Zack Cerza about 8 years ago

Status changed from New to Resolved

We are using more reliable NTP servers now and believe this to be resolved.

#10

Updated by Nathan Cutler over 7 years ago

Seen again in http://pulpito.front.sepia.ceph.com/smithfarm-2016-11-18_09:59:00-fs-hammer-backports---basic-smithi/558577/

#11

Updated by David Galloway over 7 years ago

Category set to Infrastructure Service
Status changed from Resolved to In Progress
Assignee set to David Galloway

Found this in the latest example:

2016-11-19T16:57:44.391 INFO:teuthology.orchestra.run.smithi116.stderr:19 Nov 16:57:44 ntpdate[7837]: 45.79.10.228 rate limit response from server.

I know we're due for an NTP server in the community cage. It may be a service I need to set up before then though.
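
A quick way to check how widespread the rate limiting is would be to scan teuthology logs for the `ntpdate` message quoted above. The following is a minimal sketch under assumptions: `find_rate_limited` and its pattern are hypothetical, written only against the one sample line in this comment, and the log prefix may differ in other teuthology versions.

```python
import re

# Hypothetical pattern, derived only from the sample line in this comment.
RATE_LIMIT_RE = re.compile(
    r"teuthology\.orchestra\.run\.(?P<host>[^.\s]+)\.stderr:.*ntpdate\[\d+\]: "
    r"(?P<server>\S+) rate limit response from server"
)


def find_rate_limited(lines):
    """Return (test_node, ntp_server) pairs for every rate-limited ntpdate call."""
    hits = []
    for line in lines:
        m = RATE_LIMIT_RE.search(line)
        if m:
            hits.append((m.group("host"), m.group("server")))
    return hits


# Sample line copied from the comment above.
SAMPLE = (
    "2016-11-19T16:57:44.391 INFO:teuthology.orchestra.run.smithi116.stderr:"
    "19 Nov 16:57:44 ntpdate[7837]: 45.79.10.228 rate limit response from server."
)

if __name__ == "__main__":
    print(find_rate_limited([SAMPLE]))
```

If many test nodes are rate limited by the same upstream server, that would support running a local NTP service for the lab rather than having every node query public servers directly.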

#12

Updated by David Galloway about 6 years ago

Status changed from In Progress to Resolved

https://github.com/ceph/teuthology/pull/1146
https://github.com/ceph/ceph-sepia-secrets/pull/295
