Bug #854

closed

unsynchronized clocks between kernel-client/cmds cause PJD fstest failures

Added by Brian Chrisman about 13 years ago. Updated about 10 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

I'm seeing a varying number (generally 5-8) of POSIX tests in the PJD fstest suite fail when the suite is run on a node (on top of the Ceph kernel client) whose clock is not synchronized with the node hosting the active MDS.
Synchronizing the clocks across the cluster with ntpdate/xntpd brings the PJD fstests back to a full pass.

This is probably not critical, because:
- the failures are small corner cases (for example, unexpected ctimes after operations such as lchown of a symlink)
- keeping the clocks set correctly is a reasonable workaround
- it may be a known, possibly deliberate, design issue (the MDS generating the timestamp itself instead of taking the client's timestamp?)
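To illustrate how a clock offset could turn into the ctime failures above, here is a minimal Python model. It is a sketch under stated assumptions, not a description of how Ceph actually stamps ctime: it assumes (in line with the MDS-timestamp guess above) that the ctime read before an operation was stamped by the client's clock and the ctime read after it by the MDS's clock, that timestamps are compared as whole seconds, and that a fixed interval passes between the two reads. `Clock`, `ctime_check_passes`, and the `sleep` parameter are hypothetical names, not part of Ceph or PJD fstest.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    """Hypothetical wall clock offset from true time by `skew` seconds."""
    skew: float

    def stamp(self, true_time: float) -> int:
        # Assumption: timestamps are compared as whole seconds, so truncate.
        return int(true_time + self.skew)


def ctime_check_passes(client: Clock, mds: Clock, start: float, sleep: float = 1.0) -> bool:
    """Model a 'ctime must advance after the operation' check under mixed clocks.

    Assumption (not confirmed by the report): the ctime read before the
    operation was stamped by the client's clock, and the ctime read after it
    was stamped by the MDS's clock, `sleep` seconds later in true time.
    """
    ctime_before = client.stamp(start)
    ctime_after = mds.stamp(start + sleep)
    return ctime_before < ctime_after


if __name__ == "__main__":
    synced = Clock(skew=0.0)
    print(ctime_check_passes(synced, Clock(skew=0.0), start=1000.0))   # True
    print(ctime_check_passes(synced, Clock(skew=-5.0), start=1000.0))  # False: MDS 5 s behind
    # A sub-second skew makes the outcome depend on where in the second the test starts.
    print(ctime_check_passes(synced, Clock(skew=-0.5), start=1000.2))  # False
    print(ctime_check_passes(synced, Clock(skew=-0.5), start=1000.7))  # True
```

Under this model, a sub-second skew makes the same check pass or fail depending on timing, which would be one possible explanation for different tests failing in different runs.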

Here's a histogram of the test failures over 21 repeated runs:
(count) (test number):(status):[(line number) (test filename)]

19 102:fail:[218 /opt/scale/lib/pjdfstests/tests/chown/00.t]
21 112:fail:[236 /opt/scale/lib/pjdfstests/tests/chown/00.t]
13 141:fail:[287 /opt/scale/lib/pjdfstests/tests/chown/00.t]
17 145:fail:[302 /opt/scale/lib/pjdfstests/tests/chown/00.t]
21 153:fail:[332 /opt/scale/lib/pjdfstests/tests/chown/00.t]
13 27:fail:[70 /opt/scale/lib/pjdfstests/tests/chmod/00.t]
18 31:fail:[78 /opt/scale/lib/pjdfstests/tests/chmod/00.t]
11 97:fail:[209 /opt/scale/lib/pjdfstests/tests/chown/00.t]

Different tests fail in different runs; two of them (tests 112 and 153 in chown/00.t, 21/21) fail consistently.
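The histogram lines can be tallied mechanically, for example to pick out the tests that failed in every run. This sketch assumes only the `count test:status:[line path]` layout shown above; `HistogramRow`, `parse_histogram`, and `consistent_failures` are hypothetical helper names.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# One histogram line, e.g. "19 102:fail:[218 /opt/scale/lib/pjdfstests/tests/chown/00.t]":
# <count> <test number>:<status>:[<line number> <test filename>]
LINE_RE = re.compile(r"^(\d+)\s+(\d+):(\w+):\[(\d+)\s+([^\]\s]+)\]$")


class HistogramRow(NamedTuple):
    count: int        # how many of the repeat runs hit this failure
    test_number: int  # test number within the .t file
    status: str       # e.g. "fail"
    line: int         # line number column, as reported in the histogram
    path: str         # test filename


def parse_histogram(lines: Iterable[str]) -> List[HistogramRow]:
    """Parse histogram lines, skipping anything that does not match the layout."""
    rows = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue
        count, number, status, line, path = m.groups()
        rows.append(HistogramRow(int(count), int(number), status, int(line), path))
    return rows


def consistent_failures(rows: Iterable[HistogramRow], runs: int) -> List[Tuple[str, int]]:
    """Return sorted (test filename, test number) pairs that failed in all `runs` runs."""
    return sorted((r.path, r.test_number) for r in rows if r.count == runs)
```

Run over the eight histogram lines above with `runs=21`, `consistent_failures` should report tests 112 and 153 in chown/00.t.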
