Bug #854
closed
unsynchronized clocks between kernel-client/cmds cause PJD fstest failures
Description
I'm seeing a varying number (generally 5-8) of POSIX tests in the PJD fstest suite fail when the suite is run on a node (atop the ceph kernel client) whose clock is not synchronized with the node hosting the active MDS.
Synchronizing the clocks in the cluster using ntpdate/xntpd returns PJD fstests to full success.
This is not likely critical because:
- failures are small corner cases (unexpected ctimes during operations like lchown of a symlink, for example)
- workaround of having clocks set correctly is reasonable
- this may be a known design issue (the MDS creating a timestamp instead of taking the client's timestamp?)
Here's a histogram of the test failures (out of 21 repeat runs):
(count) (test number) (status) (line number) (test filename)
19 102:fail:[218 /opt/scale/lib/pjdfstests/tests/chown/00.t]
21 112:fail:[236 /opt/scale/lib/pjdfstests/tests/chown/00.t]
13 141:fail:[287 /opt/scale/lib/pjdfstests/tests/chown/00.t]
17 145:fail:[302 /opt/scale/lib/pjdfstests/tests/chown/00.t]
21 153:fail:[332 /opt/scale/lib/pjdfstests/tests/chown/00.t]
13 27:fail:[70 /opt/scale/lib/pjdfstests/tests/chmod/00.t]
18 31:fail:[78 /opt/scale/lib/pjdfstests/tests/chmod/00.t]
11 97:fail:[209 /opt/scale/lib/pjdfstests/tests/chown/00.t]
Different tests fail in different runs, but two (tests 112 and 153, both at 21/21) fail consistently.
Updated by Greg Farnum about 13 years ago
Ah, that makes sense. This is something we're unlikely to fix -- currently a lot of operations occur "on" the MDS (renames, creates, etc) and so sending a client time along for those wouldn't make much sense. But we need to refer to both the kernel client's time and the MDS' time since other operations (like inode data changes) occur "on" the kernel client and are reported to the MDS in batches.
But maybe some brilliant idea will come up in the future!
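To make the failure mode concrete, here is a minimal sketch in Python of how mixing two clock sources could leave a ctime that does not advance. It is a toy model, not Ceph code: the function names are hypothetical, and it assumes that the data change is stamped with the client's clock, that the following chown is stamped with the MDS' clock, and that the test expects the ctime to strictly increase after a one-second pause.

```python
# Toy model only; none of these names come from Ceph or PJD fstest.
# Assumption: data changes are stamped by the client clock,
# while the chown is stamped by the MDS clock.

def simulate(skew_seconds, start=1000.0, step=1.0):
    """Return (ctime_before, ctime_after) for a write followed by a chown.

    skew_seconds is the MDS clock minus the client clock.
    step is the real time that elapses between the two operations.
    """
    client_now = start
    ctime_before = client_now            # write: stamped with client time
    client_now += step                   # the test pauses; real time advances
    mds_now = client_now + skew_seconds  # same instant, read from the MDS clock
    ctime_after = mds_now                # chown: stamped with MDS time
    return ctime_before, ctime_after


def ctime_advanced(skew_seconds):
    """True if the ctime strictly increased across the chown."""
    before, after = simulate(skew_seconds)
    return after > before


if __name__ == "__main__":
    for skew in (0.0, 5.0, -5.0):
        print(f"skew={skew:+.1f}s ctime advanced: {ctime_advanced(skew)}")
```

In this model the check only fails when the MDS clock is behind the client by at least the pause between operations, which would be consistent with synchronized clocks restoring full success.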
Updated by Sage Weil about 13 years ago
The only reasonably sane idea I have here is for the client/mds to compare clocks to estimate skew and have some sort of auto-adjustment going on. It's hard to say what that adjustment should be, though. Maybe just periodically spamming the console when the skew is significant is the thing to do.
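One way the skew estimate could look, sketched in Python under stated assumptions: it uses the standard NTP-style offset formula over a single request/reply exchange, and the function names, the four timestamp parameters, and the 1-second warning threshold are all hypothetical. Nothing here reflects an existing client/MDS message or a setting in Ceph.

```python
# Hypothetical sketch; not an existing Ceph mechanism.

def estimate_skew(t0, t1, t2, t3):
    """Estimate (MDS clock - client clock) in seconds, NTP-style.

    t0: client clock when the request was sent
    t1: MDS clock when the request was received
    t2: MDS clock when the reply was sent
    t3: client clock when the reply was received
    Assumes the network delay is roughly symmetric in both directions.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def skew_warning(skew_seconds, threshold=1.0):
    """Return a warning string if |skew| exceeds threshold, else None."""
    if abs(skew_seconds) > threshold:
        return f"clock skew vs MDS is {skew_seconds:+.3f}s (threshold {threshold}s)"
    return None


if __name__ == "__main__":
    # Example exchange: MDS clock about 5 s ahead, 0.5 s round trip.
    skew = estimate_skew(100.0, 105.25, 105.25, 100.5)
    message = skew_warning(skew)
    if message:
        print(message)
```

This only covers the "warn when the skew is significant" option; what an automatic adjustment should do with the estimate is left open, as noted above.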
Updated by Greg Farnum about 10 years ago
- Status changed from New to Duplicate
I'm closing this in favor of fix ticket #7564.