Bug #39705
qa: Expected: (btime) < (new_btime), actual: 2019-05-09 23:33:09.400554 vs 2019-05-09 23:33:09.094205
Description
2019-05-09T23:33:08.964 INFO:tasks.workunit.client.0.smithi173.stdout:[ RUN ] LibCephFS.SnapXattrs
2019-05-09T23:33:09.014 INFO:tasks.workunit.client.0.smithi173.stdout:/build/ceph-15.0.0-1031-gc04a3cc/src/test/libcephfs/test.cc:2258: Failure
2019-05-09T23:33:09.015 INFO:tasks.workunit.client.0.smithi173.stdout:Expected: (btime) < (new_btime), actual: 2019-05-09 23:33:09.400554 vs 2019-05-09 23:33:09.094205
2019-05-09T23:33:09.016 INFO:tasks.workunit.client.0.smithi173.stdout:[ FAILED ] LibCephFS.SnapXattrs (47 ms)
2019-05-09T23:33:09.016 INFO:tasks.workunit.client.0.smithi173.stdout:[----------] 55 tests from LibCephFS (72641 ms total)
...
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout:[ FAILED ] 1 test, listed below:
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout:[ FAILED ] LibCephFS.SnapXattrs
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout:
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout: 1 FAILED TEST
2019-05-09T23:33:11.594 DEBUG:teuthology.orchestra.run:got remote process result: 1
2019-05-09T23:33:11.594 INFO:tasks.workunit:Stopping ['libcephfs/test.sh'] on client.0...
2019-05-09T23:33:11.594 INFO:teuthology.orchestra.run.smithi173:Running:
2019-05-09T23:33:11.594 INFO:teuthology.orchestra.run.smithi173:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2019-05-09T23:33:11.756 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-parial-recovery-2019-05-09/qa/tasks/workunit.py", line 123, in task
    timeout=timeout,cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-parial-recovery-2019-05-09/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 205, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 437, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test libcephfs/test.sh) on smithi173 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=c04a3ccdd2e25c2075c7680ed3203151b99f2333 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/libcephfs/test.sh'
From: /ceph/teuthology-archive/nojha-2019-05-09_22:58:42-fs:basic_workload-wip-parial-recovery-2019-05-09-distro-basic-smithi/3942897/teuthology.log
David, any idea?
Related issues
History
#1 Updated by Patrick Donnelly almost 5 years ago
- Related to Feature #38838: Expose CephFS snapshot creation time to clients added
#2 Updated by David Disseldorp almost 5 years ago
Patrick Donnelly wrote:
[...]
From: /ceph/teuthology-archive/nojha-2019-05-09_22:58:42-fs:basic_workload-wip-parial-recovery-2019-05-09-distro-basic-smithi/3942897/teuthology.log
David, any idea?
Thanks for the ping.
For the test we have a single client creating two sequential snapshots and confirming that the second snapshot has a btime (MDS SnapInfo.stamp) value higher than that of the first. In the failure the values are btime=2019-05-09 23:33:09.400554 and new_btime=2019-05-09 23:33:09.094205, so we have a small delta in the wrong direction.
Without knowing too much about how snapshots are processed by the MDS, could it be that a snapshot's SnapInfo.stamp value only gets committed after the lazy asynchronous client flush has completed? If so, out of order completion may not be unexpected for sequential snapshots.
#3 Updated by Patrick Donnelly almost 5 years ago
- Assignee changed from David Disseldorp to Zheng Yan
#4 Updated by Zheng Yan almost 5 years ago
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(2 0x1 'test_snap_xattr_snap_15872' 2019-05-09 23:31:42.199137)
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(3 0x1 'test_snap_xattr_snap2_15872' 2019-05-09 23:31:42.208800)
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(4 0x1 'test_snap_xattr_snap_91533' 2019-05-09 23:33:08.990488)
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(5 0x1 'test_snap_xattr_snap2_91533' 2019-05-09 23:33:09.004205)
Here are the btimes of the snapshots; it looks like the test case does not parse the timestamp correctly.
#5 Updated by Zheng Yan almost 5 years ago
- Assignee changed from Zheng Yan to David Disseldorp
#6 Updated by David Disseldorp almost 5 years ago
This bug is due to incorrect placement of the pad/width specifier in:

size_t Client::_vxattrcb_snap_btime(Inode *in, char *val, size_t size)
{
  return snprintf(val, size, "%llu.09%lu",

The "09" string is incorrectly emitted as a literal prefix to the nanoseconds value, instead of setting a zero-padding width. The same bug is present in the (not yet merged) snap.btime kernel patchset.
This is a cut-and-paste bug carried over from:

size_t Client::_vxattrcb_dir_rctime(Inode *in, char *val, size_t size)
{
  return snprintf(val, size, "%ld.09%ld", (long)in->rstat.rctime.sec(),
                  (long)in->rstat.rctime.nsec());

I'll raise a separate bug for "dir.rctime", which also affects the kernel client.
#7 Updated by David Disseldorp almost 5 years ago
- Copied to Bug #39943: client: ceph.dir.rctime xattr value incorrectly prefixes "09" to the nanoseconds component added
#8 Updated by David Disseldorp almost 5 years ago
- Pull request ID set to 28116
#9 Updated by Patrick Donnelly almost 5 years ago
- Status changed from New to Pending Backport
#10 Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #40169: nautilus: qa: Expected: (btime) < (new_btime), actual: 2019-05-09 23:33:09.400554 vs 2019-05-09 23:33:09.094205 added
#11 Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved