Bug #39705
qa: Expected: (btime) < (new_btime), actual: 2019-05-09 23:33:09.400554 vs 2019-05-09 23:33:09.094205
Description
2019-05-09T23:33:08.964 INFO:tasks.workunit.client.0.smithi173.stdout:[ RUN ] LibCephFS.SnapXattrs
2019-05-09T23:33:09.014 INFO:tasks.workunit.client.0.smithi173.stdout:/build/ceph-15.0.0-1031-gc04a3cc/src/test/libcephfs/test.cc:2258: Failure
2019-05-09T23:33:09.015 INFO:tasks.workunit.client.0.smithi173.stdout:Expected: (btime) < (new_btime), actual: 2019-05-09 23:33:09.400554 vs 2019-05-09 23:33:09.094205
2019-05-09T23:33:09.016 INFO:tasks.workunit.client.0.smithi173.stdout:[ FAILED ] LibCephFS.SnapXattrs (47 ms)
2019-05-09T23:33:09.016 INFO:tasks.workunit.client.0.smithi173.stdout:[----------] 55 tests from LibCephFS (72641 ms total)
...
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout:[ FAILED ] 1 test, listed below:
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout:[ FAILED ] LibCephFS.SnapXattrs
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout:
2019-05-09T23:33:11.586 INFO:tasks.workunit.client.0.smithi173.stdout: 1 FAILED TEST
2019-05-09T23:33:11.594 DEBUG:teuthology.orchestra.run:got remote process result: 1
2019-05-09T23:33:11.594 INFO:tasks.workunit:Stopping ['libcephfs/test.sh'] on client.0...
2019-05-09T23:33:11.594 INFO:teuthology.orchestra.run.smithi173:Running:
2019-05-09T23:33:11.594 INFO:teuthology.orchestra.run.smithi173:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2019-05-09T23:33:11.756 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-parial-recovery-2019-05-09/qa/tasks/workunit.py", line 123, in task
    timeout=timeout,cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-parial-recovery-2019-05-09/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 205, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 437, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test libcephfs/test.sh) on smithi173 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=c04a3ccdd2e25c2075c7680ed3203151b99f2333 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/libcephfs/test.sh'
From: /ceph/teuthology-archive/nojha-2019-05-09_22:58:42-fs:basic_workload-wip-parial-recovery-2019-05-09-distro-basic-smithi/3942897/teuthology.log
David, any idea?
Updated by Patrick Donnelly almost 5 years ago
- Related to Feature #38838: Expose CephFS snapshot creation time to clients added
Updated by David Disseldorp almost 5 years ago
Patrick Donnelly wrote:
[...]
From: /ceph/teuthology-archive/nojha-2019-05-09_22:58:42-fs:basic_workload-wip-parial-recovery-2019-05-09-distro-basic-smithi/3942897/teuthology.log
David, any idea?
Thanks for the ping.
For the test we have a single client creating two sequential snapshots and confirming that the second snapshot's btime (MDS SnapInfo.stamp) is greater than that of the first. In the failure the assertion sees btime=2019-05-09 23:33:09.400554 and new_btime=2019-05-09 23:33:09.094205, so we have a small delta in the wrong direction.
Without knowing too much about how snapshots are processed by the MDS, could it be that a snapshot's SnapInfo.stamp value only gets committed after the lazy asynchronous client flush has completed? If so, out-of-order stamps might not be unexpected for sequential snapshots.
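For reference, the test flow boils down to roughly the following sketch (libcephfs API; the paths, the ceph.snap.btime vxattr name and the error handling are simplified assumptions, not the verbatim test.cc code):

#include <cephfs/libcephfs.h>
#include <stdio.h>

/* Sketch of the LibCephFS.SnapXattrs flow: create two snapshots back to back
 * and read the snapshot birth-time vxattr of each; cmount is assumed to be
 * already mounted. */
static int snap_btime_sketch(struct ceph_mount_info *cmount)
{
  char btime[64] = "", new_btime[64] = "";

  /* first snapshot */
  if (ceph_mkdir(cmount, "/dir/.snap/snap1", 0755) < 0)
    return -1;
  ceph_getxattr(cmount, "/dir/.snap/snap1", "ceph.snap.btime",
                btime, sizeof(btime) - 1);

  /* second snapshot, created strictly after the first */
  if (ceph_mkdir(cmount, "/dir/.snap/snap2", 0755) < 0)
    return -1;
  ceph_getxattr(cmount, "/dir/.snap/snap2", "ceph.snap.btime",
                new_btime, sizeof(new_btime) - 1);

  /* the test then parses both "sec.nsec" strings and asserts
   * btime < new_btime, which is the check failing here */
  printf("btime=%s new_btime=%s\n", btime, new_btime);
  return 0;
}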
Updated by Patrick Donnelly almost 5 years ago
- Assignee changed from David Disseldorp to Zheng Yan
Updated by Zheng Yan almost 5 years ago
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(2 0x1 'test_snap_xattr_snap_15872' 2019-05-09 23:31:42.199137)
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(3 0x1 'test_snap_xattr_snap2_15872' 2019-05-09 23:31:42.208800)
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(4 0x1 'test_snap_xattr_snap_91533' 2019-05-09 23:33:08.990488)
2019-05-09 23:33:09.010 7f3f71911700 15 mds.0.cache.snaprealm(0x1 seq 5 0x55d157888280) ? snap(5 0x1 'test_snap_xattr_snap2_91533' 2019-05-09 23:33:09.004205)
These are the btimes of the snapshots; it looks like the test case does not parse the timestamp correctly.
Updated by Zheng Yan almost 5 years ago
- Assignee changed from Zheng Yan to David Disseldorp
Updated by David Disseldorp almost 5 years ago
This bug is due to incorrect placement of the pad/width specifier in:
11809 size_t Client::_vxattrcb_snap_btime(Inode *in, char *val, size_t size)
11810 {
11811 return snprintf(val, size, "%llu.09%lu",
The "09" string is incorrectly provided as a prefix for the nanoseconds value, instead of setting zero-padding width. The same bug is present in the
(not yet merged) snap.btime kernel patchset.
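To see why the literal "09" breaks the comparison rather than just the cosmetics, here is a tiny standalone demo (timestamps are illustrative, loosely modelled on the values in the MDS log above):

#include <stdio.h>

int main(void)
{
  /* two consecutive snapshot stamps, loosely modelled on the MDS log:
   * snap A at xx:xx:08.990488, snap B at xx:xx:09.004205 */
  unsigned long long sec_a = 1557445988ULL, sec_b = 1557445989ULL;
  unsigned long nsec_a = 990488000UL, nsec_b = 4205000UL;

  char a[64], b[64];
  /* the current format string: "09" is literal text, and the nanoseconds
   * are printed with however many digits they happen to have */
  snprintf(a, sizeof(a), "%llu.09%lu", sec_a, nsec_a);
  snprintf(b, sizeof(b), "%llu.09%lu", sec_b, nsec_b);

  printf("A: %s\n", a);  /* A: 1557445988.09990488000  (11 fractional digits) */
  printf("B: %s\n", b);  /* B: 1557445989.094205000    (9 fractional digits)  */

  /* a reader that treats the fractional field as nanoseconds gets ~9.99s
   * for A and ~0.094s for B, so A can easily end up "after" B */
  return 0;
}

The fractional part is both wrong in value and variable in width, so the ordering of the resulting strings no longer follows the ordering of the underlying timestamps.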
This is a cut-and-paste of the same bug from:
11790 size_t Client::_vxattrcb_dir_rctime(Inode *in, char *val, size_t size)
11791 {
11792 return snprintf(val, size, "%ld.09%ld", (long)in->rstat.rctime.sec(),
11793 (long)in->rstat.rctime.nsec());
...I'll raise a separate bug for "dir.rctime", which also affects the kernel client.
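Presumably the fix in both callbacks is the same small change: make the zero padding part of the conversion specification so the fractional part is always exactly nine digits. A sketch (hypothetical helper, not the actual Client.cc patch):

#include <stdio.h>

/* hypothetical helper standing in for _vxattrcb_snap_btime /
 * _vxattrcb_dir_rctime: always emit nine zero-padded fractional digits */
static size_t format_sec_nsec(char *val, size_t size,
                              unsigned long long sec, unsigned long nsec)
{
  return snprintf(val, size, "%llu.%09lu", sec, nsec);
}

int main(void)
{
  char a[64], b[64];
  format_sec_nsec(a, sizeof(a), 1557445988ULL, 990488000UL);
  format_sec_nsec(b, sizeof(b), 1557445989ULL, 4205000UL);
  printf("A: %s\n", a);  /* A: 1557445988.990488000 */
  printf("B: %s\n", b);  /* B: 1557445989.004205000 */
  return 0;
}

With the fixed-width fractional field the formatted strings sort the same way as the timestamps, so the btime < new_btime check holds for sequential snapshots.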
Updated by David Disseldorp almost 5 years ago
- Copied to Bug #39943: client: ceph.dir.rctime xattr value incorrectly prefixes "09" to the nanoseconds component added
Updated by Patrick Donnelly almost 5 years ago
- Status changed from New to Pending Backport
Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #40169: nautilus: qa: Expected: (btime) < (new_btime), actual: 2019-05-09 23:33:09.400554 vs 2019-05-09 23:33:09.094205 added
Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved