Project

General

Profile

Actions

Bug #50281

closed

qa: untar_snap_rm timeout

Added by Patrick Donnelly about 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
qa, qa-failure, snapshots
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-04-09T08:46:37.116 INFO:tasks.ceph.mds.c.smithi035.stderr:2021-04-09T08:46:37.111+0000 7eff48f48700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-04-09T07:46:37.114769+0000)
2021-04-09T08:46:37.253 INFO:tasks.ceph.mds.a.smithi035.stderr:2021-04-09T08:46:37.251+0000 7f9488bec700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-04-09T07:46:37.251787+0000)
2021-04-09T08:46:37.257 INFO:tasks.ceph.osd.4.smithi172.stderr:2021-04-09T08:46:37.251+0000 7f14d39e0700 -1 osd.4 103 get_health_metrics reporting 6 slow ops, oldest is osd_op(mds.0.4:171295 3.3 3.f2b68aab (undecoded) ondisk+retry+write+known_if_redirected+full_force e79)
2021-04-09T08:46:37.753 INFO:tasks.ceph.osd.6.smithi172.stderr:2021-04-09T08:46:37.747+0000 7f5ed2972700 -1 osd.6 103 get_health_metrics reporting 114 slow ops, oldest is osd_op(mds.0.4:171360 4.5 4.17c5e6c5 (undecoded) ondisk+write+known_if_redirected+full_force e79)
2021-04-09T08:46:38.116 INFO:tasks.ceph.mds.c.smithi035.stderr:2021-04-09T08:46:38.111+0000 7eff48f48700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-04-09T07:46:38.114959+0000)
2021-04-09T08:46:38.254 INFO:tasks.ceph.mds.a.smithi035.stderr:2021-04-09T08:46:38.251+0000 7f9488bec700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-04-09T07:46:38.252015+0000)
2021-04-09T08:46:38.281 INFO:tasks.ceph.osd.4.smithi172.stderr:2021-04-09T08:46:38.275+0000 7f14d39e0700 -1 osd.4 103 get_health_metrics reporting 6 slow ops, oldest is osd_op(mds.0.4:171295 3.3 3.f2b68aab (undecoded) ondisk+retry+write+known_if_redirected+full_force e79)
2021-04-09T08:46:38.687 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-04-09T08:46:38.740 INFO:tasks.ceph.osd.6.smithi172.stderr:2021-04-09T08:46:38.735+0000 7f5ed2972700 -1 osd.6 103 get_health_metrics reporting 114 slow ops, oldest is osd_op(mds.0.4:171360 4.5 4.17c5e6c5 (undecoded) ondisk+write+known_if_redirected+full_force e79)
2021-04-09T08:46:39.127 INFO:tasks.ceph.mds.c.smithi035.stderr:2021-04-09T08:46:39.123+0000 7eff48f48700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-04-09T07:46:39.125356+0000)
2021-04-09T08:46:39.213 INFO:tasks.workunit.client.0.smithi035.stdout:'.snap/k/linux
2021-04-09T08:46:39.214 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2021-04-09T08:46:39.215 DEBUG:teuthology.orchestra.run.smithi035:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2021-04-09T08:46:39.254 INFO:tasks.ceph.mds.a.smithi035.stderr:2021-04-09T08:46:39.247+0000 7f9488bec700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-04-09T07:46:39.252176+0000)
2021-04-09T08:46:39.327 INFO:tasks.ceph.osd.4.smithi172.stderr:2021-04-09T08:46:39.323+0000 7f14d39e0700 -1 osd.4 103 get_health_metrics reporting 6 slow ops, oldest is osd_op(mds.0.4:171295 3.3 3.f2b68aab (undecoded) ondisk+retry+write+known_if_redirected+full_force e79)
2021-04-09T08:46:39.465 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_48c4592f88f808feaabf260b70177e37c037a379/qa/tasks/workunit.py", line 147, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_48c4592f88f808feaabf260b70177e37c037a379/qa/tasks/workunit.py", line 297, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_48c4592f88f808feaabf260b70177e37c037a379/qa/tasks/workunit.py", line 425, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/snaps/untar_snap_rm.sh) on smithi035 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=48c4592f88f808feaabf260b70177e37c037a379 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/untar_snap_rm.sh'

From: /ceph/teuthology-archive/pdonnell-2021-04-08_22:42:24-fs-wip-pdonnell-testing-20210408.192301-distro-basic-smithi/6030440/teuthology.log

Cursory glance indicates it's just caused by a storm of slow ops. Not sure why this is new though.

Testing kernel client might be related.


Related issues 1 (0 open1 closed)

Related to CephFS - Bug #50826: kceph: stock RHEL kernel hangs on snaptests with mon|osd thrashersClosedPatrick Donnelly

Actions
Actions

Also available in: Atom PDF