Bug #36365

closed

qa: increase rm timeout for workunit cleanup

Added by Patrick Donnelly over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: High
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Some workunits, like fsstress, take ~45 minutes to clean up:

Failure:
5 jobs: ['3116578', '3116526', '3116682', '3116631', '3116735']
suites intersection: ['clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/bluestore.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

and

2018-10-08T23:27:29.024 INFO:tasks.workunit.client.0.smithi131.stdout:2/999: creat d2/d84/db0/db9/dcc/f159 x:0 0 0
2018-10-08T23:27:29.051 INFO:tasks.workunit.client.0.smithi131.stderr:+ rm -rf -- ./tmp.gvZNSLTmJM
2018-10-08T23:36:04.837 INFO:teuthology.orchestra.run:Running command with timeout 900
2018-10-08T23:36:04.837 INFO:teuthology.orchestra.run.smithi131:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2018-10-08T23:51:04.844 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 300, in copy_file_to
    copy_to_log(src, logger)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 267, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1305, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 715, in recv_stderr
    raise socket.timeout()
timeout
2018-10-08T23:51:04.854 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 300, in copy_file_to
    copy_to_log(src, logger)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 267, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1293, in _read
    return self.channel.recv(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 667, in recv
    raise socket.timeout()
timeout
2018-10-09T00:15:52.605 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.0...
2018-10-09T00:15:52.605 INFO:teuthology.orchestra.run.smithi131:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-10-09T00:15:52.766 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 206, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 356, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 483, in _run_tests
    remote.run(logger=log.getChild(role), args=args, timeout=(15*60))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 429, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 148, in wait
    greenlet.get(block=True,timeout=60)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 456, in get
    self._raise_exception()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 159, in _raise_exception
    reraise(*self.exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 300, in copy_file_to
    copy_to_log(src, logger)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 267, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1305, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 715, in recv_stderr
    raise socket.timeout()
timeout

(with logrotate messages removed.) From: /ceph/teuthology-archive/pdonnell-2018-10-08_20:32:32-fs-luminous-distro-basic-smithi/3116526/teuthology.log
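
As a quick check on the log above, the gap between the "Running command with timeout 900" line and the first socket.timeout traceback can be computed from the two timestamps. This is a small sketch using only the Python standard library; the helper name elapsed_seconds and the variable names are illustrative and not part of teuthology.

```python
from datetime import datetime

# Timestamp layout used at the start of each teuthology log line.
FMT = "%Y-%m-%dT%H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Copied from the log excerpt above.
rm_started = "2018-10-08T23:36:04.837"     # 'sudo rm -rf -- .../tmp' issued
first_timeout = "2018-10-08T23:51:04.844"  # first "Uncaught exception (Hub)"

gap = elapsed_seconds(rm_started, first_timeout)
print(round(gap))  # 900
```

The result matches the 15-minute value (timeout=(15*60)) passed to remote.run in the traceback.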

I don't know why the timeout didn't cause rm to be killed, but it did work for the stdout/stderr greenlets. Weird.
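
The general effect of a too-short cleanup timeout can be illustrated with a minimal standard-library sketch. This is not teuthology's remote.run; run_with_timeout is a hypothetical helper, and a short Python sleep stands in for a slow rm -rf. It only shows that the same slow-but-healthy command fails under a small timeout and completes under a larger one.

```python
import subprocess
import sys


def run_with_timeout(args, timeout):
    """Run a command; return its exit code, or None if the timeout expired.

    subprocess.run kills the child and waits for it before raising
    TimeoutExpired, so no process is left behind in this sketch.
    """
    try:
        return subprocess.run(args, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None


# Stand-in for a slow cleanup command (assumption: 1 second of "work").
slow_cleanup = [sys.executable, "-c", "import time; time.sleep(1)"]

too_short = run_with_timeout(slow_cleanup, timeout=0.2)  # expires first
long_enough = run_with_timeout(slow_cleanup, timeout=10)  # command finishes

print(too_short, long_enough)
```

Under these assumptions, raising the timeout is enough to turn the failure into a clean exit, which is the change the issue title asks for in the workunit cleanup path.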


Related issues 3 (0 open, 3 closed)

Related to CephFS - Bug #36184: qa: add timeouts to workunits to bound test execution time in the event of crashes/bugs (Resolved, Patrick Donnelly, 09/25/2018)
Copied to CephFS - Backport #36501: mimic: qa: increase rm timeout for workunit cleanup (Resolved, Nathan Cutler)
Copied to CephFS - Backport #36502: luminous: qa: increase rm timeout for workunit cleanup (Resolved, Jos Collin)

#1 Updated by Patrick Donnelly over 5 years ago

  • Related to Bug #36184: qa: add timeouts to workunits to bound test execution time in the event of crashes/bugs added

#2 Updated by Patrick Donnelly over 5 years ago

  • Status changed from In Progress to Fix Under Review

#3 Updated by Patrick Donnelly over 5 years ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36501: mimic: qa: increase rm timeout for workunit cleanup added

#5 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36502: luminous: qa: increase rm timeout for workunit cleanup added

#6 Updated by Patrick Donnelly over 5 years ago

  • Status changed from Pending Backport to Resolved