Bug #36365
qa: increase rm timeout for workunit cleanup
Status: closed
% Done: 0%
Source: Q/A
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Some workunits like fsstress take ~45 minutes to clean up:
Failure: 5 jobs: ['3116578', '3116526', '3116682', '3116631', '3116735']
suites intersection: ['clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/bluestore.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
and
2018-10-08T23:27:29.024 INFO:tasks.workunit.client.0.smithi131.stdout:2/999: creat d2/d84/db0/db9/dcc/f159 x:0 0 0
2018-10-08T23:27:29.051 INFO:tasks.workunit.client.0.smithi131.stderr:+ rm -rf -- ./tmp.gvZNSLTmJM
2018-10-08T23:36:04.837 INFO:teuthology.orchestra.run:Running command with timeout 900
2018-10-08T23:36:04.837 INFO:teuthology.orchestra.run.smithi131:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2018-10-08T23:51:04.844 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 300, in copy_file_to
    copy_to_log(src, logger)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 267, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1305, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 715, in recv_stderr
    raise socket.timeout()
timeout
2018-10-08T23:51:04.854 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 300, in copy_file_to
    copy_to_log(src, logger)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 267, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1293, in _read
    return self.channel.recv(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 667, in recv
    raise socket.timeout()
timeout
2018-10-09T00:15:52.605 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.0...
2018-10-09T00:15:52.605 INFO:teuthology.orchestra.run.smithi131:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-10-09T00:15:52.766 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 206, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 356, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 483, in _run_tests
    remote.run(logger=log.getChild(role), args=args, timeout=(15*60))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 429, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 148, in wait
    greenlet.get(block=True,timeout=60)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 456, in get
    self._raise_exception()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 159, in _raise_exception
    reraise(*self.exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 536, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 300, in copy_file_to
    copy_to_log(src, logger)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 267, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 102, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 277, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1305, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 715, in recv_stderr
    raise socket.timeout()
timeout
(with logrotate messages removed.) From: /ceph/teuthology-archive/pdonnell-2018-10-08_20:32:32-fs-luminous-distro-basic-smithi/3116526/teuthology.log
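The timings in the log excerpt above can be checked with a little arithmetic. The sketch below only parses four timestamps copied from that excerpt; the helper name `elapsed` and the variable names are illustrative and do not come from teuthology.

```python
from datetime import datetime

# Timestamp layout used by the teuthology log lines quoted above.
FMT = "%Y-%m-%dT%H:%M:%S.%f"

def elapsed(start, end):
    """Return seconds between two log timestamps (illustrative helper)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

# Values copied from the log excerpt.
workunit_rm_start = "2018-10-08T23:27:29.051"    # fsstress's own 'rm -rf -- ./tmp...'
teuthology_rm_start = "2018-10-08T23:36:04.837"  # 'Running command with timeout 900'
greenlet_timeout = "2018-10-08T23:51:04.844"     # first 'Uncaught exception (Hub)'
stopping = "2018-10-09T00:15:52.605"             # 'Stopping [...] on client.0...'

# About 900 s, i.e. the 15*60 timeout passed to remote.run in the traceback.
timeout_fired_after = elapsed(teuthology_rm_start, greenlet_timeout)

# About 48 minutes from the workunit's rm to the 'Stopping' message.
total_cleanup_minutes = elapsed(workunit_rm_start, stopping) / 60

print(f"stdout/stderr greenlets timed out after {timeout_fired_after:.3f} s")
print(f"cleanup took about {total_cleanup_minutes:.1f} min")
```

So the socket timeouts line up with the 900-second limit, while the whole cleanup ran for roughly three times that long.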
I don't know why the timeout didn't cause rm to be killed, but it did work for the stdout/stderr greenlets. Weird.
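For contrast, here is a minimal local sketch of a timeout that bounds the process itself and not just the reading of its output. It uses the standard library's `subprocess.run`, not teuthology's `remote.run`, so it says nothing about how teuthology enforces its timeout; `run_bounded` and the stand-in commands are hypothetical.

```python
import subprocess
import sys

def run_bounded(args, timeout_s):
    """Run a local command with a hard time limit (hypothetical helper).

    Returns (finished, returncode). On timeout, subprocess.run kills the
    child and waits for it before raising TimeoutExpired.
    """
    try:
        proc = subprocess.run(args, timeout=timeout_s)
        return True, proc.returncode
    except subprocess.TimeoutExpired:
        return False, None

# Stand-in for a slow cleanup: a child that sleeps longer than the limit.
slow_cleanup = [sys.executable, "-c", "import time; time.sleep(5)"]
finished_short, _ = run_bounded(slow_cleanup, timeout_s=0.5)

# Stand-in for a cleanup that completes well within a generous limit.
quick_cleanup = [sys.executable, "-c", "pass"]
finished_ok, returncode = run_bounded(quick_cleanup, timeout_s=30)

print(finished_short, finished_ok, returncode)
```

With this kind of timeout, raising the limit is what lets a slow cleanup finish; a limit that is too low ends the command early.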
Updated by Patrick Donnelly over 5 years ago
- Related to Bug #36184: qa: add timeouts to workunits to bound test execution time in the event of crashes/bugs added
Updated by Patrick Donnelly over 5 years ago
- Status changed from In Progress to Fix Under Review
Updated by Patrick Donnelly over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #36501: mimic: qa: increase rm timeout for workunit cleanup added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #36502: luminous: qa: increase rm timeout for workunit cleanup added
Updated by Patrick Donnelly over 5 years ago
- Status changed from Pending Backport to Resolved