Bug #43513

closed

qa: filelock_interrupt.py hang

Added by Patrick Donnelly over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

...2020-01-07T19:46:53.979 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:24.041 INFO:teuthology.orchestra.run.smithi027:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:24.044 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:54.113 INFO:teuthology.orchestra.run.smithi027:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:54.116 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:48:24.185 INFO:teuthology.orchestra.run.smithi027:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:48:24.206 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:48:32.486 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-01-07T19:48:32.486 INFO:tasks.workunit:Stopping ['fs/misc'] on client.0...
2020-01-07T19:48:32.486 INFO:teuthology.orchestra.run.smithi027:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2020-01-07T19:48:32.690 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200106.232504/qa/tasks/workunit.py", line 138, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200106.232504/qa/tasks/workunit.py", line 288, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200106.232504/qa/tasks/workunit.py", line 411, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test fs/misc/filelock_interrupt.py) on smithi027 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=ab553e090c24c56e06ca0e79dbeffb6473c8fefd TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/misc/filelock_interrupt.py'

From: /ceph/teuthology-archive/pdonnell-2020-01-07_16:10:56-fs-wip-pdonnell-testing-20200106.232504-distro-basic-smithi/4644574/teuthology.log

Actions #1

Updated by Zheng Yan over 4 years ago

Looks like the flock syscall was restarted after the alarm signal was handled. The script does not work with python3, but works with python2.

Adding signal.siginterrupt(signal.SIGALRM, True) after signal.signal(signal.SIGALRM, handler) does not work with python3 either. This looks like a python3 bug.
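
For reference, a minimal sketch of the behaviour described above, assuming CPython 3.5 or later on Linux. Since PEP 475, fcntl.flock() is retried after EINTR whenever the signal handler returns normally, which matches the restart seen here, and signal.siginterrupt() does not change that. The retry is skipped only when the handler raises an exception. The names LockTimeout, _on_alarm and try_second_lock are hypothetical; they are not taken from filelock_interrupt.py or from any fix for this bug.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Hypothetical exception raised from the SIGALRM handler."""


def _on_alarm(signum, frame):
    # Raising here is what stops CPython (PEP 475) from retrying flock().
    raise LockTimeout()


def try_second_lock(timeout=0.2):
    """Hold an exclusive flock via one descriptor, then try a second one.

    The two descriptors come from separate open() calls, so the second
    LOCK_EX blocks until the alarm fires.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        fd1 = os.open(tmp.name, os.O_RDWR)
        fd2 = os.open(tmp.name, os.O_RDWR)
        old_handler = signal.signal(signal.SIGALRM, _on_alarm)
        try:
            fcntl.flock(fd1, fcntl.LOCK_EX)
            signal.setitimer(signal.ITIMER_REAL, timeout)
            try:
                fcntl.flock(fd2, fcntl.LOCK_EX)  # blocks on fd1's lock
            except LockTimeout:
                return "interrupted"
            finally:
                # Cancel the timer whether or not it already fired.
                signal.setitimer(signal.ITIMER_REAL, 0)
            return "acquired"
        finally:
            signal.signal(signal.SIGALRM, old_handler)
            os.close(fd2)
            os.close(fd1)
```

On a local filesystem, try_second_lock() should come back with "interrupted" after roughly the timeout instead of blocking. With a handler that only returns, the same flock() call would be expected to block indefinitely on python3, as in this report.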

Actions #2

Updated by Patrick Donnelly over 4 years ago

Zheng Yan wrote:

Looks like the flock syscall was restarted after the alarm signal was handled. The script does not work with python3, but works with python2.

Adding signal.siginterrupt(signal.SIGALRM, True) after signal.signal(signal.SIGALRM, handler) does not work with python3 either. This looks like a python3 bug.

Can you convert it into a C program instead? Signal handling in any scripting language is always dicey anyway.

Actions #3

Updated by Zheng Yan over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32741

Actions #4

Updated by Patrick Donnelly over 4 years ago

  • Status changed from Fix Under Review to Resolved