Bug #43513
closed
qa: filelock_interrupt.py hang
Description
...
2020-01-07T19:46:53.979 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:24.041 INFO:teuthology.orchestra.run.smithi027:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:24.044 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:54.113 INFO:teuthology.orchestra.run.smithi027:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:47:54.116 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:48:24.185 INFO:teuthology.orchestra.run.smithi027:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:48:24.206 INFO:teuthology.orchestra.run.smithi029:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-07T19:48:32.486 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-01-07T19:48:32.486 INFO:tasks.workunit:Stopping ['fs/misc'] on client.0...
2020-01-07T19:48:32.486 INFO:teuthology.orchestra.run.smithi027:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2020-01-07T19:48:32.690 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200106.232504/qa/tasks/workunit.py", line 138, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200106.232504/qa/tasks/workunit.py", line 288, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200106.232504/qa/tasks/workunit.py", line 411, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test fs/misc/filelock_interrupt.py) on smithi027 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=ab553e090c24c56e06ca0e79dbeffb6473c8fefd TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/misc/filelock_interrupt.py'
From: /ceph/teuthology-archive/pdonnell-2020-01-07_16:10:56-fs-wip-pdonnell-testing-20200106.232504-distro-basic-smithi/4644574/teuthology.log
Updated by Zheng Yan over 4 years ago
It looks like the flock syscall was restarted after the SIGALRM handler returned. The script works with python2 but not with python3.
Adding signal.siginterrupt(signal.SIGALRM, True) after signal.signal(signal.SIGALRM, handler) does not help with python3 either. This looks like a python3 bug.
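For reference, the restart described above matches the behaviour introduced by PEP 475 in Python 3.5: when a system call fails with EINTR, the interpreter retries it unless the signal handler raises an exception, which would also explain why signal.siginterrupt() makes no difference. The sketch below shows one possible way to interrupt a blocking flock under that rule. It is only an illustration, not the change made in the pull request, and it assumes the handler in the test script returns normally. The names LockTimeout, flock_with_timeout and demo_contended_lock are hypothetical.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Hypothetical exception raised from the SIGALRM handler."""


def _on_alarm(signum, frame):
    # Raising from the handler is what stops Python 3.5+ from
    # retrying the interrupted flock() call (PEP 475).
    raise LockTimeout()


def flock_with_timeout(fd, seconds):
    """Try to take an exclusive flock on fd.

    Returns True if the lock was acquired, False if the timer fired first.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        return True
    except LockTimeout:
        return False
    finally:
        # Cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


def demo_contended_lock(seconds=0.2):
    """Hold a lock through one descriptor, then wait on a second one."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two separate open() calls give two open file descriptions,
        # so their flock locks conflict even within one process.
        holder = os.open(tmp.name, os.O_RDWR)
        waiter = os.open(tmp.name, os.O_RDWR)
        try:
            fcntl.flock(holder, fcntl.LOCK_EX)
            return flock_with_timeout(waiter, seconds)
        finally:
            os.close(waiter)
            os.close(holder)
```

With a handler that simply returns, the same flock call would block until the lock is released, which is consistent with the 3h timeout (status 124) in the log. Note that the sketch must run in the main thread, and that a timer firing just after flock succeeds would be reported as a timeout, so it is not race-free.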
Updated by Patrick Donnelly over 4 years ago
Zheng Yan wrote:
It looks like the flock syscall was restarted after the SIGALRM handler returned. The script works with python2 but not with python3.
Adding signal.siginterrupt(signal.SIGALRM, True) after signal.signal(signal.SIGALRM, handler) does not help with python3 either. This looks like a python3 bug.
Can you convert it into a C program instead? Signal handling in any scripting language is always dicey anyway.
Updated by Zheng Yan over 4 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 32741
Updated by Patrick Donnelly over 4 years ago
- Status changed from Fix Under Review to Resolved