Bug #52438
closedqa: ffsb timeout
0%
Description
2021-08-27T20:23:38.033 INFO:tasks.workunit.client.0.smithi064.stdout:0.009000 2021-08-27T20:23:38.034 INFO:tasks.workunit:Stopping ['suites/ffsb.sh'] on client.0... 2021-08-27T20:23:38.035 DEBUG:teuthology.orchestra.run.smithi064:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0 2021-08-27T20:23:38.970 ERROR:teuthology.run_tasks:Saw exception from tasks. Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/run_tasks.py", line 91, in run_tasks manager = run_one_task(taskname, ctx=ctx, config=config) File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/run_tasks.py", line 70, in run_one_task return task(**kwargs) File "/home/teuthworker/src/git.ceph.com_ceph-c_571ce12cf5ad339da7d8efb36315a33417d719bb/qa/tasks/workunit.py", line 147, in task 2021-08-27T20:26:44.091 DEBUG:teuthology.orchestra.run.smithi187:> sudo grep -a 'avc: .*denied' /var/log/audit/audit.log | grep -av '\(comm="dmidecode"\|chronyd.service\|name="cephtest"\|scontext=system_u:system_r:nrpe_t:s0\|scontext=system_u:system_r:pcp_pmlogger_t\|scontext=system_u:system_r:pcp_pmcd_t:s0\|comm="rhsmd"\|scontext=system_u:system_r:syslogd_t:s0\|tcontext=system_u:system_r:nrpe_t:s0\|comm="updatedb"\|comm="smartd"\|comm="rhsmcertd-worke"\|comm="setroubleshootd"\|comm="rpm"\|tcontext=system_u:object_r:container_runtime_exec_t:s0\|comm="ksmtuned"\|comm="sssd"\|scontext=system_u:system_r:logrotate_t:s0\|scontext=system_u:system_r:logrotate_t:s0\)' cleanup=cleanup) File "/home/teuthworker/src/git.ceph.com_ceph-c_571ce12cf5ad339da7d8efb36315a33417d719bb/qa/tasks/workunit.py", line 297, in _spawn_on_all_clients timeout=timeout) File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 84, in __exit__ for result in self: File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 98, in __next__ resurrect_traceback(result) File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 30, in resurrect_traceback raise exc.exc_info[1] File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 23, in capture_traceback return func(*args, **kwargs) File "/home/teuthworker/src/git.ceph.com_ceph-c_571ce12cf5ad339da7d8efb36315a33417d719bb/qa/tasks/workunit.py", line 426, in _run_tests label="workunit test {workunit}".format(workunit=workunit) File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/remote.py", line 509, in run r = self._runner(client=self.ssh, name=self.shortname, **kwargs) File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/run.py", line 455, in run r.wait() File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/run.py", line 161, in wait self._raise_for_status() File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/run.py", line 183, in _raise_for_status node=self.hostname, label=self.label teuthology.exceptions.CommandFailedError: Command failed (workunit test suites/ffsb.sh) on smithi064 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=571ce12cf5ad339da7d8efb36315a33417d719bb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/ffsb.sh'
From: /ceph/teuthology-archive/pdonnell-2021-08-27_16:46:16-fs-wip-pdonnell-testing-20210827.024746-distro-basic-smithi/6362992/teuthology.log
Probably slowness caused by new "secure" flag to kclient.
Updated by Venky Shankar about 2 years ago
Recent failures:
- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669874
- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669887
- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669872
(all kclient jobs)
Updated by Xiubo Li about 2 years ago
Venky Shankar wrote:
Recent failures:
2022-02-08T15:13:09.353 INFO:tasks.workunit.client.0.smithi143.stdout:Syncing()...42 sec 2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:FFSB benchmark finished at: Tue Feb 8 15:13:09 2022 2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout: 2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:Results: 2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:Benchmark took 345.36 sec 2022-02-08T15:13:09.355 INFO:tasks.workunit.client.0.smithi143.stdout: ... 2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:[ open] Total calls: 50147 2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout: 2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:1.263000 2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:0.035000 2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:0.034000 2022-02-08T15:13:13.378 INFO:tasks.workunit.client.0.smithi143.stdout:4.431000 ... 2022-02-08T15:13:24.303 INFO:tasks.workunit.client.0.smithi143.stdout:==== 2022-02-08T15:13:24.303 INFO:tasks.workunit.client.0.smithi143.stdout:[ read] Total calls: 0 2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout: 2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:==== 2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:[ write] Total calls: 64188160 2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout: 2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:39.202000 2022-02-08T15:13:24.305 INFO:tasks.workunit.client.0.smithi143.stdout:0.017000 2022-02-08T15:13:24.305 INFO:tasks.workunit.client.0.smithi143.stdout:0.015000 2022-02-08T15:13:24.305 INFO:tasks.workunit.client.0.smithi143.stdout:0.015000 ... 2022-02-08T18:07:04.625 INFO:tasks.workunit.client.0.smithi143.stdout:0.010000 2022-02-08T18:07:04.625 INFO:tasks.workunit.client.0.smithi143.stdout:0.011000 2022-02-08T18:07:04.625 INFO:tasks.workunit.client.0.smithi143.stdout:0.010000 2022-02-08T18:07:04.626 INFO:tasks.workunit:Stopping ['suites/ffsb.sh'] on client.0... 2022-02-08T18:07:04.627 DEBUG:teuthology.orchestra.run.smithi143:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
The ffsb test seems still not finished after 3 hours.
Not sure whether is this normal taking so long ?
- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669887
- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669872(all kclient jobs)
Updated by Xiubo Li about 2 years ago
Xiubo Li wrote:
Venky Shankar wrote:
Recent failures:
[...]
The ffsb test seems still not finished after 3 hours.
Not sure whether is this normal taking so long ?
I didn't see any error from the logs, maybe we should increase the timeout value for the ffsb test ?
Updated by Xiubo Li about 2 years ago
- Status changed from New to Fix Under Review
- Assignee set to Xiubo Li
- Pull request ID set to 44987
Updated by Xiubo Li about 2 years ago
Actually the `ffsb` test finished very fast and took 346.70 seconds:
```
2022-02-28T08:29:36.007 INFO:tasks.workunit.client.0.smithi065.stdout:make1: Leaving directory '/home/ubuntu/cephtest/mnt.0/client.0/tmp/ffsb'
2022-02-28T08:29:36.014 INFO:tasks.workunit.client.0.smithi065.stdout:FFSB version 6.0-RC2 started
2022-02-28T08:29:36.015 INFO:tasks.workunit.client.0.smithi065.stdout:
2022-02-28T08:29:36.024 INFO:tasks.workunit.client.0.smithi065.stdout:benchmark time = 300
2022-02-28T08:29:36.024 INFO:tasks.workunit.client.0.smithi065.stdout:ThreadGroup 0
2022-02-28T08:29:36.025 INFO:tasks.workunit.client.0.smithi065.stdout:================
...
2022-02-28T08:29:36.037 INFO:tasks.workunit.client.0.smithi065.stdout:checking existing fs: .
2022-02-28T08:29:36.038 INFO:tasks.workunit.client.0.smithi065.stdout:recreating new fileset
...
2022-02-28T08:36:22.993 INFO:tasks.workunit.client.0.smithi065.stdout:Syncing()...43 sec
2022-02-28T08:36:22.993 INFO:tasks.workunit.client.0.smithi065.stdout:FFSB benchmark finished at: Mon Feb 28 08:36:23 2022
2022-02-28T08:36:22.994 INFO:tasks.workunit.client.0.smithi065.stdout:
2022-02-28T08:36:22.994 INFO:tasks.workunit.client.0.smithi065.stdout:Results:
2022-02-28T08:36:22.994 INFO:tasks.workunit.client.0.smithi065.stdout:Benchmark took 346.70 sec
2022-02-28T08:36:22.995 INFO:tasks.workunit.client.0.smithi065.stdout:
```
But it just timedout when printing the `open/write/lseek/close` individual response times, which have 105492912 items:
```
2022-02-28T08:36:25.697 INFO:tasks.workunit.client.0.smithi065.stdout:[ open] 0.016000 12.052390 14466.169922 41176
...
2022-02-28T08:36:25.701 INFO:tasks.workunit.client.0.smithi065.stdout:[ write] 0.005000 0.110044 14464.117188 52705280
...
2022-02-28T08:36:25.706 INFO:tasks.workunit.client.0.smithi065.stdout:[ lseek] 0.001000 0.062884 5543.568848 52705280
...
2022-02-28T08:36:25.711 INFO:tasks.workunit.client.0.smithi065.stdout:[ close] 0.004000 0.016721 29.209000 41176
...
2022-02-28T08:36:25.716 INFO:tasks.workunit.client.0.smithi065.stdout:Overall Calls: 105492912
```
This is not cephfs's bug, let's fix it in https://github.com/ceph/ffsb instead to disable print this
Updated by Xiubo Li about 2 years ago
Created one pr in ffsb https://github.com/ceph/ffsb/pull/3 to fix it
Updated by Xiubo Li almost 2 years ago
- Status changed from Fix Under Review to Resolved