Project

General

Profile

Bug #52438

qa: ffsb timeout

Added by Patrick Donnelly over 1 year ago. Updated 7 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph, qa-suite
Labels (FS):
qa, qa-failure, task(easy)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-08-27T20:23:38.033 INFO:tasks.workunit.client.0.smithi064.stdout:0.009000
2021-08-27T20:23:38.034 INFO:tasks.workunit:Stopping ['suites/ffsb.sh'] on client.0...
2021-08-27T20:23:38.035 DEBUG:teuthology.orchestra.run.smithi064:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2021-08-27T20:23:38.970 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_571ce12cf5ad339da7d8efb36315a33417d719bb/qa/tasks/workunit.py", line 147, in task
2021-08-27T20:26:44.091 DEBUG:teuthology.orchestra.run.smithi187:> sudo grep -a 'avc: .*denied' /var/log/audit/audit.log | grep -av '\(comm="dmidecode"\|chronyd.service\|name="cephtest"\|scontext=system_u:system_r:nrpe_t:s0\|scontext=system_u:system_r:pcp_pmlogger_t\|scontext=system_u:system_r:pcp_pmcd_t:s0\|comm="rhsmd"\|scontext=system_u:system_r:syslogd_t:s0\|tcontext=system_u:system_r:nrpe_t:s0\|comm="updatedb"\|comm="smartd"\|comm="rhsmcertd-worke"\|comm="setroubleshootd"\|comm="rpm"\|tcontext=system_u:object_r:container_runtime_exec_t:s0\|comm="ksmtuned"\|comm="sssd"\|scontext=system_u:system_r:logrotate_t:s0\|scontext=system_u:system_r:logrotate_t:s0\)'
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_571ce12cf5ad339da7d8efb36315a33417d719bb/qa/tasks/workunit.py", line 297, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_571ce12cf5ad339da7d8efb36315a33417d719bb/qa/tasks/workunit.py", line 426, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d0ccb5e7543966c9868cca0e1d0b1e1f5b5df280/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test suites/ffsb.sh) on smithi064 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=571ce12cf5ad339da7d8efb36315a33417d719bb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/ffsb.sh'

From: /ceph/teuthology-archive/pdonnell-2021-08-27_16:46:16-fs-wip-pdonnell-testing-20210827.024746-distro-basic-smithi/6362992/teuthology.log

Probably slowness caused by new "secure" flag to kclient.

History

#2 Updated by Xiubo Li 12 months ago

Venky Shankar wrote:

Recent failures:

- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669874

2022-02-08T15:13:09.353 INFO:tasks.workunit.client.0.smithi143.stdout:Syncing()...42 sec
2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:FFSB benchmark finished   at: Tue Feb  8 15:13:09 2022
2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:
2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:Results:
2022-02-08T15:13:09.354 INFO:tasks.workunit.client.0.smithi143.stdout:Benchmark took 345.36 sec
2022-02-08T15:13:09.355 INFO:tasks.workunit.client.0.smithi143.stdout:
...
2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:[   open] Total calls:        50147
2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:
2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:1.263000
2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:0.035000
2022-02-08T15:13:13.377 INFO:tasks.workunit.client.0.smithi143.stdout:0.034000
2022-02-08T15:13:13.378 INFO:tasks.workunit.client.0.smithi143.stdout:4.431000
...
2022-02-08T15:13:24.303 INFO:tasks.workunit.client.0.smithi143.stdout:====
2022-02-08T15:13:24.303 INFO:tasks.workunit.client.0.smithi143.stdout:[   read] Total calls:            0
2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:
2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:====
2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:[  write] Total calls:     64188160
2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:
2022-02-08T15:13:24.304 INFO:tasks.workunit.client.0.smithi143.stdout:39.202000
2022-02-08T15:13:24.305 INFO:tasks.workunit.client.0.smithi143.stdout:0.017000
2022-02-08T15:13:24.305 INFO:tasks.workunit.client.0.smithi143.stdout:0.015000
2022-02-08T15:13:24.305 INFO:tasks.workunit.client.0.smithi143.stdout:0.015000
...
2022-02-08T18:07:04.625 INFO:tasks.workunit.client.0.smithi143.stdout:0.010000
2022-02-08T18:07:04.625 INFO:tasks.workunit.client.0.smithi143.stdout:0.011000
2022-02-08T18:07:04.625 INFO:tasks.workunit.client.0.smithi143.stdout:0.010000
2022-02-08T18:07:04.626 INFO:tasks.workunit:Stopping ['suites/ffsb.sh'] on client.0...
2022-02-08T18:07:04.627 DEBUG:teuthology.orchestra.run.smithi143:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0

The ffsb test seems still not finished after 3 hours.

Not sure whether is this normal taking so long ?

- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669887
- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669872

(all kclient jobs)

#3 Updated by Xiubo Li 12 months ago

Xiubo Li wrote:

Venky Shankar wrote:

Recent failures:

- https://pulpito.ceph.com/vshankar-2022-02-08_14:27:29-fs-wip-vshankar-testing-20220208-181241-testing-default-smithi/6669874

[...]

The ffsb test seems still not finished after 3 hours.

Not sure whether is this normal taking so long ?

I didn't see any error from the logs, maybe we should increase the timeout value for the ffsb test ?

#4 Updated by Xiubo Li 12 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Xiubo Li
  • Pull request ID set to 44987

#5 Updated by Xiubo Li 11 months ago

Actually the `ffsb` test finished very fast and took 346.70 seconds:

```
2022-02-28T08:29:36.007 INFO:tasks.workunit.client.0.smithi065.stdout:make1: Leaving directory '/home/ubuntu/cephtest/mnt.0/client.0/tmp/ffsb'
2022-02-28T08:29:36.014 INFO:tasks.workunit.client.0.smithi065.stdout:FFSB version 6.0-RC2 started
2022-02-28T08:29:36.015 INFO:tasks.workunit.client.0.smithi065.stdout:
2022-02-28T08:29:36.024 INFO:tasks.workunit.client.0.smithi065.stdout:benchmark time = 300
2022-02-28T08:29:36.024 INFO:tasks.workunit.client.0.smithi065.stdout:ThreadGroup 0
2022-02-28T08:29:36.025 INFO:tasks.workunit.client.0.smithi065.stdout:================
...
2022-02-28T08:29:36.037 INFO:tasks.workunit.client.0.smithi065.stdout:checking existing fs: .
2022-02-28T08:29:36.038 INFO:tasks.workunit.client.0.smithi065.stdout:recreating new fileset
...
2022-02-28T08:36:22.993 INFO:tasks.workunit.client.0.smithi065.stdout:Syncing()...43 sec
2022-02-28T08:36:22.993 INFO:tasks.workunit.client.0.smithi065.stdout:FFSB benchmark finished at: Mon Feb 28 08:36:23 2022
2022-02-28T08:36:22.994 INFO:tasks.workunit.client.0.smithi065.stdout:
2022-02-28T08:36:22.994 INFO:tasks.workunit.client.0.smithi065.stdout:Results:
2022-02-28T08:36:22.994 INFO:tasks.workunit.client.0.smithi065.stdout:Benchmark took 346.70 sec
2022-02-28T08:36:22.995 INFO:tasks.workunit.client.0.smithi065.stdout:
```

But it just timedout when printing the `open/write/lseek/close` individual response times, which have 105492912 items:

```
2022-02-28T08:36:25.697 INFO:tasks.workunit.client.0.smithi065.stdout:[ open] 0.016000 12.052390 14466.169922 41176
...
2022-02-28T08:36:25.701 INFO:tasks.workunit.client.0.smithi065.stdout:[ write] 0.005000 0.110044 14464.117188 52705280
...
2022-02-28T08:36:25.706 INFO:tasks.workunit.client.0.smithi065.stdout:[ lseek] 0.001000 0.062884 5543.568848 52705280
...
2022-02-28T08:36:25.711 INFO:tasks.workunit.client.0.smithi065.stdout:[ close] 0.004000 0.016721 29.209000 41176
...
2022-02-28T08:36:25.716 INFO:tasks.workunit.client.0.smithi065.stdout:Overall Calls: 105492912

```

This is not cephfs's bug, let's fix it in https://github.com/ceph/ffsb instead to disable print this

#6 Updated by Xiubo Li 11 months ago

Created one pr in ffsb https://github.com/ceph/ffsb/pull/3 to fix it

#7 Updated by Xiubo Li 11 months ago

  • Pull request ID deleted (44987)

#8 Updated by Xiubo Li 7 months ago

  • Status changed from Fix Under Review to Resolved

Also available in: Atom PDF