Bug #36366

luminous: qa: blogbench hang with two kclients and 3 active mds

Added by Patrick Donnelly 2 months ago. Updated 2 months ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 10/09/2018
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, kceph
Labels (FS):
Pull request ID:

Description

From luminous QA run with -k testing:

2018-10-09T11:08:29.238 INFO:tasks.workunit.client.0.smithi138.stdout:       108            0           565             0           595             0           610
2018-10-09T11:08:29.797 INFO:tasks.workunit.client.1.smithi138.stdout:        60       160065           119        108042           180         63401           126
2018-10-09T11:08:39.238 INFO:tasks.workunit.client.0.smithi138.stdout:       122            0           668             0           673             0           795
2018-10-09T11:08:39.797 INFO:tasks.workunit.client.1.smithi138.stdout:        65       355139           313        239741           334        138583           414
2018-10-09T11:09:29.693 INFO:tasks.workunit.client.1.smithi138.stdout:
2018-10-09T11:09:29.693 INFO:tasks.workunit.client.1.smithi138.stdout:Final score for writes:            65
2018-10-09T11:09:29.693 INFO:tasks.workunit.client.1.smithi138.stdout:Final score for reads :         87452
2018-10-09T11:09:29.693 INFO:tasks.workunit.client.1.smithi138.stdout:
2018-10-09T11:09:29.696 INFO:teuthology.orchestra.run:Running command with timeout 900
2018-10-09T11:09:29.697 INFO:teuthology.orchestra.run.smithi138:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.1/client.1/tmp'
2018-10-09T11:10:22.233 INFO:tasks.workunit:Stopping ['suites/blogbench.sh'] on client.1...
2018-10-09T11:10:22.233 INFO:teuthology.orchestra.run.smithi138:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.1 /home/ubuntu/cephtest/clone.client.1'
2018-10-09T11:10:22.531 DEBUG:teuthology.parallel:result is None
2018-10-09T14:03:23.455 DEBUG:teuthology.orchestra.run:got remote process result: 124
2018-10-09T14:03:23.477 INFO:tasks.workunit:Stopping ['suites/blogbench.sh'] on client.0...
2018-10-09T14:03:23.477 INFO:teuthology.orchestra.run.smithi138:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-10-09T14:03:23.835 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 206, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 356, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 479, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 429, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test suites/blogbench.sh) on smithi138 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=luminous TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/blogbench.sh'
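
For reading the error above: the workunit is wrapped in `timeout 3h`, and GNU coreutils `timeout` exits with status 124 when the command runs past its limit, so status 124 here points to a hang rather than a blogbench failure. A minimal sketch of that check follows, assuming only the message format shown in this log; `describe_failure` is a hypothetical helper and not part of teuthology.

```python
import re

# GNU coreutils `timeout` exits with 124 when the wrapped command
# ran past its time limit (and --preserve-status was not given).
TIMEOUT_EXIT_STATUS = 124


def describe_failure(message):
    """Return (status, timeout_duration, timed_out) parsed from a
    CommandFailedError message, or None if no status is present.

    Hypothetical helper for illustration; not a teuthology API.
    """
    status_match = re.search(r"with status (\d+):", message)
    if status_match is None:
        return None
    status = int(status_match.group(1))
    # Look for a `timeout <duration>` wrapper in the quoted command line.
    wrapper = re.search(r"\btimeout\s+(\S+)\s", message)
    duration = wrapper.group(1) if wrapper else None
    timed_out = status == TIMEOUT_EXIT_STATUS and wrapper is not None
    return status, duration, timed_out


# Excerpt of the error in this report; the "..." marks where the
# environment setup was trimmed for brevity.
message = (
    "Command failed (workunit test suites/blogbench.sh) on smithi138 "
    "with status 124: '... timeout 3h "
    "/home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/blogbench.sh'"
)
print(describe_failure(message))
```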

From: /ceph/teuthology-archive/pdonnell-2018-10-09_04:02:38-multimds-luminous-testing-basic-smithi/3120996/teuthology.log

As expected with kernel clients, there are no client logs. Reproducing this will likely require running the job multiple times.
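
When going through teuthology.log from repeated runs, the hang shows up as a long silence between two timestamped lines (here between 11:10:22 and 14:03:23). A rough sketch for locating the widest gap is below; it assumes only the timestamp prefix seen in this log, and `largest_gap` is a hypothetical helper, not a teuthology function.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in the log lines of this report,
# e.g. "2018-10-09T11:10:22.531 DEBUG:...".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) ")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the widest
    pause between consecutive timestamped lines, or None if fewer
    than two timestamped lines are present."""
    prev_ts = prev_line = None
    best = None
    for line in lines:
        match = TS_RE.match(line)
        if match is None:
            continue  # traceback lines carry no timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


# Three consecutive lines copied from the log in this report.
sample = [
    "2018-10-09T11:10:22.233 INFO:tasks.workunit:Stopping ['suites/blogbench.sh'] on client.1...",
    "2018-10-09T11:10:22.531 DEBUG:teuthology.parallel:result is None",
    "2018-10-09T14:03:23.455 DEBUG:teuthology.orchestra.run:got remote process result: 124",
]
gap, before, after = largest_gap(sample)
print(f"{gap:.0f}s of silence before: {after}")
```

To scan a full log, the lines of an opened file can be passed in place of `sample`.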


Related issues

Duplicates fs - Bug #36348: luminous(?): blogbench I/O with two kernel clients; one stalls (Need More Info, 10/08/2018)

History

#1 Updated by Patrick Donnelly 2 months ago

  • Status changed from New to Duplicate

#2 Updated by Patrick Donnelly 2 months ago

  • Duplicates Bug #36348: luminous(?): blogbench I/O with two kernel clients; one stalls added
