Bug #9672


Failed tests in powercycle-giant-distro-basic-multi run

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Category: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-04_09:00:36-powercycle-giant-distro-basic-multi/528139/

2014-10-06T08:57:20.122 INFO:tasks.ceph.ceph_manager:making progress, resetting timeout
2014-10-06T08:57:20.122 INFO:teuthology.orchestra.run.plana93:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format=json'
2014-10-06T08:57:20.376 INFO:teuthology.orchestra.run.plana93.stderr:dumped all in format json
2014-10-06T08:57:20.486 INFO:tasks.workunit:Stopping ['fs/misc'] on client.0...
2014-10-06T08:57:20.487 INFO:teuthology.orchestra.run.plana93:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2014-10-06T08:57:20.620 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/workunit.py", line 359, in _run_tests
    args=args,
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 361, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 105, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana93 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=25bcc39bb809e2d13beea1529e4ab92d1b61fa5b TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/fs/misc/multiple_rsync.sh'
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-04_09:00:36-powercycle-giant-distro-basic-multi/528139
branch: giant
description: powercycle/osd/{clusters/3osd-1per-target.yaml fs/xfs.yaml powercycle/default.yaml
  tasks/cfuse_workunit_misc.yaml}
email: ceph-qa@ceph.com
job_id: '528139'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-10-04_09:00:36-powercycle-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
        osd sloppy crc: true
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 25bcc39bb809e2d13beea1529e4ab92d1b61fa5b
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 25bcc39bb809e2d13beea1529e4ab92d1b61fa5b
  s3tests:
    branch: giant
  workunit:
    sha1: 25bcc39bb809e2d13beea1529e4ab92d1b61fa5b
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.0
  - mon.1
  - mon.2
  - mds.0
  - client.0
- - osd.0
- - osd.1
- - osd.2
suite: powercycle
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@burnupi31.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVF6RMXFqe4px+HZf7933ToHtx1O13RMyI7RWhORz2R6eXp1UxslV8439d2DQHmJTMM836jWxktEtsZsZz2C/o4qR9Iew6Snuy4KF+jp5H2doX26A64TVgjn2kuSWqyw5TTh1pkI5gex1bF20xuNw7CSTOUz8JMXi7hKm5fxWikEZriXZM1AT/JpzqbOXaRYeGhTgocMFYrH+vTU9GxJOTZJjsW3POfqO7BamvXsBbtASKd9QTq2EoZ6csWAGWdG9ayDS6OTqY1oIL1+4TIO3Iulk/xyRtCZ2J5i9eJHTwJZZ26cUWh996AMDwklklAQlQ/bXOqZMdZjAejZ8nxH0j
  ubuntu@plana43.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqPGfJrNd8xd8AicMpw6v4bYa7m4QPfopER9hsAgsbm0pO9Zfs6JoEN3qAgTN61Lp9RnlDgnSOyes0g5Yzla911K2jfEMvOG9uMv2ZTt/oMVXwZjs4Etk9wL7dPAr2jc0AMq4cQdVdLDfwtFRR/ZKTkTffhT6lFdZX7rj0DQn4AfHc16Iv2jmnSEOq9mgaU+rN0xW2c3CHGmP/24/a3ybeyOgqu4ICZfP8KAHC6jXwinx/jgIEGNrnXGojjTOaWL2sJXd1jN0TwTbNXTCPV1I9qH6ErAJhcLWuDqu2tusmwZl0sFh15hq9MMRFfgZavLcp657wLGBSAFQiH3SflGkr
  ubuntu@plana68.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDN/p55jmz08gik2BX9h++ylasFj74ZysGxYsNDeeg4olDWNVz++6cDkRqoR+8SE7yZqk7iSZryr+Y3bQibjXK1PFeeiUtuJntIjIXIU7s9z3FC2EM3aJYB2wWW9IOuuFplEhg+QJAfxnzFLe0WJ9y6PzEITYejDD+pxzpS5fi0+D0WmNTwKlGBGMUz+6yAFZ0QrPxvSWkxIuZC1PRUefL3UUV2xCEmNge8PygeiRhdcn8iB8Ib1Bj+yyWUTFZ2RGbz6Y7sCVqckFGXIrhu6wjfYXpaYBYUbVg7R2qtwld6qybTcp+1RI8cc5RMyutX4PCC54Pjpbni7Kv/CusfXsSL
  ubuntu@plana93.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDR1cOyYePEKcQCKt3e8YSuoa1YBhGSsSgS+HIVdzBKaP8acz498fLahdTftFw+KfPtONP3PTn5PrrG7ykMlxvLXI1dWs+a1TEwPrPCFIPo562ok+9/g4FFUIEb2gqXp8PJfOI1oI5UZZXA6bKCgiaIcD9dkAAKCBpJkJanT68YTT8weLGK3iWHtVQCZyu8LvOGV9xviCs9QtASec+SforgYSOys3KfgA3K9TzAre5TOAoankFc2F6Si36Hn7Iv2OeXwkdog3Zle4Ml+3+nRIZCOvNZtLKVsUiUVsLa0I0j/ac9rIqhOPXCYgPgYwqYE9jB3zJNKymHr29Y24XP2+kd
tasks:
- internal.lock_machines:
  - 4
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph: null
- thrashosds:
    chance_down: 1.0
    powercycle: true
    timeout: 600
- ceph-fuse: null
- workunit:
    clients:
      all:
      - fs/misc
teuthology_branch: master
tube: multi
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3194
description: powercycle/osd/{clusters/3osd-1per-target.yaml fs/xfs.yaml powercycle/default.yaml
  tasks/cfuse_workunit_misc.yaml}
duration: 17172.807722091675
failure_reason: 'Command failed on plana93 with status 124: ''mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp
  && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
  CEPH_REF=25bcc39bb809e2d13beea1529e4ab92d1b61fa5b TESTDIR="/home/ubuntu/cephtest" 
  CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage
  timeout 3h /home/ubuntu/cephtest/workunit.client.0/fs/misc/multiple_rsync.sh'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
#1

Updated by Samuel Just over 9 years ago

  • Project changed from Ceph to CephFS
  • Priority changed from Normal to High
#2

Updated by John Spray over 9 years ago

Was there anything to this other than the timeout? It appears to have been making progress up until that point, so this was probably just a slightly-too-slow run. That said, I've seen this particular workload time out a number of times, so maybe we should raise the timeout (or use a smaller workload).

#3

Updated by Greg Farnum over 9 years ago

  • Project changed from CephFS to teuthology
  • Assignee set to Yuri Weinstein

Status 124 is timeout kill. You can swap it out for a shorter test as John suggested if you like, Yuri, but it's not an FS bug.
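For reference, a minimal illustration of that exit status (not taken from this run): GNU coreutils' timeout kills the wrapped command when the limit expires and then exits with 124 itself, which is the status the workunit wrapper surfaces above.

# timeout passes through the command's own status if it finishes in time,
# and exits 124 if it had to kill the command at the deadline.
timeout 2s sleep 10; echo $?   # prints 124: sleep was killed after 2 seconds
timeout 2s true; echo $?       # prints 0: the command finished within the limit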

#4

Updated by Yuri Weinstein over 9 years ago

Note: see whether raising the timeout in default.yaml will fix it.
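Something along these lines is presumably what is meant; this is only a sketch, assuming the workunit task's timeout option is the knob in question (the 6h wrapper visible in the later 571770 run suggests it is), and the exact file and value are whatever the eventual ceph-qa-suite change uses:

# Hypothetical yaml fragment: raise the workunit timeout so the wrapper runs
# 'timeout 6h ...' instead of the default 3h for this slow workload.
tasks:
- workunit:
    timeout: 6h
    clients:
      all:
      - fs/misc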

#6

Updated by Yuri Weinstein over 9 years ago

Run http://pulpito.front.sepia.ceph.com/teuthology-2014-10-26_08:33:19-powercycle-giant-distro-basic-multi/

Jobs ['571770', '571773'] failed

In http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-26_08:33:19-powercycle-giant-distro-basic-multi/571770/teuthology.log

2014-10-26T16:08:15.032 INFO:tasks.workunit:Stopping ['fs/misc'] on client.0...
2014-10-26T16:08:15.032 INFO:teuthology.orchestra.run.mira072:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2014-10-26T16:08:15.191 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/workunit.py", line 359, in _run_tests
    args=args,
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 364, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 105, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on mira072 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=b05efddb77290b86eb5c150776c761ab84f66f37 TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/workunit.client.0/fs/misc/multiple_rsync.sh'

In http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-26_08:33:19-powercycle-giant-distro-basic-multi/571773/teuthology.log

2014-10-26T14:23:03.476 INFO:tasks.workunit:Stopping ['suites/fsx.sh'] on client.0...
2014-10-26T14:23:03.477 INFO:teuthology.orchestra.run.mira023:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2014-10-26T14:23:03.564 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/workunit.py", line 359, in _run_tests
    args=args,
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 364, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 105, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on mira023 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=b05efddb77290b86eb5c150776c761ab84f66f37 TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/suites/fsx.sh'
#7

Updated by Yuri Weinstein over 9 years ago

  • Priority changed from High to Urgent
#8

Updated by Sage Weil over 9 years ago

The second one is a timeout again; it was only 3h, so it looks like the 6h should be added to this workunit too.

The first one looks like the ceph-fuse issue Zheng was working on yesterday, #9674.

#9

Updated by Yuri Weinstein over 9 years ago

  • Status changed from New to 7

Sage's suggestion for the second failed test is addressed in https://github.com/ceph/ceph-qa-suite/pull/223
