Bug #9274 (closed)

"AssertionError: failed to recover before timeout expired" in upgrade:dumpling-x:stress-split-master-distro-basic-vps

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-27_15:40:02-upgrade:dumpling-x:stress-split-master-distro-basic-vps/456491/

3 jobs failed with similar errors: '456491', '456492', '456493'

2014-08-27T20:03:21.859 INFO:teuthology.orchestra.run.vpm161:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados rmpool unique_pool_0 unique_pool_0 --yes-i-really-really-mean-it'
2014-08-27T20:03:21.995 INFO:teuthology.orchestra.run.vpm161.stdout:successfully deleted pool unique_pool_0
2014-08-27T20:03:21.999 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-08-27T20:03:21.999 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-08-27T20:03:21.999 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2014-08-27T20:03:21.999 INFO:tasks.thrashosds:joining thrashosds
2014-08-27T20:03:21.999 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 113, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/thrashosds.py", line 169, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 209, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
AssertionError: failed to recover before timeout expired
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-27_15:40:02-upgrade:dumpling-x:stress-split-master-distro-basic-vps/456491
branch: master
description: upgrade:dumpling-x:stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rados_api_tests.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/centos_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '456491'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-08-27_15:40:02-upgrade:dumpling-x:stress-split-master-distro-basic-vps
nuke-on-error: true
os_type: centos
os_version: '6.5'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: b013625cf4a32b73300a42c88926f5546db5cb14
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: b013625cf4a32b73300a42c88926f5546db5cb14
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: master
    idle_timeout: 1200
  workunit:
    sha1: b013625cf4a32b73300a42c88926f5546db5cb14
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mon.c
- - osd.3
  - osd.4
  - osd.5
- - client.0
suite: upgrade:dumpling-x:stress-split
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@vpm161.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAw63BU+XPenOd+8IceTXPgFao7N6EVN9gFLni3du8RA37LkysUxmY3J765XoqDm+PIeomCw9bgFrSfigCjy2hy9IQZRfyAuze01J6NSbge2inijfjBg1FsUJ6Nknq3wHnCS31OGclsmEkHJEQdUUw6AKC3H+OgcF2caccq0KOWzwIIUnEAuAHNBX30Bq8K6d1J0UmQeodeihfDvWAlmb3KvUne8ZvvcXB5tAEFvvYVADndeVRk2lidLpW2gxOtfAht21wJUBj/sYYXAWs3c8sEQd/rS+KtMJVQFAXFt2/qk6+KUHf6atW2xA0oP63Ppibnmljx8ge3Dcbc1U0BQclAQ==
  ubuntu@vpm170.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6Gp/3GEB9njxcwmAEiLQlYQYRd8HXwh4hOqipyamXNDwjiJPnvZKm1UZqZsbP23K8OlUdgxGJc+XKZap6t0D8rwr7Tae64d4+U+Jsfnxihjz4qaGdV7BW/r50R6gAQRMcxvuoRB1EDHb2mimgD7Axg4NjazfqP9Ue7/05VnROhs6EXlvFfvXz89GXg9wKvGLSkpcyq4VCiNiKHFSMU7gKZB5qtw6j0I8f4hi/m11R/m+yDsBdFeQQ7mwjCvRdLNOYagxd5Gn++AVy8ICm2c7LbbU5XaDIaAcQenOudxV4RqHUW79ul+B/PQNtjDfi+GOxqHWrsMyRRlLxQ2AV0Zbw==
  ubuntu@vpm171.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAvkt5mbvK8H1jzHU55iMvoMEFQdOU9qZkHklFy9gkNAcpCQflUKOlQ2rIQ7OFUJdMDL/9tUSorR4Seuky152aFUmWjEA31IxViLZGx0Bg5W2f50SFValBkDWbzDoD4EABg0JuSU8e5Hrp8531nDLUuhdmyEmBmYv7fdPk3aATnW+ZaODPLiPBIlY+Qhlgq/7nd+eZT9JSJRl8ovbLwtuxIe/9VmL577EumlWpzW1ul9BtXRFOoKhYwFecmC+G48k0E/1FlbVZc714VtHbOr5T20s+90s1fBb1MEPtUQ20xQcox8OnpWM83kCX0kuSzsAqOMdOA3kxchWUuC0+TygjpQ==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: master
tube: vps
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.13134
description: upgrade:dumpling-x:stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rados_api_tests.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/centos_6.5.yaml}
duration: 8386.92367196083
failure_reason: failed to recover before timeout expired
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
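
For context on where the assertion originates: the thrashosds task joins its thrasher on teardown, and the thrasher's final step waits for the cluster to become clean again within the configured timeout (timeout: 1200 in the job YAML above) before asserting. Below is a minimal sketch of that wait-then-assert pattern; it is not the actual ceph_manager code, and is_clean is a hypothetical stand-in for polling 'ceph pg dump'.

import time

def wait_for_recovery(is_clean, timeout=1200, poll_interval=10):
    """Poll is_clean() until it returns True; assert if the deadline
    passes first. This mirrors the failure mode reported above."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if is_clean():
            return
        time.sleep(poll_interval)
    # The condition behind "AssertionError: failed to recover before
    # timeout expired".
    assert False, 'failed to recover before timeout expired'

# Example: a cluster that never recovers trips the assertion once the
# 1200-second deadline passes.
# wait_for_recovery(lambda: False, timeout=1200)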

Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #9272: Test failed on wait_until_healthy in upgrade:dumpling-firefly-x-master-distro-basic-vps suite (Duplicate) - 08/28/2014

#1

Updated by Samuel Just over 9 years ago

  • Priority changed from Normal to Urgent
#2

Updated by Samuel Just over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Samuel Just
#3

Updated by Samuel Just over 9 years ago

This seems to have been due to a hung ceph command.
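
If a hung ceph CLI call was indeed the cause, one way to surface such hangs instead of letting them stall a run is to put a hard timeout on the invocation. A small sketch follows; the command and timeout value are illustrative, not part of teuthology, and subprocess timeouts require Python 3.3+ (the 2014 run above used Python 2.7).

import subprocess

def run_ceph(args, timeout=300):
    """Run a ceph CLI command, raising subprocess.TimeoutExpired if it
    hangs past 'timeout' seconds instead of blocking the caller."""
    return subprocess.run(
        ['ceph'] + list(args),
        check=True,
        timeout=timeout,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

# Example: a stuck monitor would make this raise rather than hang.
# run_ceph(['pg', 'dump', '--format=json'])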

#4

Updated by Samuel Just over 9 years ago

  • Status changed from In Progress to 12
  • Assignee deleted (Samuel Just)
#5

Updated by Samuel Just over 9 years ago

  • Status changed from 12 to 7
#6

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Can't reproduce