Bug #7654

closed

"AssertionError: failed to recover before timeout expired" in teuthology-2014-03-07_02:30:02-rados-firefly-testing-basic-plana suite

Added by Yuri Weinstein about 10 years ago. Updated about 10 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-07_02:30:02-rados-firefly-testing-basic-plana/120975/

Interestingly, the same job passed when run manually.

2014-03-07T05:14:32.241 INFO:teuthology.task.radosbench.radosbench.0.err:[10.214.131.27]: 2014-03-07 05:14:32.240382 7fbdac1ec700  0 -- 10.214.131.27:0/1003584 >> 10.214.131.27:6805/3607 pipe(0x7fbda4012660 sd=8 :47649 s=2 pgs=168 cs=1 l=1 c=0x7fbda400ee70).injecting socket failure
2014-03-07T05:14:34.938 INFO:teuthology.task.radosbench.radosbench.0.err:[10.214.131.27]: 2014-03-07 05:14:34.937536 7fbdb4149700  0 -- 10.214.131.27:0/1003584 >> 10.214.131.27:6805/3607 pipe(0x7fbda40125a0 sd=8 :47653 s=2 pgs=169 cs=1 l=1 c=0x7fbda402b230).injecting socket failure
2014-03-07T05:14:40.644 INFO:teuthology.task.thrashosds.ceph_manager:creating pool_name unique_pool_0
2014-03-07T05:14:40.644 DEBUG:teuthology.orchestra.run:Running [10.214.131.27]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados rmpool unique_pool_0 unique_pool_0 --yes-i-really-really-mean-it'
2014-03-07T05:14:41.445 INFO:teuthology.orchestra.run.out:[10.214.131.27]: successfully deleted pool unique_pool_0
2014-03-07T05:14:41.446 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2014-03-07T05:14:41.447 INFO:teuthology.task.thrashosds:joining thrashosds
2014-03-07T05:14:41.447 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 84, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 172, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 153, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
AssertionError: failed to recover before timeout expired
archive_path: /var/lib/teuthworker/archive/teuthology-2014-03-07_02:30:02-rados-firefly-testing-basic-plana/120975
description: rados/thrash/{clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml
  thrashers/default.yaml workloads/radosbench.yaml}
email: null
job_id: '120975'
kernel: &id001
  kdb: true
  sha1: f31a96afabfad92cb917fd52a421b23275cdb6da
last_in_suite: false
machine_type: plana
name: teuthology-2014-03-07_02:30:02-rados-firefly-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug ms: 1
        debug osd: 20
        osd op thread timeout: 60
        osd sloppy crc: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8221a8ecba14f80eb7e35e3b1e6fe8487502b2d9
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 8221a8ecba14f80eb7e35e3b1e6fe8487502b2d9
  s3tests:
    branch: master
  workunit:
    sha1: 8221a8ecba14f80eb7e35e3b1e6fe8487502b2d9
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
  - client.0
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.1
targets:
  ubuntu@plana13.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0nhcOK/M8OlAE94rb4jwc/QFI9kNDENQmJdu1Z2BRn30mPxoGc4LEhK1L1qK4CpDd24KWzXwjAigAYcMEOvJaLKqkNgDMVx+GT/9EAwH/AI60QootBCHyEtfZFzcCg0K1LdzxtOJTKKlFHr4KAI3i36T7RfCkdbeANZ6SfjKwmgw9SZ8v1oGYT11e33bqAfzETLzm4D+5d5ioB+dz/4EiLYwdyx/tzvqwJfpQpC0aK31+7oIu+0jsBoOUxkNhwe498XB+8KlKcnnmeHRKn9XSyDVN8i6IbGQ2Cg1FXnR+b525cS/5TYrYJckCzCFEmybCIZAV+FfxY762kQx7a6ov
  ubuntu@plana51.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDgNXP3p/sw2sy34ARorzUh9QvDPit80IHDKQ71BGtytSAQL5ijlSpjJRjGT0HB9xHvR6v8115ikzmot1HgVeJSnC07UQKWp3CfVIUHZOtbMgw0exON14083tSlvn2djTA/bphuwag5u9y+0XkufOXBNrY4aBlQS9vNXnsW0PQwlgJ6YqK3W2e1qirpvfMamLugAFLdycCXXmjriXFuAxvHqbFrJYVEvNbsK8Bt+cRE5l0gcBin+5wJmz4iKagwYVAVqW7i1lZM1F0QdffYwuUrQ110/iz9AcnNvu6dSU+3g7agjBKvWCA+DVEn0RWbaRJ7M+FCl2PmLULnjvK44Qsp
tasks:
- internal.lock_machines:
  - 2
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.11137
description: rados/thrash/{clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml
  thrashers/default.yaml workloads/radosbench.yaml}
duration: 2457.6572670936584
failure_reason: failed to recover before timeout expired
flavor: basic
mon.a-kernel-sha1: f31a96afabfad92cb917fd52a421b23275cdb6da
mon.b-kernel-sha1: f31a96afabfad92cb917fd52a421b23275cdb6da
owner: scheduled_teuthology@teuthology
success: false
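
The traceback above suggests a common pattern: a background worker waits for the cluster to recover, asserts if a deadline passes, and the exception is stored and re-raised when the task joins the worker. Below is a minimal sketch of that pattern, not the actual teuthology code. It assumes Python 3 and the standard threading module instead of gevent; `wait_for_recovery`, `Worker`, `is_recovered` and `run_demo` are hypothetical names, and only `do_join` and the assertion message are taken from the traceback.

```python
import threading
import time


def wait_for_recovery(is_recovered, timeout, poll_interval=0.01):
    """Poll is_recovered() until it returns True or `timeout` seconds pass.

    Hypothetical helper: is_recovered stands in for whatever cluster
    health check the real thrasher performs.
    """
    start = time.monotonic()
    while not is_recovered():
        # Same message as the AssertionError reported in this bug.
        assert time.monotonic() - start < timeout, \
            "failed to recover before timeout expired"
        time.sleep(poll_interval)


class Worker:
    """Runs a callable in a thread and re-raises its exception on join."""

    def __init__(self, target):
        self._target = target
        self._exception = None
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        try:
            self._target()
        except BaseException as exc:
            # Keep the failure so the joining side can see it later.
            self._exception = exc

    def do_join(self):
        self._thread.join()
        if self._exception is not None:
            raise self._exception


def run_demo():
    # A cluster that never recovers, with a very short timeout.
    worker = Worker(lambda: wait_for_recovery(lambda: False, timeout=0.05))
    try:
        worker.do_join()
    except AssertionError as exc:
        return str(exc)
    return "recovered"
```

In this sketch the failure only surfaces at join time, which would be consistent with the log showing "joining thrashosds" immediately before "Manager failed: thrashosds". In the job config, the corresponding deadline appears to be the thrashosds `timeout: 1200` setting.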

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #7635: failed to recover before timeout expired | Duplicate | Samuel Just | 03/06/2014

#1

Updated by Samuel Just about 10 years ago

  • Priority changed from Normal to Urgent
#2

Updated by Sage Weil about 10 years ago

  • Status changed from New to Duplicate

7635
