Bug #9272 (closed)

Test failed on wait_until_healthy in upgrade:dumpling-firefly-x-master-distro-basic-vps suite

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in:

http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps/458041/teuthology.log

Two jobs failed in this run: ['458041', '458047'].

2014-08-28T13:11:33.915 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 14 pgs recovering; 14 pgs stuck unclean; recovery 16649/20830 objects degraded (79.928%)
2014-08-28T13:11:34.015 INFO:tasks.workunit.client.0.vpm023.stdout:  690: throughput=0.401MB/sec pending data=1093249
2014-08-28T13:11:34.914 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 39, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph.py", line 1090, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph.py", line 995, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 820, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 127, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds
2014-08-28T13:11:34.916 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 98, in next
    result = self.results.get()
  File "/usr/lib/python2.7/dist-packages/gevent/queue.py", line 190, in get
    return waiter.get()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 321, in get
    return get_hub().switch()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 164, in switch
    return greenlet.switch(self)
GreenletExit
2014-08-28T13:11:34.917 INFO:tasks.workunit:Stopping ['rados/load-gen-big.sh'] on client.0...
2014-08-28T13:11:34.917 INFO:teuthology.orchestra.run.vpm023:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2014-08-28T13:11:34.954 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 51, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 39, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 43, in task
    p.spawn(_run_spawned, ctx, confg, taskname)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 89, in __exit__
    raise
MaxWhileTries: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds
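
For context, the timeout above comes from teuthology's bounded-retry helper: wait_until_healthy polls cluster health a fixed number of times, and contextutil raises MaxWhileTries once the budget is exhausted. Below is a minimal sketch of that pattern, not the actual teuthology source; the loop parameters (150 tries at a 6-second interval, i.e. 900 seconds) are inferred from the error message, and the direct 'ceph health' call stands in for the real remote invocation.

import subprocess
import time

class MaxWhileTries(Exception):
    """Raised when the retry budget is exhausted."""

def wait_until_healthy(tries=150, sleep_s=6):
    # Poll cluster health until HEALTH_OK or the budget runs out.
    for _ in range(tries):
        # Stand-in for the real remote "ceph health" invocation.
        out = subprocess.check_output(['ceph', 'health']).decode()
        if out.startswith('HEALTH_OK'):
            return
        time.sleep(sleep_s)
    raise MaxWhileTries("'wait_until_healthy' reached maximum tries "
                        "(%d) after waiting for %d seconds"
                        % (tries, tries * sleep_s))

The full job YAML for 458041 follows.
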
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps/458041
branch: master
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml
  6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/centos_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '458041'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps
nuke-on-error: true
os_type: centos
os_version: '6.5'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: 5b0af4c8aa797f77810729b59dd9d1a70e15ab26
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 5b0af4c8aa797f77810729b59dd9d1a70e15ab26
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: master
    idle_timeout: 1200
  workunit:
    sha1: 5b0af4c8aa797f77810729b59dd9d1a70e15ab26
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-firefly-x
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@vpm023.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqX5iuTHqUrgjHkytAca3PA2sKxpAixpQ7obQ7k4/mph/uo7HhwXJ/YmjUr4pDMaJpkxhft3ezS4XmJM/nfmtU1tuNveZoKH96G68PuA3hGFbJ2ZK7+7NQvcg4VEOAgiQfOn/9W9/a7MRP5Io5VNks831vWQTgRD7PIgX+R+/dmd7F4TqvNSpjDxPPyu4ceahCrJvOMeZV8WiZp5sBQXJxnO+hkEZ7721wZoZqLVsHE1FVfpAetxhBwVOxCVBIhQK+OEcIKwKcheV9o+kHXNY6Od8wBLhiBiXfRlCmaHsI00QVHjPWmZ4GUU2jqy8vrFIJkFFRBhJ7Y8WL2tHyNbTrQ==
  ubuntu@vpm110.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAoKTFOVAj5HE3nUWCBVdqx+1PNlZxNhO+7tbm6rMqNHPK9j3jWq5lRImjPAKgAIORE/6t+8oVYE9YlmoStHKwiD/rBSWw0P9FPrj2pylFcbtc4sQgLY+9L9H/eQ/AKzT2AhNpF5vV2s4slRGcfpMthMh23d4b0IJisGm4AVVasO3eKADHIpVKVmtG8e3QdPQDq4cUZvXQDpTWJ55VNmyNhQQfomh5lRUZWyOaXeW6iWBNsEA9C3TT04phDh2P5TJlSbjoP0DVen9MLPdSPzFZjk6VZBCM1OS72ahikHffOFPh2jKMLLxfqE5S3IeeqM9ZDjomL9Pk6Y9XPXZBZVCYqQ==
  ubuntu@vpm177.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAw/2n/nvv3ZOBDWxbh2/O/cEEzU/9r4H9pHeIjF6ATvbT4IOBgwgBH20S32Yvh0UYRBp8nqaTwpxDkqjZe2aYC+NCLgqh34Y1o9ludWq3VHr3kS7C0OWDwaXnVZHH6KX3JPj2q66ARYU0NcODkCXzNLgpvd97lA7fqqIUUyt2NLtM2S0tsaej96k2VtCImUhrDWZl4fj6YrtYWJhONQQydXS2HmSf2jAXvhQbEMZD/OYg5YuliRK4rz/fTICVodB7EayDiw7kqtk3902jfxTxQELgFADXDZqqRCOgb/3nwtUsjr/4M2+gaA2Z1EcQSaLtrSrYHMub8g/Ec9w7bov1mw==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- print: '**** done dumpling install'
- ceph:
    fs: xfs
- parallel:
  - workload
- print: '**** done parallel'
- install.upgrade:
    client.0:
      branch: firefly
    mon.a:
      branch: firefly
    mon.b:
      branch: firefly
- print: '**** done install.upgrade'
- ceph.restart: null
- print: '**** done restart'
- parallel:
  - workload2
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade client.0 to the version from teuthology-suite
    arg'
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 500
    op_weights:
      append: 45
      delete: 10
      read: 45
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - workunit:
      clients:
        client.1:
        - rados/test.sh
  - print: '**** done rados/test.sh - 6-final-workload'
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- s3tests:
    client.1:
      rgw_server: client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
  - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg'
  - install.upgrade:
      mon.b: null
  - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.14187
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done rados/test.sh &  cls'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh'
workload2:
  sequential:
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done #rados/test.sh and cls 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh 2'
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml
  6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/centos_6.5.yaml}
duration: 6377.855396032333
failure_reason: '''wait_until_healthy'' reached maximum tries (150) after waiting for
  900 seconds'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #9274: "AssertionError: failed to recover before timeout expired" in upgrade:dumpling-x:stress-split-master-distro-basic-vps (status: Can't reproduce, added 08/28/2014)

#2

Updated by Samuel Just over 9 years ago

  • Priority changed from Normal to Urgent
#3

Updated by Samuel Just over 9 years ago

Slow recovery: the cluster was still actively recovering when the timeout expired.
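
One way to confirm that from the HEALTH_WARN line quoted in the log is to sample the degraded-object count twice and check that it is falling. A small sketch: the regex is matched to the health output format in this ticket, and poll_health is a hypothetical callable that returns the current 'ceph health' output.

import re
import time

DEGRADED_RE = re.compile(r'recovery (\d+)/(\d+) objects degraded')

def degraded_objects(health_output):
    # Extract the degraded-object count from a HEALTH_WARN line such as
    # "recovery 16649/20830 objects degraded (79.928%)".
    m = DEGRADED_RE.search(health_output)
    return int(m.group(1)) if m else 0

def is_making_progress(poll_health, interval_s=30):
    before = degraded_objects(poll_health())
    time.sleep(interval_s)
    after = degraded_objects(poll_health())
    return after < before  # a falling count means recovery is progressing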

#4

Updated by Samuel Just over 9 years ago

We probably need to backport the timeout updates to the dumpling suite.
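
For illustration, such a backport would amount to widening the retry budget in the health wait. The sketch below reuses the pattern from the description; the numbers are purely illustrative, since this ticket does not reference an actual patch.

import time

def wait_until_healthy_patient(check_health, tries=300, sleep_s=6):
    # 300 tries * 6 s = 1800 s, double the 900 s budget that expired here.
    # check_health is a hypothetical callable returning "ceph health" output.
    for _ in range(tries):
        if check_health().startswith('HEALTH_OK'):
            return
        time.sleep(sleep_s)
    raise RuntimeError("'wait_until_healthy' reached maximum tries "
                       "(%d) after waiting for %d seconds"
                       % (tries, tries * sleep_s))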

#5

Updated by Samuel Just over 9 years ago

  • Status changed from New to Duplicate