Bug #9787

Bug #9553: AssertionError "mon_thrash.py", line 143, in do_join" in upgrade:firefly-firefly-testing-basic-vps run

Bug #9627: ceph_manager.py is missing

"MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Looks similar to #9702

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/546368/

2014-10-14T22:02:20.581 INFO:teuthology.orchestra.run.burnupi34:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-10-14T22:02:20.745 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN pool data pg_num 34 > pgp_num 24
2014-10-14T22:02:21.745 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 54, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph.py", line 1090, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph.py", line 995, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 822, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 127, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/546368
branch: giant
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml
  02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml
  05-workload/readwrite.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/rbd_api.yaml
  08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml}
  10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml
  13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml
  16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml
  19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rgw-swift.yaml
  distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '546368'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
  s3tests:
    branch: giant
  workunit:
    sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mon.c
- - osd.3
  - osd.4
  - osd.5
- - client.0
suite: upgrade:dumpling-firefly-x:stress-split
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@burnupi34.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVHu1u8/oxlx4Gs/CzuGsF6R5obvz8zBIZJ2oW6ZlWn3da3ybaWDEY3rmRtCEpmFIXK5UKFRFEqlKcbDVbl3OB53a4SUcgLgH0YcVgab3zy4rp7SDdBXzGJK7aM7hhGiKY73O7pKpFLX8thRxNIzRBR1Rr49Re41WXfb/45fDl2tiGNMX0QgorKUtMCkeKv4C/NhG4g+pk0j2kur4QCUfFGGzcYJNlpGzmyBoe0g8UYtLAPKOBjpUHY4iDwe2hB36ifiW1T9WvJ3f7/axcZpFuFosdMEJJ3mrIOAeko4CpcV7lJVCT3S/Kj9KsyklLt682ni999dQ/RRHDQkqd0Qth
  ubuntu@burnupi58.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGDu9GmokE8emaK7D1+nDTlqZV1YkO/l2lzTlBu3MWBgSiilStQlYCm7swRwfAiujm1tu9XyeIYfyhFfAGClbl21bKWYjUjp3HDifDRpGO6iOOWBXx8rk7tHiGsJV/A/6+3M7M9MLdHRWD0rxVOk58KxLnE7i+1TcPWZ0SeectH10oO5n8D/f0u8EHsNSnfw9dKMBIzfPAZl+KLi1ULVVd36KXi332ZmzNaaMx+OdKRl7DL2dyu7zPF6lfY4N3T+Ret1Rb0WcD+6yZXs/jvD8tAq+FHnLa5M0rcjGwOXF0qfxTYHrS34fahmNiTr4HE6WQxb4B/FlEKHOpAVfKa3fr
  ubuntu@mira115.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDgh+VAmGnwc68GBShVM5vhbsX68xB34mUrcikqlpsR+AdiRsJ/sTJX1NHgZtR2YYYNUkXEOck5JbCY6H1JYJ7c6t56CdMq6HA46ozL8aLUvpkebV8Ey3gYiDzUk/B7PmRgmW2AUYqFa2jOzkJDop7yMFgM6J2wq/ZbiuvJh1rV2IHQJv1OUjzGQP3cL6GeJVEWNzVZkmEGSn7EtMan7/8SB1KQ7qcRcp7Aol2V6lqB6LdqrQcDOAG3BGi7ssAEwowVrXM0t3f8VuxoEFYJg3rBaxqKyR3bh/uiFdhaOFHKRuOOek5vyHfBEHqiL3+3ogl89329t1W0LJH6gVY5tWAn
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rbd/test_librbd.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- s3tests:
    client.0:
      rgw_server: client.0
- install.upgrade:
    osd.3:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    osd.3: null
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- rgw:
    client.0: null
    default_idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
teuthology_branch: master
tube: multi
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3181
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml
  02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml
  05-workload/readwrite.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/rbd_api.yaml
  08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml}
  10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml
  13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml
  16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml
  19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rgw-swift.yaml
  distros/ubuntu_14.04.yaml}
duration: 2669.694764852524
failure_reason: '''wait_until_healthy''reached maximum tries (150) after waiting for
  900 seconds'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
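
For context on the failure_reason above: teuthology's health wait is a bounded polling loop that reruns 'ceph health' at a fixed interval and gives up once it exhausts its tries, raising MaxWhileTries. The following is an illustrative sketch of that pattern only, not the actual teuthology.contextutil/misc code; the names wait_until and check are made up. With 150 tries at a 6-second interval it matches the 900 seconds reported in the log.

# Illustrative sketch only -- not the actual teuthology code.
# Poll a check function at a fixed interval and give up after a maximum
# number of tries (150 tries x 6 s = 900 s, matching the error above).
import time

class MaxWhileTries(Exception):
    """Raised when the polling loop exhausts its tries."""

def wait_until(check, action='wait_until_healthy', tries=150, interval=6):
    for _ in range(tries):
        if check():
            return True
        time.sleep(interval)
    raise MaxWhileTries(
        "'%s' reached maximum tries (%d) after waiting for %d seconds"
        % (action, tries, tries * interval))

# The real loop parses 'ceph health' output; here a stub that is
# immediately healthy, so the call returns on the first try.
wait_until(lambda: True)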

Related issues 1 (0 open, 1 closed)

Is duplicate of teuthology - Bug #9805: Error in "objectstore_tool.\\$pid.log --op list-pgs" in upgrade:firefly-x-giant-distro-basic-multi run (Resolved, Yuri Weinstein, 10/17/2014)

Actions #1

Updated by Samuel Just over 9 years ago

I see the following in the log:

2014-10-14T22:02:27.978 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 117, in run_tasks
suppress = manager.__exit__(*exc_info)
File "/usr/lib/python2.7/contextlib.py", line 35, in exit
self.gen.throw(type, value, traceback)
File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/thrashosds.py", line 172, in task
thrash_proc.do_join()
File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph_manager.py", line 275, in do_join
self.thread.get()
File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
raise self._exception
CommandFailedError: Command failed on burnupi58 with status 1: 'sudo ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op list-pg

If this is the export/import stuff, we should disable it until it is more stable.

Actions #2

Updated by Samuel Just over 9 years ago

  • Assignee set to David Zafman
  • Priority changed from Normal to High
Actions #3

Updated by David Zafman over 9 years ago

  • Project changed from Ceph to teuthology
  • Assignee changed from David Zafman to Anonymous

The command was uninstalled from the machine while the test was running. Is this part of what happens during an upgrade?

2014-10-14T21:28:56.259 INFO:teuthology.orchestra.run.burnupi58:Running: 'sudo ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op list-pgs'
2014-10-14T21:28:56.322 INFO:teuthology.orchestra.run.burnupi58.stderr:sudo: ceph_objectstore_tool: command not found
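
A minimal sketch of the kind of guard that would avoid this failure mode: probe for the binary and skip the list-pgs exercise when it is absent. This is a simplified local illustration (shutil.which plus subprocess instead of a teuthology remote.run call), not the ceph-qa-suite code, and maybe_list_pgs is a hypothetical helper.

# Hedged sketch: skip the objectstore-tool step when the binary is missing.
# Local simplification only; maybe_list_pgs is a hypothetical helper and
# the real thrasher runs this on a remote host, not via subprocess.
import logging
import shutil
import subprocess

log = logging.getLogger(__name__)

def maybe_list_pgs(osd_id):
    tool = shutil.which('ceph_objectstore_tool')
    if tool is None:
        # dumpling/firefly installs do not ship the tool, so treat its
        # absence as "feature unavailable" rather than a test failure.
        log.info('ceph_objectstore_tool not found; skipping list-pgs check')
        return None
    data_path = '/var/lib/ceph/osd/ceph-%d' % osd_id
    cmd = ['sudo', tool,
           '--data-path', data_path,
           '--journal-path', data_path + '/journal',
           '--op', 'list-pgs']
    return subprocess.check_output(cmd)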

Actions #4

Updated by Anonymous over 9 years ago

  • Status changed from New to In Progress

I'm currently working on this. It's just a guess, but I think that this may be related to chef not being automatically installed.

Actions #5

Updated by Anonymous over 9 years ago

This appears to be the same problem as #9627.

Actions #6

Updated by Anonymous over 9 years ago

  • Parent task set to #9627
Actions #7

Updated by Anonymous over 9 years ago

Note to self: #9627 is this same bug. It was prompted by a test run trying to fix #9553. The YAML file ~/tests/t9553.yaml reproduces this. I have put in a few chef: lines to see whether that fixes things. It takes a while to run.

Actions #8

Updated by David Zafman over 9 years ago

Not a chef problem: ceph_objectstore_tool didn't exist in dumpling or firefly; it was introduced in giant. I'll mark this as a duplicate and close the other bug.
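
A rough sketch of the release gate this implies, i.e. only exercise ceph_objectstore_tool when the installed release actually ships it. The helper names below are hypothetical and this is not necessarily the fix that went into ceph-qa-suite.

# Rough sketch of a release gate for the objectstore-tool exercise.
# RELEASES_WITH_TOOL and should_test_objectstore_tool are hypothetical.
RELEASES_WITH_TOOL = {'giant'}  # first shipped in giant; later releases would be added here

def should_test_objectstore_tool(installed_release):
    """Return True only for releases that ship ceph_objectstore_tool."""
    return installed_release in RELEASES_WITH_TOOL

# During a dumpling -> firefly -> giant upgrade run, only the last step qualifies:
for release in ('dumpling', 'firefly', 'giant'):
    print(release, should_test_objectstore_tool(release))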

Actions #9

Updated by David Zafman over 9 years ago

  • Status changed from In Progress to Duplicate