Bug #9787

Bug #9553: AssertionError "mon_thrash.py", line 143, in do_join" in upgrade:firefly-firefly-testing-basic-vps run

Bug #9627: ceph_manager.py is missing

"MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Looks similar to #9702

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/546368/

2014-10-14T22:02:20.581 INFO:teuthology.orchestra.run.burnupi34:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-10-14T22:02:20.745 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN pool data pg_num 34 > pgp_num 24
2014-10-14T22:02:21.745 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 54, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph.py", line 1090, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph.py", line 995, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 822, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 127, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/546368
branch: giant
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml
  02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml
  05-workload/readwrite.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/rbd_api.yaml
  08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml}
  10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml
  13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml
  16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml
  19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rgw-swift.yaml
  distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '546368'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
  s3tests:
    branch: giant
  workunit:
    sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mon.c
- - osd.3
  - osd.4
  - osd.5
- - client.0
suite: upgrade:dumpling-firefly-x:stress-split
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@burnupi34.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVHu1u8/oxlx4Gs/CzuGsF6R5obvz8zBIZJ2oW6ZlWn3da3ybaWDEY3rmRtCEpmFIXK5UKFRFEqlKcbDVbl3OB53a4SUcgLgH0YcVgab3zy4rp7SDdBXzGJK7aM7hhGiKY73O7pKpFLX8thRxNIzRBR1Rr49Re41WXfb/45fDl2tiGNMX0QgorKUtMCkeKv4C/NhG4g+pk0j2kur4QCUfFGGzcYJNlpGzmyBoe0g8UYtLAPKOBjpUHY4iDwe2hB36ifiW1T9WvJ3f7/axcZpFuFosdMEJJ3mrIOAeko4CpcV7lJVCT3S/Kj9KsyklLt682ni999dQ/RRHDQkqd0Qth
  ubuntu@burnupi58.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGDu9GmokE8emaK7D1+nDTlqZV1YkO/l2lzTlBu3MWBgSiilStQlYCm7swRwfAiujm1tu9XyeIYfyhFfAGClbl21bKWYjUjp3HDifDRpGO6iOOWBXx8rk7tHiGsJV/A/6+3M7M9MLdHRWD0rxVOk58KxLnE7i+1TcPWZ0SeectH10oO5n8D/f0u8EHsNSnfw9dKMBIzfPAZl+KLi1ULVVd36KXi332ZmzNaaMx+OdKRl7DL2dyu7zPF6lfY4N3T+Ret1Rb0WcD+6yZXs/jvD8tAq+FHnLa5M0rcjGwOXF0qfxTYHrS34fahmNiTr4HE6WQxb4B/FlEKHOpAVfKa3fr
  ubuntu@mira115.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDgh+VAmGnwc68GBShVM5vhbsX68xB34mUrcikqlpsR+AdiRsJ/sTJX1NHgZtR2YYYNUkXEOck5JbCY6H1JYJ7c6t56CdMq6HA46ozL8aLUvpkebV8Ey3gYiDzUk/B7PmRgmW2AUYqFa2jOzkJDop7yMFgM6J2wq/ZbiuvJh1rV2IHQJv1OUjzGQP3cL6GeJVEWNzVZkmEGSn7EtMan7/8SB1KQ7qcRcp7Aol2V6lqB6LdqrQcDOAG3BGi7ssAEwowVrXM0t3f8VuxoEFYJg3rBaxqKyR3bh/uiFdhaOFHKRuOOek5vyHfBEHqiL3+3ogl89329t1W0LJH6gVY5tWAn
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rbd/test_librbd.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- s3tests:
    client.0:
      rgw_server: client.0
- install.upgrade:
    osd.3:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    osd.3: null
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- rgw:
    client.0: null
    default_idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
teuthology_branch: master
tube: multi
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3181
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml
  02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml
  05-workload/readwrite.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/rbd_api.yaml
  08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml}
  10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml
  13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml
  16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml
  19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rgw-swift.yaml
  distros/ubuntu_14.04.yaml}
duration: 2669.694764852524
failure_reason: '''wait_until_healthy''reached maximum tries (150) after waiting for
  900 seconds'
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
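
For context on the failure_reason above: teuthology's health wait is a bounded polling loop that reruns 'ceph health' at a fixed interval and gives up once it exhausts its tries, raising MaxWhileTries. The following is an illustrative sketch of that pattern only, not the actual teuthology.contextutil/misc code; the names wait_until and check are made up. With 150 tries at a 6-second interval it matches the 900 seconds reported in the log.

# Illustrative sketch only -- not the actual teuthology code.
# Poll a check function at a fixed interval and give up after a maximum
# number of tries (150 tries x 6 s = 900 s, matching the error above).
import time

class MaxWhileTries(Exception):
    """Raised when the polling loop exhausts its tries."""

def wait_until(check, action='wait_until_healthy', tries=150, interval=6):
    for _ in range(tries):
        if check():
            return True
        time.sleep(interval)
    raise MaxWhileTries(
        "'%s' reached maximum tries (%d) after waiting for %d seconds"
        % (action, tries, tries * interval))

# The real loop parses 'ceph health' output; here a stub that is
# immediately healthy, so the call returns on the first try.
wait_until(lambda: True)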

Related issues 1 (0 open, 1 closed)

Is duplicate of teuthology - Bug #9805: Error in "objectstore_tool.\\$pid.log --op list-pgs" in upgrade:firefly-x-giant-distro-basic-multi run (Resolved, Yuri Weinstein, 10/17/2014)

Actions #1

Updated by Samuel Just over 9 years ago

I see the following in the log:

2014-10-14T22:02:27.978 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 117, in run_tasks
suppress = manager.__exit__(*exc_info)
File "/usr/lib/python2.7/contextlib.py", line 35, in exit
self.gen.throw(type, value, traceback)
File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/thrashosds.py", line 172, in task
thrash_proc.do_join()
File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph_manager.py", line 275, in do_join
self.thread.get()
File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
raise self._exception
CommandFailedError: Command failed on burnupi58 with status 1: 'sudo ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op list-pg

If this is the export/import stuff, we should disable it until it is more stable.

Actions #2

Updated by Samuel Just over 9 years ago

  • Assignee set to David Zafman
  • Priority changed from Normal to High
Actions #3

Updated by David Zafman over 9 years ago

  • Project changed from Ceph to teuthology
  • Assignee changed from David Zafman to Anonymous

The command was uninstalled from the machine while the test was running. Is this part of what happens during an upgrade?

2014-10-14T21:28:56.259 INFO:teuthology.orchestra.run.burnupi58:Running: 'sudo ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op list-pgs'
2014-10-14T21:28:56.322 INFO:teuthology.orchestra.run.burnupi58.stderr:sudo: ceph_objectstore_tool: command not found
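
A minimal sketch of the kind of guard that would avoid this failure mode: probe for the binary and skip the list-pgs exercise when it is absent. This is a simplified local illustration (shutil.which plus subprocess instead of a teuthology remote.run call), not the ceph-qa-suite code, and maybe_list_pgs is a hypothetical helper.

# Hedged sketch: skip the objectstore-tool step when the binary is missing.
# Local simplification only; maybe_list_pgs is a hypothetical helper and
# the real thrasher runs this on a remote host, not via subprocess.
import logging
import shutil
import subprocess

log = logging.getLogger(__name__)

def maybe_list_pgs(osd_id):
    tool = shutil.which('ceph_objectstore_tool')
    if tool is None:
        # dumpling/firefly installs do not ship the tool, so treat its
        # absence as "feature unavailable" rather than a test failure.
        log.info('ceph_objectstore_tool not found; skipping list-pgs check')
        return None
    data_path = '/var/lib/ceph/osd/ceph-%d' % osd_id
    cmd = ['sudo', tool,
           '--data-path', data_path,
           '--journal-path', data_path + '/journal',
           '--op', 'list-pgs']
    return subprocess.check_output(cmd)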

Actions #4

Updated by Anonymous over 9 years ago

  • Status changed from New to In Progress

I'm currently working on this. It's just a guess, but I think that this may be related to chef not being automatically installed.

Actions #5

Updated by Anonymous over 9 years ago

This appears to be the same problem as #9627.

Actions #6

Updated by Anonymous over 9 years ago

  • Parent task set to #9627
Actions #7

Updated by Anonymous over 9 years ago

Note to self: #9627 is this same bug. It was prompted by a test run trying to fix #9553. The YAML file ~/tests/t9553.yaml reproduces this. I have put in a few chef: lines to see whether that fixes things. It takes a while to run.

Actions #8

Updated by David Zafman over 9 years ago

Not a chef problem: ceph_objectstore_tool didn't exist in dumpling or firefly; it was introduced in giant. I'll mark this as a duplicate and close the other bug.
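
A rough sketch of the release gate this implies, i.e. only exercise ceph_objectstore_tool when the installed release actually ships it. The helper names below are hypothetical and this is not necessarily the fix that went into ceph-qa-suite.

# Rough sketch of a release gate for the objectstore-tool exercise.
# RELEASES_WITH_TOOL and should_test_objectstore_tool are hypothetical.
RELEASES_WITH_TOOL = {'giant'}  # first shipped in giant; later releases would be added here

def should_test_objectstore_tool(installed_release):
    """Return True only for releases that ship ceph_objectstore_tool."""
    return installed_release in RELEASES_WITH_TOOL

# During a dumpling -> firefly -> giant upgrade run, only the last step qualifies:
for release in ('dumpling', 'firefly', 'giant'):
    print(release, should_test_objectstore_tool(release))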

Actions #9

Updated by David Zafman over 9 years ago

  • Status changed from In Progress to Duplicate