Bug #9714 (closed)

Dead jobs in upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi run

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-08_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/

Dead jobs: ['534726', '534728', '534730', '534737', '534738', '534741', '534742']

Dead: 2014-10-09T03:51:11.965 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0

As one example, the logs for job 534726 are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-08_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/534726/
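
To pull the full logs for all of the dead jobs at once, something like the sketch below can be used. It assumes the usual qa-proxy layout where each job directory contains a teuthology.log; it is a convenience script, not part of the suite.

# Sketch: fetch teuthology.log for each dead job from the qa-proxy archive.
# Assumes the layout <base>/<job_id>/teuthology.log; adjust if the archive differs.
import urllib.request

BASE = ("http://qa-proxy.ceph.com/teuthology/"
        "teuthology-2014-10-08_19:30:01-upgrade:dumpling-firefly-x:"
        "stress-split-giant-distro-basic-multi")
DEAD_JOBS = ['534726', '534728', '534730', '534737', '534738', '534741', '534742']

for job in DEAD_JOBS:
    url = "%s/%s/teuthology.log" % (BASE, job)
    with urllib.request.urlopen(url) as resp:
        with open("teuthology-%s.log" % job, "wb") as f:
            f.write(resp.read())
    print("fetched", url)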

Last lines:

2014-10-09T03:51:05.869 INFO:tasks.thrashosds.thrasher:in_osds:  [0, 1, 2, 3, 4, 5]  out_osds:  [] dead_osds:  [] live_osds:  [1, 0, 3, 2, 5, 4]
2014-10-09T03:51:05.869 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-10-09T03:51:05.870 INFO:tasks.thrashosds.thrasher:Removing osd 0, in_osds are: [0, 1, 2, 3, 4, 5]
2014-10-09T03:51:05.870 INFO:teuthology.orchestra.run.mira046:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd out 0'
2014-10-09T03:51:06.956 INFO:teuthology.orchestra.run.mira046.stderr:marked out osd.0.
2014-10-09T03:51:11.965 INFO:tasks.thrashosds.thrasher:in_osds:  [1, 2, 3, 4, 5]  out_osds:  [0] dead_osds:  [] live_osds:  [1, 0, 3, 2, 5, 4]
2014-10-09T03:51:11.965 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
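
For context on the repeated choose_action lines: the thrasher picks its next action subject to minimum counts of in/out/live/dead OSDs, and the job went dead right after this point. The toy sketch below only illustrates what the min_in/min_out/min_live/min_dead numbers constrain; it is not the actual ceph-qa-suite thrashosds code.

# Illustrative only: a toy version of the constraint check behind the
# "choose_action: min_in 3 min_out 0 min_live 2 min_dead 0" log line above.
# The real logic lives in the ceph-qa-suite thrashosds task.
import random

def choose_action(in_osds, out_osds, live_osds, dead_osds,
                  min_in=3, min_out=0, min_live=2, min_dead=0):
    actions = []
    # Only mark an OSD out if at least min_in OSDs stay in.
    if len(in_osds) > min_in:
        actions.append(('out_osd', random.choice(in_osds)))
    # Only mark an OSD back in if at least min_out OSDs stay out.
    if len(out_osds) > min_out:
        actions.append(('in_osd', random.choice(out_osds)))
    # Only kill an OSD if at least min_live OSDs stay alive.
    if len(live_osds) > min_live:
        actions.append(('kill_osd', random.choice(live_osds)))
    # Only revive an OSD if at least min_dead OSDs stay dead.
    if len(dead_osds) > min_dead:
        actions.append(('revive_osd', random.choice(dead_osds)))
    return random.choice(actions) if actions else None

# With the state from the last log lines (osd.0 just marked out):
print(choose_action([1, 2, 3, 4, 5], [0], [0, 1, 2, 3, 4, 5], []))

The full job config for 534726 follows.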
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-08_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/534726
branch: giant
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml
  02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml
  05-workload/rbd-cls.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/radosbench.yaml
  08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml}
  10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml
  13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml
  16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml
  19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rados_stress_watch.yaml
  distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '534726'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-10-08_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 3bfb5fab41b6247259183c3f52c786e35beb3b01
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 3bfb5fab41b6247259183c3f52c786e35beb3b01
  s3tests:
    branch: giant
  workunit:
    sha1: 3bfb5fab41b6247259183c3f52c786e35beb3b01
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mon.c
- - osd.3
  - osd.4
  - osd.5
- - client.0
suite: upgrade:dumpling-firefly-x:stress-split
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@mira046.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYIVIibHwBdEDJ25Owjw8QkSrlozG8FODNRxu1ttOagKkY3uaBnwVVQw0sLCHZi3n1O1nAWWRTfY69/4OPxJIRuFy6Jqz8dx9d6SHIZk1IwS+PUM1s2vJVY7cm3V3ibfQBmiyTD8ydRlKW8nmOMMHnMz5on1zNFgPgwEVziXdr0dmU5qakTwkUOchrHka/fH6CzAvHTmMANsWgpMek/Nqs2fxRF7/bufj4/4H8Et6AP2iF7mGIgE5beg+WLoXHE4mQdv5Zcs6FsDFiKpLSxrZFa6fx4VO1H0sRwbFdDVKuASH68HT+8eni6qvm+l2wYJHloAuYYbpH6xBMhMTW97WZ
  ubuntu@plana24.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC722lqzhbMaA+ku5+dLCoZyGcbaXA5/YIPr4oEHwmfjeRZIuARFjMpJaA/+TNdUACVxBP/vePdPYxurjBi0RggFa4YqvKoN8m1RcVZ8QiSPqDCBb+Og6Tjc7/NrRdP9wiJHwCqAhJ2Mgc94NX3oHs1WvASmeY1LI0B29ufDCSyR5p8MGxTWc4JggBEUHWI8jPEKrN+GxvLD/Ezya6t48TG3yN1BApJH8VzniCGf2J1IBoQ5vc8AnjtNYJCyTMhuX0aKOxIphyVEIJC3bz3VeyHfFNIoTJrXriIxhP6LfBF8UbQMhKPbiVpkJbqFqBmOMlgNCnex60fEpqO6DuI82bh
  ubuntu@plana27.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDC65VysYnJj+jrnAlVO9Ibdj56IzYapOaGvYU2pGAyCOZq6UbUkg0nGa+snNTIFBO640lsn+RFGEl8+DtxM6NKQeRW3GZKXf6+CtouEAg18qaU7wTdFu++A5Fp5SjIaTpFko1IDBuGelYfXfXmySIDMa54IorjevKPYHvO7soCxy/Y4lfotp/wna9xzr4QIKG4fQ4dC+wuQANT8gdfg/c7jSIq7sioHfs3Xg7dH/nKIg4KqLuMi7gc36tXLwWHWlpIzXR9WFMzlFSsRD1pIx8cyYv6rHSAj1vEizygMOaErqynioVVIE7UT+Qwp1HoJlShdsLwqFtxDseftRTQWzO3
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - cls/test_cls_rbd.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- s3tests:
    client.0:
      rgw_server: client.0
- install.upgrade:
    osd.3:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    osd.3: null
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- workunit:
    clients:
      client.0:
      - rados/stress_watch.sh
teuthology_branch: master
tube: multi
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3172
#1

Updated by Yuri Weinstein over 9 years ago

The same problem appeared in run http://pulpito.front.sepia.ceph.com/teuthology-2014-10-10_19:00:01-upgrade:dumpling-x-firefly-distro-basic-multi/

suite:upgrade:dumpling-x

Jobs: '537893', '537898', '537899', '537904', '537905', '537910', '537911'

Logs for one example (job 537893) are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-10_19:00:01-upgrade:dumpling-x-firefly-distro-basic-multi/537893/

2014-10-11T02:28:29.353 INFO:tasks.thrashosds.thrasher:in_osds:  [2, 3, 5, 1, 4]  out_osds:  [0] dead_osds:  [] live_osds:  [1, 0, 3, 2, 5, 4]
2014-10-11T02:28:29.353 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-10-11T02:28:29.353 INFO:tasks.thrashosds.thrasher:inject_pause on 3
2014-10-11T02:28:29.353 INFO:tasks.thrashosds.thrasher:Testing filestore_inject_stall pause injection for duration 3
2014-10-11T02:28:29.354 INFO:tasks.thrashosds.thrasher:Checking after 0, should_be_down=False
2014-10-11T02:28:29.354 INFO:teuthology.orchestra.run.mira115:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config set filestore_inject_stall 3'
2014-10-11T02:28:34.439 INFO:tasks.thrashosds.thrasher:in_osds:  [2, 3, 5, 1, 4]  out_osds:  [0] dead_osds:  [] live_osds:  [1, 0, 3, 2, 5, 4]
2014-10-11T02:28:34.439 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
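
The last action here was the filestore_inject_stall pause injection on osd.3 through its admin socket (the exact command is in the log above). To check afterwards whether the stall option is still set on that daemon, a quick sketch along these lines can be run on the OSD host; the asok path is taken from the log, and it assumes the admin socket supports config get.

# Sketch: query osd.3's admin socket to see whether filestore_inject_stall
# is still set after the pause injection above. Run on the OSD host (mira115);
# the asok path is taken from the log line.
import json
import subprocess

ASOK = "/var/run/ceph/ceph-osd.3.asok"
out = subprocess.check_output(
    ["sudo", "ceph", "--admin-daemon", ASOK,
     "config", "get", "filestore_inject_stall"])
print(json.loads(out))  # e.g. {"filestore_inject_stall": "3"}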
#3

Updated by Sage Weil over 9 years ago

  • Status changed from New to Duplicate

I think this is a dup of #9757
