Bug #9158

osd crashed in upgrade:dumpling-x:stress-split-master-distro-basic-vps suite

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-17_11:40:01-upgrade:dumpling-x:stress-split-master-distro-basic-vps/431081/

Crash backtrace from ceph-osd.3.log.gz:

ceph-osd.3.log.gz:2014-08-18 01:16:57.278351 7f3d702f9700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log.gz: in thread 7f3d702f9700
ceph-osd.3.log.gz:
ceph-osd.3.log.gz: ceph version 0.83-777-g5045c5c (5045c5cb4c880255a1a5577c09b89d4be225bee9)
ceph-osd.3.log.gz: 1: ceph-osd() [0xa4f37f]
ceph-osd.3.log.gz: 2: (()+0x10340) [0x7f3d89396340]
ceph-osd.3.log.gz: 3: (gsignal()+0x39) [0x7f3d87835f89]
ceph-osd.3.log.gz: 4: (abort()+0x148) [0x7f3d87839398]
ceph-osd.3.log.gz: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3d881416b5]
ceph-osd.3.log.gz: 6: (()+0x5e836) [0x7f3d8813f836]
ceph-osd.3.log.gz: 7: (()+0x5e863) [0x7f3d8813f863]
ceph-osd.3.log.gz: 8: (()+0x5eaa2) [0x7f3d8813faa2]
ceph-osd.3.log.gz: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xb33ea2]
ceph-osd.3.log.gz: 10: (PG::RecoveryState::Stray::react(PG::MLogRec const&)+0x74a) [0x7a243a]
ceph-osd.3.log.gz: 11: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1f4) [0x7da994]
ceph-osd.3.log.gz: 12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7c5e7b]
ceph-osd.3.log.gz: 13: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x77602e]
ceph-osd.3.log.gz: 14: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2b0) [0x67d670]
ceph-osd.3.log.gz: 15: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x6d1558]
ceph-osd.3.log.gz: 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0xb24cf1]
ceph-osd.3.log.gz: 17: (ThreadPool::WorkThread::entry()+0x10) [0xb25de0]
ceph-osd.3.log.gz: 18: (()+0x8182) [0x7f3d8938e182]
ceph-osd.3.log.gz: 19: (clone()+0x6d) [0x7f3d878fa38d]
ceph-osd.3.log.gz: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
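
For reference, the in-binary frames above (e.g. frame 10, the failed assert in PG::RecoveryState::Stray::react at 0x7a243a) can be mapped back to source lines with a debug-symbol copy of the same ceph-osd build; a minimal sketch, assuming the matching 0.83-777-g5045c5c binary is available locally (paths are illustrative, not taken from this run):

  # Resolve individual frame addresses to function/file/line (demangled);
  # assumes a non-PIE build so the logged addresses map directly into the binary.
  addr2line -Cfe /usr/bin/ceph-osd 0x7a243a 0xb33ea2
  # Or produce the annotated disassembly that the NOTE above asks for:
  objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump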
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-17_11:40:01-upgrade:dumpling-x:stress-split-master-distro-basic-vps/431081
branch: master
description: upgrade:dumpling-x:stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '431081'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-08-17_11:40:01-upgrade:dumpling-x:stress-split-master-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 5045c5cb4c880255a1a5577c09b89d4be225bee9
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 5045c5cb4c880255a1a5577c09b89d4be225bee9
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: master
  workunit:
    sha1: 5045c5cb4c880255a1a5577c09b89d4be225bee9
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
suite: upgrade:dumpling-x:stress-split
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@vpm066.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDpua5fJVADDjEAiFjaqvgzq/dA8ZXg9UzziMHdYt5F3PiAM0YDOvpRhTeclzFVvEfPXcucZXPa4pNxXAxL/Vdcm64gDfVP/v19UcvxOCFB3g+rAFNbt/TL2nn2lK0n0cVh7NQzcSruKIJDwAGUJ0dtGHFfSywowQzyYXUqM1BGEidiBshboOFRVk8vicTdpi1nq09hBb714/T3aFPi+tDghk3Z9mC6V6I5ZOXjYIHxlwCopricLDIQkXiFpoZ+RUDtoF/cW9JKVTyiY8CqMmXugmRm6AE1BYmJPIUlKYxUnqLPApRnfmxFT97tvUmN5zaqQWs2wFa++JRdiT9WjIKB
  ubuntu@vpm091.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD3CIi538OlwFU0NA0vGUtKgKsm4zgBw00gsQvlvnp0p8tIZrfjMJIRLrGxdvIviS1LO0A1FN9UJDZmKYCTULeaDcPYaWGT7tu0nSDrOd5FjEIydv8ONKOeGH09SH0TpFmxw+jEcEFpdew7v5BqYjlycRebb0IsrF8TaB8ql0WGZnsr8Wyf6Q5FvOa63VIxmHA5QkuLe08DXQIN5IFsaSCYzmCqoMKOdMn8NVCmngpBccgCbYatOtjEa7qx98GrfILDy5ZY01mQ1VbOfeTgNshm/1HGD1ygXlfnqO3Tq9uRDo8NwJ8N31C08mJYnfptw4UPob7bWms8PcSY8IximrP7
  ubuntu@vpm092.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPWK2go/07GO6XdoqzcVJPC/+MAKv3/4vhPxKSazdIS2XMB6IS9l+wIP5mHstXPiDKHzPHeygal81MkwisvciM34VllDICsdOb1uTmzdHRH97G5QYCy8QO3ugcrRm2KPw6OZ50f68xbg2CPRTm7BLnP2+VYjBNj3xEy4L37lhqLfJ+HUgR6IsaRUOL+X8qUgVHthEG1ejxlER5KPqRhy2WzAYg9aWUWz3Hyb5UsOgDFzyVRZs11xEnaAYHrPJg8w/4CTAZjFnjUtMUrJT4vz1qcSUWu+uD4Qe/WzrEnEU6DVn90TQCiSbC72uVtcNiCzYlsmes6cdq1rtYqUa4RDZX
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - cls/test_cls_rbd.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: master
tube: vps
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.4663
description: upgrade:dumpling-x:stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
duration: 9113.755001068115
failure_reason: timed out waiting for admin_socket to appear after osd.3 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
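
The failure_reason above is consistent with the crash: osd.3 aborted during peering after its restart, so its admin socket never reappeared and teuthology timed out waiting for it. A minimal sketch of checking that socket by hand on the target node (paths assume the default run directory and are not taken from this run's logs):

  # The admin socket only exists once the daemon has started successfully:
  ls -l /var/run/ceph/ceph-osd.3.asok
  # If it is present, the daemon answers over it, e.g.:
  ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok version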

Related issues

Related to Ceph - Bug #8736: thrash and scrub combination lead to error (Duplicate, 07/03/2014)

History

#1 Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Samuel Just over 9 years ago

  • Status changed from New to Duplicate
