Bug #9326

osd crash in upgrade:dumpling-firefly-x-master-distro-basic-vps suite

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: Samuel Just
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-02_10:22:50-upgrade:dumpling-firefly-x-master-distro-basic-vps/466468/

Coredump in /remote//ceph-osd.1.log.gz:

ceph-osd.1.log.gz:2014-09-02 19:59:50.255193 7f462b756700 -1 *** Caught signal (Aborted) **
ceph-osd.1.log.gz: in thread 7f462b756700
ceph-osd.1.log.gz:
ceph-osd.1.log.gz: ceph version 0.84-940-g3215c52 (3215c520e1306f50d0094b5646636c02456c9df4)
ceph-osd.1.log.gz: 1: ceph-osd() [0xa9339f]
ceph-osd.1.log.gz: 2: (()+0x10340) [0x7f464c953340]
ceph-osd.1.log.gz: 3: (gsignal()+0x39) [0x7f464adf2bb9]
ceph-osd.1.log.gz: 4: (abort()+0x148) [0x7f464adf5fc8]
ceph-osd.1.log.gz: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f464b6fe6b5]
ceph-osd.1.log.gz: 6: (()+0x5e836) [0x7f464b6fc836]
ceph-osd.1.log.gz: 7: (()+0x5e863) [0x7f464b6fc863]
ceph-osd.1.log.gz: 8: (()+0x5eaa2) [0x7f464b6fcaa2]
ceph-osd.1.log.gz: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xb7ca78]
ceph-osd.1.log.gz: 10: (ReplicatedPG::trim_object(hobject_t const&)+0x1eb) [0x85486b]
ceph-osd.1.log.gz: 11: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x501) [0x888131]
ceph-osd.1.log.gz: 12: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x8c0eb4]
ceph-osd.1.log.gz: 13: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x127) [0x8a7577]
ceph-osd.1.log.gz: 14: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x84) [0x8a7744]
ceph-osd.1.log.gz: 15: (ReplicatedPG::snap_trimmer()+0x5ec) [0x82152c]
ceph-osd.1.log.gz: 16: (OSD::SnapTrimWQ::_process(PG*)+0x1a) [0x6b2f9a]
ceph-osd.1.log.gz: 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa46) [0xb6da16]
ceph-osd.1.log.gz: 18: (ThreadPool::WorkThread::entry()+0x10) [0xb6eac0]
ceph-osd.1.log.gz: 19: (()+0x8182) [0x7f464c94b182]
ceph-osd.1.log.gz: 20: (clone()+0x6d) [0x7f464aeb6fbd]
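
The interesting frames are 10-15: the OSD's snap-trim work queue hands a SnapTrim event to the ReplicatedPG::SnapTrimmer boost::statechart machine, process_event() dispatches it to TrimmingObjects::react(), and trim_object() then fails an assert. A minimal boost::statechart sketch of that dispatch path (the state and event names mirror the frames above, but this is an illustration, not Ceph's actual SnapTrimmer code):

#include <boost/statechart/state_machine.hpp>
#include <boost/statechart/simple_state.hpp>
#include <boost/statechart/event.hpp>
#include <boost/statechart/transition.hpp>
#include <boost/statechart/custom_reaction.hpp>
#include <iostream>

namespace sc = boost::statechart;

struct SnapTrim : sc::event<SnapTrim> {};           // the event dispatched in frame 11

struct NotTrimming;                                 // initial state, as in frames 13/14
struct SnapTrimmer : sc::state_machine<SnapTrimmer, NotTrimming> {};

struct TrimmingObjects;
struct NotTrimming : sc::simple_state<NotTrimming, SnapTrimmer> {
  typedef sc::transition<SnapTrim, TrimmingObjects> reactions;
};

struct TrimmingObjects : sc::simple_state<TrimmingObjects, SnapTrimmer> {
  typedef sc::custom_reaction<SnapTrim> reactions;
  sc::result react(const SnapTrim&) {               // frame 11: TrimmingObjects::react()
    std::cout << "trim one object\n";               // frame 10: trim_object() runs here
    return discard_event();
  }
};

int main() {
  SnapTrimmer machine;
  machine.initiate();
  machine.process_event(SnapTrim());                // frame 14: process_event() -> react_impl()
  machine.process_event(SnapTrim());
}

Per the trace, __ceph_assert_fail (frame 9) fired inside trim_object() itself, not in the statechart plumbing around it.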

archive_path: /var/lib/teuthworker/archive/teuthology-2014-09-02_10:22:50-upgrade:dumpling-firefly-x-master-distro-basic-vps/466468
branch: master
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml
  6-final-workload/{ec-rados-default.yaml ec-rados-plugin=jerasure-k=3-m=1.yaml rados-snaps-few-objects.yaml
  rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml
  rgw_s3tests.yaml rgw_swift.yaml} distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '466468'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-09-02_10:22:50-upgrade:dumpling-firefly-x-master-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: 3215c520e1306f50d0094b5646636c02456c9df4
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 3215c520e1306f50d0094b5646636c02456c9df4
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: master
    idle_timeout: 1200
  workunit:
    sha1: 3215c520e1306f50d0094b5646636c02456c9df4
owner: scheduled_teuthology@teuthology
priority: 1
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-firefly-x
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@vpm035.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDG5EVO5jsYHALBwYjXBu5vvUVknRWxo3EAj6ARqudBieL+AakQllcEPiqtfRO+tDouVReE3rxt0iqHKwhZyOG3edKvL+NcBtPb0kpJHdxJGnuF5lJFFsGlpXv6rTfFEdnFh3DfUhXgZ7U4qUE5wwZs/pLJta0gQMAMTlAcTXyIXgkm1ur6TpXaYptqTYggnvRuA464EvJLWw9xCoN1ZfRaWLNMqfv5Sm8SFtGzlE9069wU3GM3lqdV3ti3LpBFRDa5Y0+A+J7d5K0eKCIi72xP1ii8bVFSgmw8SlWRMc7sMHBoZ2CtFmB9/QR69BwXu/2zvtAPpwSFj5EWk35tsh/D
  ubuntu@vpm060.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8XfXjviCPZ4NC7VkeydE3t/gwAICBrP+SN8S5fGXMLPSfPnkoEzYgDQJN5FBDRU5/n27Zv4tcNF6sOltodkdwW01j/VxjdIagePFlYXzmFDMOBV8UZX5J5GCfAbf251xi8PVsiEezv3GPdswFolXgOzb3nQNlKH0HoKC7ef0Mq5oD1kpzMO8TtYWjdB1stfwRzZwfW6mvgwTIL91fpgx0rP5e6BE5TPbM2XoIT63I8rl/v6xXdlgOWGJ9nSAH79whvryut/t/U2sDd5YA8Kogh0Fa/7UZDffoiG3UhfJTWd2artsEJXjAyjZk/PuT6CfpSIbXvlu+iajvBYwKtZRZ
  ubuntu@vpm065.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQClPc5gHBkg6Hb0M41RNSRz76dTgnoATuvRSkXOo7tozQxQNapsAh0hBSgsgWjPbryjE47lBjhtEHPksVGLKY2rWh8DeIJT1uU+Mw1LnW2qUE7QRIqP6a+t9rrrHkrHgWkCetwV7A7KQn9/eMYbl4XV8LqLjDC9uLinYAF8UzXp+H4/qDRTowzO9Rm3H3KEiRQUilQg3nvGRVDllHWCgbF+3s89cIVacIWG4HK63unl5YrwySw7QPzS50/5VtLcjse5LgumvymNuP2kdxQD1GZXltPwej2KimZh1vY4zoAZ7MJ8fTiYOMu57/ESdPkuXcNJU7ArLrPdKjOc455unsvX
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- print: '**** done dumpling install'
- ceph:
    fs: xfs
- parallel:
  - workload
- print: '**** done parallel'
- install.upgrade:
    client.0:
      branch: firefly
    mon.a:
      branch: firefly
    mon.b:
      branch: firefly
- print: '**** done install.upgrade'
- ceph.restart: null
- print: '**** done restart'
- parallel:
  - workload2
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade client.0 to the version from teuthology-suite
    arg'
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rmattr: 25
      rollback: 50
      setattr: 25
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
- rados:
    clients:
    - client.0
    ec_pool: true
    erasure_code_profile:
      k: 3
      m: 1
      name: jerasure31profile
      plugin: jerasure
      ruleset-failure-domain: osd
      technique: reed_sol_van
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rmattr: 25
      rollback: 50
      setattr: 25
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - workunit:
      clients:
        client.1:
        - rados/test.sh
  - print: '**** done rados/test.sh - 6-final-workload'
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- s3tests:
    client.1:
      rgw_server: client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
  - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg'
  - install.upgrade:
      mon.b: null
  - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.865
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done rados/test.sh &  cls'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh'
workload2:
  sequential:
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done #rados/test.sh and cls 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh 2'
duration: 13652.1409740448
failure_reason: 'Command failed on vpm065 with status 1: ''sudo adjust-ulimits ceph-coverage
  /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'''
flavor: basic
success: false
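
Note the rados-snaps-few-objects workload on client.1 above: snap_create, snap_remove, and rollback are each weighted 50 against read/write at 100, which is exactly what keeps the snap trimmer busy while the daemons restart. A rough C++ sketch of how such an op_weights table can drive weighted operation selection (assumed semantics for illustration only; not the actual ceph_test_rados implementation):

#include <iostream>
#include <map>
#include <random>
#include <string>

int main() {
  // Weights copied from the rados-snaps-few-objects workload above.
  std::map<std::string, int> op_weights = {
    {"read", 100}, {"write", 100}, {"delete", 50},
    {"snap_create", 50}, {"snap_remove", 50}, {"rollback", 50},
  };
  int total = 0;
  for (const auto& [op, w] : op_weights) total += w;    // 400

  std::mt19937 rng(42);
  std::uniform_int_distribution<int> dist(0, total - 1);
  for (int i = 0; i < 10; ++i) {                        // the real job runs ops: 4000
    int r = dist(rng);
    int acc = 0;
    for (const auto& [op, w] : op_weights) {
      acc += w;
      if (r < acc) { std::cout << op << '\n'; break; }
    }
  }
}

With those weights, snap_create and snap_remove each get a 50/400 share, so over 4000 ops the workload issues roughly 500 snap creates and 500 snap removes, a steady stream of trimming work.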
Actions #1

Updated by Samuel Just over 9 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Samuel Just over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Samuel Just

Note to Sam: DBObjectMap xattr lookup/header lookup race due to no longer having per-collection locks?
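
If that hypothesis is right, the shape of the race would be something like the following (a purely illustrative sketch with invented names, not the actual DBObjectMap code): two lookups that a per-collection lock used to serialize now run in separate critical sections, so a concurrent writer can replace the header between them.

#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <string>

struct Header { uint64_t seq = 0; };                 // hypothetical per-object header

struct ObjectMapSketch {
  std::mutex m;                                      // stand-in for the old per-collection lock
  std::map<std::string, Header> headers;
  std::map<std::string, std::string> xattrs;         // keyed by "<oid>:<seq>"

  // Racy: header lookup and xattr lookup in separate critical sections.
  // A concurrent clone/trim can bump the header's seq in between, so the
  // xattr lookup runs against a header that no longer exists.
  std::optional<std::string> get_xattr_racy(const std::string& oid) {
    Header h;
    {
      std::lock_guard<std::mutex> l(m);
      h = headers[oid];
    }
    // <-- window: another thread may replace headers[oid] here
    std::lock_guard<std::mutex> l(m);
    auto it = xattrs.find(oid + ":" + std::to_string(h.seq));
    if (it == xattrs.end()) return std::nullopt;     // inconsistent view
    return it->second;
  }

  // Safe: both lookups under one critical section, which is the invariant a
  // per-collection lock used to provide implicitly.
  std::optional<std::string> get_xattr(const std::string& oid) {
    std::lock_guard<std::mutex> l(m);
    auto it = xattrs.find(oid + ":" + std::to_string(headers[oid].seq));
    if (it == xattrs.end()) return std::nullopt;
    return it->second;
  }
};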

Actions #3

Updated by Samuel Just over 9 years ago

  • Status changed from In Progress to 7
Actions #4

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Fix Under Review
Actions #5

Updated by Samuel Just over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Samuel Just over 9 years ago

  • Status changed from Pending Backport to Resolved

Does not need to be backported!
