Bug #9617

closed

objecter shutdown races with msg dispatch

Added by Yuri Weinstein over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: giant
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-27_19:10:02-upgrade:firefly-giant-x:parallel-giant-distro-basic-multi/515525/

Coredump in teuthology@teuthology:/a/teuthology-2014-09-27_19:10:02-upgrade:firefly-giant-x:parallel-giant-distro-basic-multi/515525/remote/burnupi22/log/ceph-client.admin.23324.log.gz:

ceph-client.admin.23324.log.gz:2014-09-27 22:51:38.749025 7f79777fe700 -1 *** Caught signal (Aborted) **
ceph-client.admin.23324.log.gz: in thread 7f79777fe700
ceph-client.admin.23324.log.gz:
ceph-client.admin.23324.log.gz: ceph version 0.85-931-gf8ac224 (f8ac2248af8a7f04094c6d2f8844e928212ff6b0)
ceph-client.admin.23324.log.gz: 1: rbd() [0x42382a]
ceph-client.admin.23324.log.gz: 2: (()+0x10340) [0x7f7982794340]
ceph-client.admin.23324.log.gz: 3: (gsignal()+0x39) [0x7f7981cd5f89]
ceph-client.admin.23324.log.gz: 4: (abort()+0x148) [0x7f7981cd9398]
ceph-client.admin.23324.log.gz: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f79822db6b5]
ceph-client.admin.23324.log.gz: 6: (()+0x5e836) [0x7f79822d9836]
ceph-client.admin.23324.log.gz: 7: (()+0x5e863) [0x7f79822d9863]
ceph-client.admin.23324.log.gz: 8: (()+0x5eaa2) [0x7f79822d9aa2]
ceph-client.admin.23324.log.gz: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x7f7982c4f5a8]
ceph-client.admin.23324.log.gz: 10: (Objecter::handle_osd_map(MOSDMap*)+0x1b99) [0x7f79851426e9]
ceph-client.admin.23324.log.gz: 11: (Objecter::ms_dispatch(Message*)+0x1df) [0x7f7985146dcf]
ceph-client.admin.23324.log.gz: 12: (DispatchQueue::entry()+0x649) [0x7f7982d5c6c9]
ceph-client.admin.23324.log.gz: 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7982dd6dfd]
ceph-client.admin.23324.log.gz: 14: (()+0x8182) [0x7f798278c182]
ceph-client.admin.23324.log.gz: 15: (clone()+0x6d) [0x7f7981d9a38d]
ceph-client.admin.23324.log.gz: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
archive_path: /var/lib/teuthworker/archive/teuthology-2014-09-27_19:10:02-upgrade:firefly-giant-x:parallel-giant-distro-basic-multi/515525
branch: giant
description: upgrade:firefly-giant-x:parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-giant-upgrade/giant.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml
  test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-type.yaml 6-final-workload/{ec-rados-default.yaml
  ec-rados-plugin=jerasure-k=3-m=1.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '515525'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-09-27_19:10:02-upgrade:firefly-giant-x:parallel-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: f8ac2248af8a7f04094c6d2f8844e928212ff6b0
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: f8ac2248af8a7f04094c6d2f8844e928212ff6b0
  s3tests:
    branch: giant
  workunit:
    sha1: f8ac2248af8a7f04094c6d2f8844e928212ff6b0
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:firefly-giant-x:parallel
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@burnupi22.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvcs33PS2MQMxrCXm6jPOz1tz2i0bGA+8uVmTWe7Iu2W+fsHonGTMHJxPVYRAMe8WDulEYtkwK64s9D/Ph38kRK0o62SA599NKVIPvh1LzZadkWCX6aKlLv6cQYvQxOaUBuAlOIKQ5h0IKxu2lBPCHg6CIRHLUCYmTcR2PzXahMS9ToGMq+NS36+/4HmYPQ80lJcf1D9J+m6ETVMZ+DDcV4B6DWyImczwfNvXJY/Mj10bh4ZyEt5MSTVqfFKqNa3K1fWDzWUfsx8G3QGnyXNwuhUfRBslkIn4bFM2oJvGpFqZOSPGEskjM3IZaqhcoydnYDTTIWHG/8K5WJ70ZBhdx
  ubuntu@plana05.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxJrg2btWDq5DMrIJsaWpxVqefw6LLKB5JDpN/ZjPDqfnYdHbUqAF8pMsbiT2E36BYQmN51l1I1mFZpm3a0V1DTqweHzwFfNZQPu8w+h1QP1uD2c0ZCfKGne/79KKJlTpAu310zJUZpiVqrD5MXg+kk9/9Ew/k6hpiiWqr2Hi+MZY4gpiqXI4ixZgJt1k+VLBHtRpp+DeD0Nzr3e5wfUDveBKRw6UORE2jY0ikM9KoTaAeO7q8LsRNBJO8tH4r7pAHdRUUp5WwKKlSKR/c8vBGViLSb1dAqx0bRpil9dQZmeP0uh89tOI3rNwZQaoPMvaA1kAD3ZzZKfRS3g/6hGXl
  ubuntu@plana44.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqsl+WJBxEyj2KSYSIKGrxRE0KCq12QhJVsYHBtFVsnJrhnYL8O2HRbFU/H9ZFjz1KxjfthVqWcpxlxjKg62AixSGGyM7Jr5lScGCr/ujUYN4Nm1UWtZ7enDghLCSXui75dTBoFHcfBT2GklgPvphd8OdBL3w3KoGquYtxDlcvunoQy0oBQ8zSz5VAZRz3VZtDY4ysHoyqWumWnVul/RRQgFqlut5jv7p8sdwPmha1EYJXWwUlaftXl1SIUJcGRysiLKNEJ2SylJanmdGmb9e9Hbn73Ty9+DvgbaCwEQjJ5EnxDHzow3It5Q1BcjryhjUiaS0VbIfXfVMtU3ryaIfh
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: firefly
- print: '**** done firefly install'
- ceph:
    fs: xfs
- parallel:
  - workload
- print: '**** done parallel'
- install.upgrade:
    client.0:
      branch: giant
    mon.a:
      branch: giant
    mon.b:
      branch: giant
- print: '**** done install.upgrade'
- ceph.restart: null
- print: '**** done restart'
- parallel:
  - workload2
  - upgrade-sequence
- print: '**** done parallel 2'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade client.0 to the version from teuthology-suite
    arg'
- rados:
    clients:
    - client.0
    ec_pool: true
    erasure_code_profile:
      k: 3
      m: 1
      name: jerasure31profile
      plugin: jerasure
      ruleset-failure-domain: osd
      technique: reed_sol_van
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rmattr: 25
      rollback: 50
      setattr: 25
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - workunit:
      clients:
        client.1:
        - rados/test.sh
  - print: '**** done rados/test.sh - 6-final-workload'
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- s3tests:
    client.1:
      rgw_server: client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: multi
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
  - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.a
      wait-for-healthy: true
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - osd.0
      - osd.1
      wait-for-healthy: true
  - sleep:
      duration: 60
  - print: '**** running mixed versions of osds and mons'
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
  - install.upgrade:
      mon.b: null
  - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.b
      - mon.c
      wait-for-healthy: true
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - osd.2
      - osd.3
      wait-for-healthy: true
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - sleep:
      duration: 60
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3118
workload:
  sequential:
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done rados/test.sh &  cls'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh'
  - rados:
      clients:
      - client.0
      ec_pool: true
      objects: 50
      op_weights:
        append: 100
        copy_from: 50
        delete: 50
        read: 100
        rmattr: 25
        rollback: 50
        setattr: 25
        snap_create: 50
        snap_remove: 50
        write: 0
      ops: 4000
workload2:
  sequential:
  - workunit:
      branch: giant
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done #rados/test.sh and cls 2'
  - workunit:
      branch: giant
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh 2'
  - workunit:
      branch: giant
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh 2'
  - workunit:
      branch: giant
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh 2'
duration: 14013.186798095703
failure_reason: Found coredumps on ubuntu@burnupi22.front.sepia.ceph.com
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
#1

Updated by Sage Weil over 9 years ago

2014-09-27T22:51:38.729 INFO:tasks.workunit.client.1.burnupi22.stderr:+ rbd import foo.23322 foo.dir
2014-09-27T22:51:38.747 INFO:tasks.workunit.client.1.burnupi22.stderr:rbd: cannot import a directory
2014-09-27T22:51:38.747 INFO:tasks.workunit.client.1.burnupi22.stderr:^MImporting image: 0% complete...failed.
2014-09-27T22:51:38.747 INFO:tasks.workunit.client.1.burnupi22.stderr:rbd: import failed: (21) Is a directory
2014-09-27T22:51:38.762 INFO:tasks.workunit.client.1.burnupi22.stderr:osdc/Objecter.cc: In function 'void Objecter::handle_osd_map(MOSDMap*)' thread 7f79777fe700 time 2014-09-27 22:51:38.746613
2014-09-27T22:51:38.762 INFO:tasks.workunit.client.1.burnupi22.stderr:osdc/Objecter.cc: 709: FAILED assert(initialized.read())
2014-09-27T22:51:38.762 INFO:tasks.workunit.client.1.burnupi22.stderr: ceph version 0.85-931-gf8ac224 (f8ac2248af8a7f04094c6d2f8844e928212ff6b0)
2014-09-27T22:51:38.763 INFO:tasks.workunit.client.1.burnupi22.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f7982c4f3bb]
2014-09-27T22:51:38.763 INFO:tasks.workunit.client.1.burnupi22.stderr: 2: (Objecter::handle_osd_map(MOSDMap*)+0x1b99) [0x7f79851426e9]
2014-09-27T22:51:38.763 INFO:tasks.workunit.client.1.burnupi22.stderr: 3: (Objecter::ms_dispatch(Message*)+0x1df) [0x7f7985146dcf]
2014-09-27T22:51:38.763 INFO:tasks.workunit.client.1.burnupi22.stderr: 4: (DispatchQueue::entry()+0x649) [0x7f7982d5c6c9]
2014-09-27T22:51:38.763 INFO:tasks.workunit.client.1.burnupi22.stderr: 5: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7982dd6dfd]
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr: 6: (()+0x8182) [0x7f798278c182]
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr: 7: (clone()+0x6d) [0x7f7981d9a38d]
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr:2014-09-27 22:51:38.747319 7f79777fe700 -1 osdc/Objecter.cc: In function 'void Objecter::handle_osd_map(MOSDMap*)' thread 7f79777fe700 time 2014-09-27 22:51:38.746613
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr:osdc/Objecter.cc: 709: FAILED assert(initialized.read())
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr:
2014-09-27T22:51:38.764 INFO:tasks.workunit.client.1.burnupi22.stderr: ceph version 0.85-931-gf8ac224 (f8ac2248af8a7f04094c6d2f8844e928212ff6b0)
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f7982c4f3bb]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 2: (Objecter::handle_osd_map(MOSDMap*)+0x1b99) [0x7f79851426e9]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 3: (Objecter::ms_dispatch(Message*)+0x1df) [0x7f7985146dcf]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 4: (DispatchQueue::entry()+0x649) [0x7f7982d5c6c9]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 5: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7982dd6dfd]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 6: (()+0x8182) [0x7f798278c182]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: 7: (clone()+0x6d) [0x7f7981d9a38d]
2014-09-27T22:51:38.765 INFO:tasks.workunit.client.1.burnupi22.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
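The assert at osdc/Objecter.cc:709 fires when a dispatch thread is still inside Objecter::handle_osd_map() after shutdown has already cleared `initialized`. In this run the failure coincides with `rbd import` exiting on the "Is a directory" error above, so the rbd client was presumably shutting down while an OSD map was still being delivered. The C++ sketch below is a hypothetical, simplified model of that check-then-act interleaving. It is not Ceph's Objecter code. `RacyObjecter`, the `between` hook, `saw_uninitialized` and `run_demo()` are illustrative names. The model assumes the flag is checked only once, when the message is admitted. Two promises force the bad ordering deterministically, and the handler records the violation instead of aborting.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <thread>

// Hypothetical, simplified model of the race; NOT Ceph's Objecter code.
// The flag is checked once when a message is admitted, but shutdown() can
// clear it before the handler runs (a check-then-act race).
struct RacyObjecter {
  std::atomic<bool> initialized{true};
  bool saw_uninitialized = false;  // stands in for the failed assert

  // Runs on the dispatch thread. `between` is a test hook that opens the
  // window between the check and the handler.
  bool ms_dispatch(const std::function<void()>& between) {
    if (!initialized.load()) return false;  // check...
    between();                              // ...window for shutdown...
    handle_osd_map();                       // ...act
    return true;
  }

  void handle_osd_map() {
    // The real handler asserts here; record the violation instead of aborting.
    if (!initialized.load()) saw_uninitialized = true;
  }

  void shutdown() { initialized.store(false); }
};

// Forces the bad interleaving deterministically with two promises.
bool run_demo() {
  RacyObjecter o;
  std::promise<void> checked, shut_down;
  std::future<void> checked_f = checked.get_future();
  std::future<void> shut_down_f = shut_down.get_future();

  std::thread dispatcher([&] {
    o.ms_dispatch([&] {
      checked.set_value();  // tell the main thread we passed the check
      shut_down_f.wait();   // stay in the window until shutdown() has run
    });
  });

  checked_f.wait();   // dispatcher is now between the check and the handler
  o.shutdown();       // main thread tears the objecter down
  shut_down.set_value();
  dispatcher.join();  // join() makes saw_uninitialized safe to read
  return o.saw_uninitialized;
}
```

In this model, run_demo() returns true. The message passed the entry check, shutdown ran inside the window, and the handler then observed an uninitialized objecter. That is the condition the real assert rejects.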

#2

Updated by Sage Weil over 9 years ago

  • Project changed from Ceph to rbd
  • Priority changed from Normal to Urgent
#3

Updated by Sage Weil over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Zheng Yan
#4

Updated by Josh Durgin over 9 years ago

  • Status changed from In Progress to 7
  • Assignee changed from Zheng Yan to Josh Durgin

wip-objecter-shutdown

#5

Updated by Josh Durgin over 9 years ago

  • Status changed from 7 to Fix Under Review
#6

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to giant
#7

Updated by Sage Weil over 9 years ago

  • Project changed from rbd to Ceph
  • Subject changed from Coredump in upgrade:firefly-giant-x:parallel-giant-distro-basic-multi run to objecter shutdown races with msg dispatch
#9

Updated by Loïc Dachary about 9 years ago

d790833 Objecter: check the 'initialized' atomic_t safely (in giant),
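The fix is commit d790833, "check the 'initialized' atomic_t safely". This report does not show the patch itself. One common way to make such a check safe is to test the flag only while holding a lock that shutdown() takes exclusively, and to drop the message if the objecter is already shut down. The sketch below illustrates that pattern under the assumption that dispatch and shutdown share a reader/writer lock. `SafeObjecter`, `rwlock_`, `handle_osd_map_locked()` and `handled()` are hypothetical names. `std::shared_mutex` requires C++17.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical sketch of the "check the flag under the lock" pattern.
// Names echo the bug report; this is NOT the actual Objecter patch.
class SafeObjecter {
 public:
  // Returns true if the message was handled, false if it was dropped
  // because shutdown() has already run.
  bool ms_dispatch() {
    std::shared_lock<std::shared_mutex> l(rwlock_);
    if (!initialized_.load()) return false;  // checked while holding the lock
    handle_osd_map_locked();
    return true;
  }

  void shutdown() {
    // Exclusive lock: waits for in-flight dispatches and blocks new ones.
    std::unique_lock<std::shared_mutex> l(rwlock_);
    initialized_.store(false);
    // ...sessions, timers, etc. would be torn down here...
  }

  int handled() const { return handled_.load(); }

 private:
  void handle_osd_map_locked() {
    // Cannot fire: shutdown() would need the exclusive lock we share here.
    assert(initialized_.load());
    handled_.fetch_add(1);
  }

  std::shared_mutex rwlock_;
  std::atomic<bool> initialized_{true};
  std::atomic<int> handled_{0};
};
```

shutdown() must acquire the lock exclusively, so it waits for any in-flight ms_dispatch() to finish. Every later dispatch then sees the flag cleared and returns early, so the handler's assert cannot fire. The flag stays atomic to mirror the `atomic_t` named in the commit message, but it is the lock that closes the window.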

#10

Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved