Bug #2999

closed

osd: msgr crash in OSD::complete_notify

Added by Tamilarasi muthamizhan over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A

Description

Logs: ubuntu@teuthology:/a/teuthology-2012-08-17_19:00:07-regression-master-testing-gcov/3549

     0> 2012-08-18 00:09:48.191296 7fbbd527b700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fbbd527b700

 ceph version 0.50-267-gae57db0 (commit:ae57db03a9287257f3034fb045c30d5b7edff468)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x81e0da]
 2: (()+0xfcb0) [0x7fbbe630acb0]
 3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x264) [0x8e0a64]
 4: (SimpleMessenger::send_message(Message*, Connection*)+0x13) [0x8e5173]
 5: (OSD::complete_notify(void*, void*)+0x14d) [0x61802d]
 6: (OSD::ack_notification(entity_name_t&, void*, void*, ReplicatedPG*)+0x76) [0x6183f6]
 7: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x247e) [0x579d4e]
 8: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6e3) [0x5b1f53]
 9: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x4650) [0x5b8a60]
 10: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x484) [0x6f4474]
 11: (OSD::dequeue_op(PG*)+0x304) [0x611aa4]
 12: (OSD::OpWQ::_process(PG*)+0x15) [0x677ef5]
 13: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x12) [0x66e4e2]
 14: (ThreadPool::worker()+0x4db) [0x8f376b]
 15: (ThreadPool::WorkThread::entry()+0x15) [0x66fa15]
 16: (Thread::_entry_func(void*)+0x12) [0x8e6212]
 17: (()+0x7e9a) [0x7fbbe6302e9a]
 18: (clone()+0x6d) [0x7fbbe46a64bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
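
For triage, the frames in a backtrace like the one above can be split into symbol, offset, and address before feeding the addresses to a symbolizer. The following is a minimal sketch, assuming the frame layout seen in this trace (`N: (symbol+0xoffset) [0xaddress]`, or a bare binary path for frame 1). The name `parse_frames` is hypothetical and not part of Ceph or teuthology.

```python
import re

# Matches frame lines such as
#   " 3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x264) [0x8e0a64]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+\[(0x[0-9a-fA-F]+)\]\s*$")

# Splits "(symbol+0xoffset)"; the symbol may be just "()" when it is unresolved,
# as in "(()+0xfcb0)".
LOC_RE = re.compile(r"^\((.*)\+(0x[0-9a-fA-F]+)\)$")


def parse_frames(text):
    """Return one dict per backtrace frame found in `text` (hypothetical helper)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue  # skip the signal banner, version line, and NOTE line
        index, location, address = m.groups()
        loc = LOC_RE.match(location)
        if loc:
            symbol, offset = loc.group(1), int(loc.group(2), 16)
        else:
            # Frame 1 in this trace is a bare binary path with no offset.
            symbol, offset = location, None
        frames.append({
            "index": int(index),
            "symbol": symbol,
            "offset": offset,
            "address": int(address, 16),
        })
    return frames


if __name__ == "__main__":
    sample = (
        " 4: (SimpleMessenger::send_message(Message*, Connection*)+0x13) [0x8e5173]\n"
        " 5: (OSD::complete_notify(void*, void*)+0x14d) [0x61802d]\n"
    )
    for frame in parse_frames(sample):
        print(frame["index"], frame["symbol"], hex(frame["address"]))
```

The extracted addresses can then be resolved against a matching copy of the `ceph-osd` binary, as the NOTE line requires.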

ubuntu@teuthology:/a/teuthology-2012-08-17_19:00:07-regression-master-testing-gcov/3549$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 1fe5e9932156f6122c3b1ff6ba7541c27c86718c
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
        rbd cache max dirty: 0
    coverage: true
    fs: ext4
    log-whitelist:
    - slow request
    sha1: ae57db03a9287257f3034fb045c30d5b7edff468
  workunit:
    sha1: ae57db03a9287257f3034fb045c30d5b7edff468
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana48.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1ScpN+amKWlGYSJOt8sqzLTCXYaU5QTqB3bbss463/VRcDQwabXs6tib54CO6FzWaloSqMaPZchA4HeNXOwdQ9daBnG1b0vrqs0B5jZnCVzvK9AoWGDg0tm68PWYr1AtJcNyIsutywVRdzjA2nzSioKU59dKugVog/+pkoB0hGYvXo72pePMV00IrgMr9FSbDnxi3L9iJvi2LD0Pnecx6DMnaDob/T+X5y0piap5esjwIIq7wqvXuEJE9jdmxPfHo3ise2j9UA2SGI5b7HL3YOemo0zic7ukCMvlc8Ag5dQnjANTcj2eUJyagvzJgxGqhoxyAp/WpmaHZkvf0RasB
  ubuntu@plana62.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC+nI5/l38Kdw2W/qbEKrVMcnVdIxJG7hNnD7nnS3+Zx/uPiWrds26ZPrM5IY7D8Mf7sjBzUYbqsX9xGYMLLTQaeDwsZn/7RjjSg8zOS1aMP5F/AJzSQx4Nt37eLUsRHX3yA30/OQcl6sBgDjHyhSPcSuHWSnMmoy4pkDo3xpQMQMtxDG8gWq+to1hZwJbsiK9FdutEgPJg3inWM1WVc5L6NmRN2WQNEGT8HvtlBCWqX6/H/hLujQlbgyJAbeG4BriMV3gCIccJE833f/fN9KIzaMlD7qHTgWcaGk+LY84nUdNlTkNoX+L4m6WRY8/Pt9om2dOocsXyCwYLIS4heIDT
  ubuntu@plana65.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDTwfkF9asvpySXF/DOk10UkRDNtRwgGgLww/I/3E2r+JpsfYtW62TA1HMXjtB1g7SrcIolqCiiMd+5MIURIND94n76JiZ2o4DplLKIqUB6ys46gro7mwoeFnZNOuwdAA5bO4dfgeQ3yPtfIqpWTejkCB7ai/kG04C4ekz6EgplwtqWIfvXnij4fNaqvm3s/IxGhnO40DOGNwsAldEJo2fuJN8KHnYzsU/Dx5kJ85jQl2eQJI74VpMoh2Ge7+n9Q8rJhegfcHYPLJsX/Uyrf7Rtk1RfeTyZbIOSJIQDbQNepu278kvc9IEnFg3WfvWespfrUExVgnXq53xd1RIFcx6L
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 20000
ubuntu@teuthology:/a/teuthology-2012-08-17_19:00:07-regression-master-testing-gcov/3549$ cat summary.yaml 
ceph-sha1: ae57db03a9287257f3034fb045c30d5b7edff468
client.0-kernel-sha1: 1fe5e9932156f6122c3b1ff6ba7541c27c86718c
description: collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:ext4.yaml thrashers:default.yaml
  workloads:rbd_fsx_cache_writethrough.yaml
duration: 7051.4660429954529
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage
  /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-osd
  -f -i 2 -c /tmp/cephtest/ceph.conf'''
flavor: gcov
mds.a-kernel-sha1: 1fe5e9932156f6122c3b1ff6ba7541c27c86718c
mon.a-kernel-sha1: 1fe5e9932156f6122c3b1ff6ba7541c27c86718c
owner: scheduled_teuthology@teuthology
success: false

Actions #1

Updated by Sage Weil over 11 years ago

  • Project changed from rbd to Ceph
Actions #2

Updated by Sage Weil over 11 years ago

  • Subject changed from osd crash to osd: msgr crash in OSD::complete_notify
  • Priority changed from Normal to High
Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved

Fixed this a while ago.
