Bug #3270

closed

osd crash during rbd test run

Added by Tamilarasi muthamizhan over 11 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs: ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1570

2012-10-03 21:28:01.688931 7fe7ad7c1700 -1 *** Caught signal (Aborted) **
 in thread 7fe7ad7c1700

 ceph version 0.52-958-gdb7c419 (commit:db7c41934b6e894c7d5a01ddf1a3592744c3d73c)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x8498ea]
 2: (()+0xfcb0) [0x7fe7be850cb0]
 3: (gsignal()+0x35) [0x7fe7bcb30445]
 4: (abort()+0x17b) [0x7fe7bcb33bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe7bd47e69d]
 6: (()+0xb5846) [0x7fe7bd47c846]
 7: (()+0xb5873) [0x7fe7bd47c873]
 8: (()+0xb596e) [0x7fe7bd47c96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x297) [0x9322c7]
 10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58bdc4]
 11: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6d3) [0x5c4893]
 12: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3194) [0x5c9fc4]
 13: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x52c) [0x707c8c]
 14: (OSD::dequeue_op(PG*)+0x40f) [0x625abf]
 15: (OSD::OpWQ::_process(PG*)+0x15) [0x68d285]
 16: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x12) [0x683352]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x73a) [0x923bda]
 18: (ThreadPool::WorkThread::entry()+0x18) [0x9275e8]
 19: (Thread::_entry_func(void*)+0x12) [0x915b02]
 20: (()+0x7e9a) [0x7fe7be848e9a]
 21: (clone()+0x6d) [0x7fe7bcbec4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
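
To make backtraces like the one above easier to compare across runs, each frame line can be split into its index, symbol, offset, and address. The following is only a minimal sketch: `parse_frame` and both regular expressions are hypothetical helpers written for this log format, not part of Ceph or teuthology. Interpreting the raw addresses still requires the executable, as the NOTE says.

```python
import re

# Frame lines in this log look like one of:
#   " 10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58bdc4]"
#   " 2: (()+0xfcb0) [0x7fe7be850cb0]"
#   " 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x8498ea]"
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+"
    r"(?P<location>.*?)\s+"
    r"\[0x(?P<address>[0-9a-fA-F]+)\]\s*$"
)
# A location of the form "(symbol+0xOFFSET)"; the symbol may itself contain "(...)".
SYMBOL_RE = re.compile(r"^\((?P<symbol>.*)\+0x(?P<offset>[0-9a-fA-F]+)\)$")


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None for non-frame lines."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    frame = {
        "index": int(match.group("index")),
        "address": int(match.group("address"), 16),
        "symbol": None,   # stays None for unresolved "()" frames and bare paths
        "offset": None,
    }
    sym = SYMBOL_RE.match(match.group("location"))
    if sym is not None:
        name = sym.group("symbol")
        frame["symbol"] = None if name == "()" else name
        frame["offset"] = int(sym.group("offset"), 16)
    return frame


if __name__ == "__main__":
    sample = (
        " 10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4)"
        " [0x58bdc4]"
    )
    print(parse_frame(sample))
```

Lines that are not frames (the signal banner, the version line, the NOTE) return `None`, so the helper can be run over a whole log excerpt.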

ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1570$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
      global:
        ms inject socket failures: 5000
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
  s3tests:
    branch: master
  workunit:
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana31.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5J4n7rTsH+IMjGAu+EfhukuK5+zScoSaPIfXDOUU8LfvuI/3x8Luiyv9eRVwZgwuLBWZ/zorBbGZ+G2Iaxy3632AG/XE7cRZA9AxzZT+Qvm9D+BW+Uletgf92cttKMk7qwK3DetQwRKKl6AMv0SDpUff+nzqnJH6LMS8zoBPVXDHFM3Lup8h9H6DYEs1F/Zn8LVSw8hNiD279rg1n1hqWdItmnKBPKyC/qkRoPa6h7gDU6FPaBiNhuhBd0016XGrVwL7Y8gqoDBiArP+NDt1lcnbeiK43bFhqW+pYovOdIA2MJC6z+bkZDlOJdxoz9mDP0cJZBdB43v3UdbS1R+WT
  ubuntu@plana39.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDo+Kh24vRxeTQ6/n5PIIGuxrPHPRO/xMQlwoLHi7mR01cIXJMG5wet7mp2om3/5SZSDcLBHduDKrdWL142Sg5fC0zZPUggbxS7nz/UCjYBzMsOtHEUAU5Gs0KFopOCHXNEveK95ezsroMAD5+jS/IEpiooYCkrR3H+NSvUU0Ae352PlXqV0vamkYzyQyEMmhFE50ALhUXbKMve3d2mxJee5sqVZSBmQTbze9RKUA96t9iiwiheflXbN1i9WHlbBOIue5pZ5fM3/vqPWgaShfFpa0pT56QKJfjyFcDeCLOislo23E5qKAJOi5vn5BoYVtG3niNQpt/YbYGfDEHVeqt9
  ubuntu@plana46.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrD6s9otJ5xCNH4nyv0iJu6AoqmlNTFd8D0X9RfFBnmOMrMBWU9kwsFzPIOsuJGbSYbA8LCtjWUwaWoXmbEFtTMitxaDXp47gbVNXknHq7TGZHkWWOwKKu+tlSQBpCVzO/rzBbvJ9fcG7tewq5XcIHz0IUXsUFuEuXR1HaTUJKic2twBpaeAGNvdd6IZ9Sz9TMkfiRV/aVdcHJ/yF8bsXi3pfRPR3puMK/Nyfq5Hz/aabQo1TSyK2o0weoWV7D8vD6S8f3D7p5/5ScBhL3zUcP85SsV47W+/hTFbU8kN1Grlv2sx0fVMB/TUB/UNVdsHKGn5Nv6zb/qMqBEx9nSeZ9
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000
ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1570$ cat summary.yaml 
ceph-sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
client.0-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
description: collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:btrfs.yaml msgr-failures:few.yaml
  thrashers:default.yaml workloads:rbd_fsx_cache_writeback.yaml
duration: 1793.8868980407715
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage
  /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-osd
  -f -i 4 -c /tmp/cephtest/ceph.conf'''
flavor: gcov
mds.a-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
mon.a-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
owner: scheduled_teuthology@teuthology
success: false
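
The `failure_reason` above names the daemon that died: the failed command ends in `ceph-osd -f -i 4`, so the crash was in osd.4, which the `roles` list places in the second group. A small sketch of that lookup, assuming the YAML has already been loaded into Python values; `crashed_osd` and `role_group` are hypothetical helper names, not teuthology APIs.

```python
import re


def crashed_osd(failure_reason):
    """Return e.g. 'osd.4' if the failed command ran ceph-osd with '-i <id>'."""
    match = re.search(r"ceph-osd\b.*?\s-i\s+(\S+)", failure_reason)
    return "osd." + match.group(1) if match else None


def role_group(roles, role):
    """Return the zero-based index of the roles group containing `role`, or None."""
    for index, group in enumerate(roles):
        if role in group:
            return index
    return None


if __name__ == "__main__":
    # Values copied from summary.yaml and config.yaml in this report.
    reason = (
        "Command failed with status 1: '/tmp/cephtest/enable-coredump "
        "/tmp/cephtest/binary/usr/local/bin/ceph-coverage "
        "/tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term "
        "/tmp/cephtest/binary/usr/local/bin/ceph-osd -f -i 4 "
        "-c /tmp/cephtest/ceph.conf'"
    )
    roles = [
        ["mon.a", "osd.0", "osd.1", "osd.2"],
        ["mds.a", "osd.3", "osd.4", "osd.5"],
        ["client.0"],
    ]
    osd = crashed_osd(reason)
    print(osd, "is in roles group", role_group(roles, osd))
```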


Files

ceph-osd (71.6 MB) ceph-osd osd Dan Mick, 10/08/2012 07:00 PM
1349538732.13238.core.bz2 (5.22 MB) 1349538732.13238.core.bz2 Core file from teuthology run Dan Mick, 10/08/2012 07:01 PM
Actions #1

Updated by Sage Weil over 11 years ago

  • Category set to OSD
  • Priority changed from Normal to High
Actions #2

Updated by Samuel Just over 11 years ago

  • Assignee set to Samuel Just
Actions #3

Updated by Tamilarasi muthamizhan over 11 years ago

Recent logs (a different assert, `Mutex::Lock`, also reached via `ReplicatedPG::do_osd_op_effects`): ubuntu@teuthology:/a/teuthology-2012-10-06_00:00:05-regression-next-testing-basic/2901

0> 2012-10-06 10:16:36.858874 7f76659d8700 -1 ./common/Mutex.h: In function 'void Mutex::Lock(bool)' thread 7f76659d8700 time 2012-10-06 10:16:36.856238
./common/Mutex.h: 113: FAILED assert(r == 0)
ceph version 0.52-838-gaed3612 (commit:aed3612f875a3aeb6463011cb630adc7c936adbd)
1: (Mutex::Lock(bool)+0xa5) [0x5a5665]
2: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x4f) [0x7e4c1f]
3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0xa7) [0x7e5ad7]
4: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x11f3) [0x56b4a3]
5: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x623) [0x5908a3]
6: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1de5) [0x593ca5]
7: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x325) [0x66e185]
8: (OSD::dequeue_op(PG*)+0x2fd) [0x5ce44d]
9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x545) [0x7f1cc5]
10: (ThreadPool::WorkThread::entry()+0x10) [0x7f3c60]
11: (()+0x7e9a) [0x7f7676a5fe9a]
12: (clone()+0x6d) [0x7f7674e034bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #4

Updated by Dan Mick over 11 years ago

Caught another instance of something similar; the core dump and ceph-osd binary are attached.

Actions #5

Updated by Dan Mick over 11 years ago

Actions #7

Updated by Tamilarasi muthamizhan over 11 years ago

Latest logs: ubuntu@teuthology:/a/teuthology-2012-10-17_19:00:10-regression-master-testing-gcov/3136

2012-10-17 23:21:36.908473 7f1dc5acc700 -1 *** Caught signal (Aborted) **
 in thread 7f1dc5acc700

 ceph version 0.53-315-g1fc18c4 (commit:1fc18c46816caf69365c6ce136e93424e3d4009d)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x84cf6a]
 2: (()+0xfcb0) [0x7f1dd6b5bcb0]
 3: (gsignal()+0x35) [0x7f1dd4e3b445]
 4: (abort()+0x17b) [0x7f1dd4e3ebab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f1dd578969d]
 6: (()+0xb5846) [0x7f1dd5787846]
 7: (()+0xb5873) [0x7f1dd5787873]
 8: (()+0xb596e) [0x7f1dd578796e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x297) [0x935a77]
 10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58ccd4]
 11: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6d3) [0x5c5743]
 12: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3c0b) [0x5cb8eb]
 13: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x52c) [0x70b72c]
 14: (OSD::dequeue_op(PG*)+0x40f) [0x62683f]
 15: (OSD::OpWQ::_process(PG*)+0x15) [0x68e025]
 16: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x12) [0x6840f2]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x73a) [0x92738a]
 18: (ThreadPool::WorkThread::entry()+0x18) [0x92ad98]
 19: (Thread::_entry_func(void*)+0x12) [0x9192b2]
 20: (()+0x7e9a) [0x7f1dd6b53e9a]
 21: (clone()+0x6d) [0x7f1dd4ef74bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
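
This trace and the one in the description both pass through `ReplicatedPG::do_osd_op_effects` at offset `+0x26e4` before the assert, which may suggest the same failing call site, although offsets are only a rough signal across builds from different commits. A sketch of that comparison follows; `symbol_offsets` and `shared_call_sites` are hypothetical helpers for this log format, not Ceph tooling.

```python
import re

# Matches only frames with a "(symbol+0xOFFSET) [0xADDRESS]" location.
SYMBOL_FRAME_RE = re.compile(
    r"^\s*\d+:\s+\((?P<symbol>.+)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\s+\[0x[0-9a-fA-F]+\]\s*$"
)


def symbol_offsets(trace_text):
    """Map each named symbol in a backtrace to its offset, skipping '()' frames."""
    offsets = {}
    for line in trace_text.splitlines():
        match = SYMBOL_FRAME_RE.match(line)
        if match and match.group("symbol") != "()":
            offsets[match.group("symbol")] = int(match.group("offset"), 16)
    return offsets


def shared_call_sites(trace_a, trace_b):
    """Return the sorted symbols that appear in both traces at the same offset."""
    a, b = symbol_offsets(trace_a), symbol_offsets(trace_b)
    return sorted(sym for sym in a if sym in b and a[sym] == b[sym])


if __name__ == "__main__":
    # Frames 10 and 12 from the 2012-10-03 and 2012-10-17 traces in this report.
    oct_03 = """
 10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58bdc4]
 12: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3194) [0x5c9fc4]
"""
    oct_17 = """
 10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58ccd4]
 12: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3c0b) [0x5cb8eb]
"""
    for symbol in shared_call_sites(oct_03, oct_17):
        print(symbol)
```

Matching on symbol plus offset rather than on the absolute address is deliberate: the addresses differ between the two binaries even where the offset into the function is identical.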

Actions #8

Updated by Sage Weil over 11 years ago

  • Status changed from New to Duplicate

Duplicate of #3142.

Actions #9

Updated by Tamilarasi muthamizhan over 11 years ago

Recent logs: ubuntu@teuthology:/a/teuthology-2012-11-23_19:00:03-regression-master-testing-gcov/3036
