Bug #3270
osd crash during rbd test run
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Logs: ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1570
2012-10-03 21:28:01.688931 7fe7ad7c1700 -1 *** Caught signal (Aborted) **
in thread 7fe7ad7c1700
ceph version 0.52-958-gdb7c419 (commit:db7c41934b6e894c7d5a01ddf1a3592744c3d73c)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x8498ea]
2: (()+0xfcb0) [0x7fe7be850cb0]
3: (gsignal()+0x35) [0x7fe7bcb30445]
4: (abort()+0x17b) [0x7fe7bcb33bab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe7bd47e69d]
6: (()+0xb5846) [0x7fe7bd47c846]
7: (()+0xb5873) [0x7fe7bd47c873]
8: (()+0xb596e) [0x7fe7bd47c96e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x297) [0x9322c7]
10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58bdc4]
11: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6d3) [0x5c4893]
12: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3194) [0x5c9fc4]
13: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x52c) [0x707c8c]
14: (OSD::dequeue_op(PG*)+0x40f) [0x625abf]
15: (OSD::OpWQ::_process(PG*)+0x15) [0x68d285]
16: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x12) [0x683352]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x73a) [0x923bda]
18: (ThreadPool::WorkThread::entry()+0x18) [0x9275e8]
19: (Thread::_entry_func(void*)+0x12) [0x915b02]
20: (()+0x7e9a) [0x7fe7be848e9a]
21: (clone()+0x6d) [0x7fe7bcbec4bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1570$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
      global:
        ms inject socket failures: 5000
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
  s3tests:
    branch: master
  workunit:
    sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana31.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5J4n7rTsH+IMjGAu+EfhukuK5+zScoSaPIfXDOUU8LfvuI/3x8Luiyv9eRVwZgwuLBWZ/zorBbGZ+G2Iaxy3632AG/XE7cRZA9AxzZT+Qvm9D+BW+Uletgf92cttKMk7qwK3DetQwRKKl6AMv0SDpUff+nzqnJH6LMS8zoBPVXDHFM3Lup8h9H6DYEs1F/Zn8LVSw8hNiD279rg1n1hqWdItmnKBPKyC/qkRoPa6h7gDU6FPaBiNhuhBd0016XGrVwL7Y8gqoDBiArP+NDt1lcnbeiK43bFhqW+pYovOdIA2MJC6z+bkZDlOJdxoz9mDP0cJZBdB43v3UdbS1R+WT
  ubuntu@plana39.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDo+Kh24vRxeTQ6/n5PIIGuxrPHPRO/xMQlwoLHi7mR01cIXJMG5wet7mp2om3/5SZSDcLBHduDKrdWL142Sg5fC0zZPUggbxS7nz/UCjYBzMsOtHEUAU5Gs0KFopOCHXNEveK95ezsroMAD5+jS/IEpiooYCkrR3H+NSvUU0Ae352PlXqV0vamkYzyQyEMmhFE50ALhUXbKMve3d2mxJee5sqVZSBmQTbze9RKUA96t9iiwiheflXbN1i9WHlbBOIue5pZ5fM3/vqPWgaShfFpa0pT56QKJfjyFcDeCLOislo23E5qKAJOi5vn5BoYVtG3niNQpt/YbYGfDEHVeqt9
  ubuntu@plana46.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrD6s9otJ5xCNH4nyv0iJu6AoqmlNTFd8D0X9RfFBnmOMrMBWU9kwsFzPIOsuJGbSYbA8LCtjWUwaWoXmbEFtTMitxaDXp47gbVNXknHq7TGZHkWWOwKKu+tlSQBpCVzO/rzBbvJ9fcG7tewq5XcIHz0IUXsUFuEuXR1HaTUJKic2twBpaeAGNvdd6IZ9Sz9TMkfiRV/aVdcHJ/yF8bsXi3pfRPR3puMK/Nyfq5Hz/aabQo1TSyK2o0weoWV7D8vD6S8f3D7p5/5ScBhL3zUcP85SsV47W+/hTFbU8kN1Grlv2sx0fVMB/TUB/UNVdsHKGn5Nv6zb/qMqBEx9nSeZ9
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000
ubuntu@teuthology:/a/teuthology-2012-10-03_19:00:11-regression-master-testing-gcov/1570$ cat summary.yaml
ceph-sha1: db7c41934b6e894c7d5a01ddf1a3592744c3d73c
client.0-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
description: collection:rbd-thrash clusters:6-osd-3-machine.yaml fs:btrfs.yaml msgr-failures:few.yaml thrashers:default.yaml workloads:rbd_fsx_cache_writeback.yaml
duration: 1793.8868980407715
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-osd -f -i 4 -c /tmp/cephtest/ceph.conf'''
flavor: gcov
mds.a-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
mon.a-kernel-sha1: 8f4721bbf46295e61e0d7da9c1c739a62fae55a1
owner: scheduled_teuthology@teuthology
success: false
History
#1 Updated by Sage Weil almost 11 years ago
- Category set to OSD
- Priority changed from Normal to High
#2 Updated by Samuel Just almost 11 years ago
- Assignee set to Samuel Just
#3 Updated by Tamilarasi muthamizhan almost 11 years ago
Recent logs: ubuntu@teuthology:/a/teuthology-2012-10-06_00:00:05-regression-next-testing-basic/2901
0> 2012-10-06 10:16:36.858874 7f76659d8700 -1 ./common/Mutex.h: In function 'void Mutex::Lock(bool)' thread 7f76659d8700 time 2012-10-06 10:16:36.856238
./common/Mutex.h: 113: FAILED assert(r == 0)
ceph version 0.52-838-gaed3612 (commit:aed3612f875a3aeb6463011cb630adc7c936adbd)
1: (Mutex::Lock(bool)+0xa5) [0x5a5665]
2: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x4f) [0x7e4c1f]
3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0xa7) [0x7e5ad7]
4: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x11f3) [0x56b4a3]
5: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x623) [0x5908a3]
6: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x1de5) [0x593ca5]
7: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x325) [0x66e185]
8: (OSD::dequeue_op(PG*)+0x2fd) [0x5ce44d]
9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x545) [0x7f1cc5]
10: (ThreadPool::WorkThread::entry()+0x10) [0x7f3c60]
11: (()+0x7e9a) [0x7f7676a5fe9a]
12: (clone()+0x6d) [0x7f7674e034bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
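For comparing the backtraces posted in this ticket without the binary at hand, the frames can be split into index, symbol, offset and address. The following is a minimal sketch, not part of Ceph or teuthology: the names Frame, FRAME_RE, parse_frames and resolved_symbols are hypothetical, and the regular expression only assumes the frame layout visible in the traces above ("N: (symbol+0xoffset) [0xaddress]", or "N: /path/to/binary() [0xaddress]" for the first frame of the signal-handler traces).

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    symbol: str            # demangled symbol, binary path, or "()" when unresolved
    offset: Optional[int]  # offset into the symbol; None when the frame carries none
    address: int


# Assumed layout, taken from the traces in this ticket:
#   "4: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x11f3) [0x56b4a3]"
#   "1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x8498ea]"
FRAME_RE = re.compile(
    r"(?P<index>\d+): "
    r"(?:\((?P<symbol>.*?)\+0x(?P<offset>[0-9a-f]+)\)|(?P<binary>\S+))"
    r" \[0x(?P<address>[0-9a-f]+)\]"
)


def parse_frames(text: str) -> List[Frame]:
    """Extract every frame found in text, whether one per line or run together."""
    frames = []
    for m in FRAME_RE.finditer(text):
        symbol = m.group("symbol")
        if symbol is None:
            symbol = m.group("binary")
        offset = int(m.group("offset"), 16) if m.group("offset") else None
        frames.append(
            Frame(int(m.group("index")), symbol, offset, int(m.group("address"), 16))
        )
    return frames


def resolved_symbols(frames: List[Frame]) -> List[str]:
    """Keep only frames that name a function, dropping "()" and bare binary paths."""
    return [f.symbol for f in frames if f.offset is not None and f.symbol != "()"]


if __name__ == "__main__":
    sample = (
        "1: (Mutex::Lock(bool)+0xa5) [0x5a5665]\n"
        "11: (()+0x7e9a) [0x7f7676a5fe9a]\n"
        "12: (clone()+0x6d) [0x7f7674e034bd]\n"
    )
    for frame in parse_frames(sample):
        print(frame.index, frame.symbol, hex(frame.address))
```

Running resolved_symbols over two parsed traces gives lists that can be compared directly, which makes it easier to see which frames the crashes share. Addresses and offsets differ between builds, so only the symbol names are meaningful across runs.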
#4 Updated by Dan Mick almost 11 years ago
Caught another instance of something similar; attached is the coredump and ceph-osd binary
#5 Updated by Dan Mick almost 11 years ago
- File ceph-osd added
#6 Updated by Dan Mick almost 11 years ago
- File 1349538732.13238.core.bz2 added
#7 Updated by Tamilarasi muthamizhan almost 11 years ago
Latest logs: ubuntu@teuthology:/a/teuthology-2012-10-17_19:00:10-regression-master-testing-gcov/3136
2012-10-17 23:21:36.908473 7f1dc5acc700 -1 *** Caught signal (Aborted) **
in thread 7f1dc5acc700
ceph version 0.53-315-g1fc18c4 (commit:1fc18c46816caf69365c6ce136e93424e3d4009d)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x84cf6a]
2: (()+0xfcb0) [0x7f1dd6b5bcb0]
3: (gsignal()+0x35) [0x7f1dd4e3b445]
4: (abort()+0x17b) [0x7f1dd4e3ebab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f1dd578969d]
6: (()+0xb5846) [0x7f1dd5787846]
7: (()+0xb5873) [0x7f1dd5787873]
8: (()+0xb596e) [0x7f1dd578796e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x297) [0x935a77]
10: (ReplicatedPG::do_osd_op_effects(ReplicatedPG::OpContext*)+0x26e4) [0x58ccd4]
11: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6d3) [0x5c5743]
12: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3c0b) [0x5cb8eb]
13: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x52c) [0x70b72c]
14: (OSD::dequeue_op(PG*)+0x40f) [0x62683f]
15: (OSD::OpWQ::_process(PG*)+0x15) [0x68e025]
16: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x12) [0x6840f2]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x73a) [0x92738a]
18: (ThreadPool::WorkThread::entry()+0x18) [0x92ad98]
19: (Thread::_entry_func(void*)+0x12) [0x9192b2]
20: (()+0x7e9a) [0x7f1dd6b53e9a]
21: (clone()+0x6d) [0x7f1dd4ef74bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#9 Updated by Tamilarasi muthamizhan almost 11 years ago
Recent logs: ubuntu@teuthology:/a/teuthology-2012-11-23_19:00:03-regression-master-testing-gcov/3036