Bug #1520
Closed
osd crash during rados api tests
Description
Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-09-08/25
2011-09-08 01:01:02.376702 7fe3dda52700 osd1 191 pg[82.6( empty n=0 ec=191 les/c 190/191 191/191/191) [1,0] r=0 mlcod 0'0 active+clean] watch: ctx->obc=0x1e27000 cookie=1 oi.version=0 ctx->at_version=191'1
2011-09-08 01:01:02.376730 7fe3dda52700 osd1 191 pg[82.6( empty n=0 ec=191 les/c 190/191 191/191/191) [1,0] r=0 mlcod 0'0 active+clean] watch: oi.user_version=0
2011-09-08 01:01:02.376750 7fe3dda52700 osd1 191 pg[82.6( empty n=0 ec=191 les/c 190/191 191/191/191) [1,0] r=0 mlcod 0'0 active+clean] dump_watchers foo/head foo/head(0'0 unknown0.0:0 wrlock_by=unknown0.0:0)
2011-09-08 01:01:03.029764 7fe3e0459700 osd1 192 OSD::ms_handle_reset()
2011-09-08 01:01:03.029783 7fe3e0459700 osd1 192 OSD::ms_handle_reset() s=0x22e6120
2011-09-08 01:01:03.029792 7fe3e0459700 osd1 192 obc=0x1e27000
2011-09-08 01:01:03.029803 7fe3e0459700 osd1 192 removing watching session entity_name= from foo/head(191'1 unknown0.0:0 wrlock_by=unknown0.0:0)
./common/Mutex.h: In function 'void Mutex::Lock(bool)', in thread '0x7fe3e0459700'
./common/Mutex.h: 110: FAILED assert(r == 0)
ceph version 0.34-468-g1a44500 (commit:1a44500f79b156af76688b92ff670b6ac3fb9dbb)
1: /tmp/cephtest/binary/usr/local/bin/cosd() [0x59ba1b]
2: (OSD::ms_dispatch(Message*)+0x2c) [0x59264c]
3: (SimpleMessenger::dispatch_entry()+0x9d2) [0x6144c2]
4: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a37bc]
5: (Thread::_entry_func(void*)+0x12) [0x6094e2]
6: (()+0x7971) [0x7fe3ebedd971]
7: (clone()+0x6d) [0x7fe3ea76d92d]
ceph version 0.34-468-g1a44500 (commit:1a44500f79b156af76688b92ff670b6ac3fb9dbb)
1: /tmp/cephtest/binary/usr/local/bin/cosd() [0x59ba1b]
2: (OSD::ms_dispatch(Message*)+0x2c) [0x59264c]
3: (SimpleMessenger::dispatch_entry()+0x9d2) [0x6144c2]
4: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a37bc]
5: (Thread::_entry_func(void*)+0x12) [0x6094e2]
6: (()+0x7971) [0x7fe3ebedd971]
7: (clone()+0x6d) [0x7fe3ea76d92d]
*** Caught signal (Aborted) **
in thread 0x7fe3e0459700
ceph version 0.34-468-g1a44500 (commit:1a44500f79b156af76688b92ff670b6ac3fb9dbb)
1: /tmp/cephtest/binary/usr/local/bin/cosd() [0x677004]
2: (()+0xfb40) [0x7fe3ebee5b40]
3: (gsignal()+0x35) [0x7fe3ea6baba5]
4: (abort()+0x180) [0x7fe3ea6be6b0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe3eaf5e6bd]
6: (()+0xb9906) [0x7fe3eaf5c906]
7: (()+0xb9933) [0x7fe3eaf5c933]
8: (()+0xb9a3e) [0x7fe3eaf5ca3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x52f) [0x60a91f]
10: /tmp/cephtest/binary/usr/local/bin/cosd() [0x59ba1b]
11: (OSD::ms_dispatch(Message*)+0x2c) [0x59264c]
12: (SimpleMessenger::dispatch_entry()+0x9d2) [0x6144c2]
13: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a37bc]
14: (Thread::_entry_func(void*)+0x12) [0x6094e2]
15: (()+0x7971) [0x7fe3ebedd971]
16: (clone()+0x6d) [0x7fe3ea76d92d]
There are also errors in the central log:
2011-09-08 01:00:55.503603 7f018f79c700 log [ERR] : got 79.0 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503641 7f018f79c700 log [ERR] : got 79.1 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503658 7f018f79c700 log [ERR] : got 79.2 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503673 7f018f79c700 log [ERR] : got 79.3 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503688 7f018f79c700 log [ERR] : got 79.4 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503705 7f018f79c700 log [ERR] : got 79.5 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503720 7f018f79c700 log [ERR] : got 79.6 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503735 7f018f79c700 log [ERR] : got 79.7 pg_stat from osd0 but dne in pg_map
Updated by Josh Durgin over 12 years ago
This happened again with today's run (teuthology:~teuthworker/archive/nightly_coverage_2011-09-09/110/)
Updated by Sage Weil over 12 years ago
This assertion usually indicates a use (unlock)-after-free. We probably just need to reproduce this with full logs to tell what's up.
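For context on how an assert(r == 0) in a lock wrapper can fire, here is a minimal sketch. It assumes a pthread-based wrapper; CheckedMutex, relock_code and foreign_unlock_code are hypothetical names, and this is not the code in common/Mutex.h. Touching a freed mutex is undefined behavior and cannot be demonstrated reliably, so the sketch uses an error-checking mutex to produce non-zero return codes deterministically. With a real use-after-free, the pthread call may return a similar error, hang, or corrupt memory.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Hypothetical wrapper (an assumption, not Ceph's Mutex class): it exposes the
// raw pthread return code that a wrapper like Mutex::Lock would assert on.
class CheckedMutex {
public:
  CheckedMutex() {
    pthread_mutexattr_t attr;
    int r = pthread_mutexattr_init(&attr);
    assert(r == 0);
    // Error-checking mutexes report misuse through the return code
    // instead of deadlocking or silently succeeding.
    r = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(r == 0);
    r = pthread_mutex_init(&m_, &attr);
    assert(r == 0);
    pthread_mutexattr_destroy(&attr);
    (void)r;
  }
  ~CheckedMutex() { pthread_mutex_destroy(&m_); }

  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  int lock_code() { return pthread_mutex_lock(&m_); }
  int unlock_code() { return pthread_mutex_unlock(&m_); }

private:
  pthread_mutex_t m_;
};

// Locks the same mutex twice from one thread and returns the second code.
// A wrapper that asserts r == 0 would abort on that second call.
int relock_code() {
  CheckedMutex m;
  int first = m.lock_code();
  assert(first == 0);
  (void)first;
  int second = m.lock_code();  // misuse: already held by this thread
  m.unlock_code();             // release before the destructor runs
  return second;
}

// Unlocks a mutex that was never locked and returns the resulting code.
int foreign_unlock_code() {
  CheckedMutex m;
  return m.unlock_code();  // misuse: nothing to unlock
}
```

On a POSIX system, relock_code() is expected to return EDEADLK and foreign_unlock_code() to return EPERM. A wrapper that asserts on the return code turns either condition into an abort like the one in the backtrace above.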
Updated by Sage Weil over 12 years ago
- Status changed from New to Resolved
- Assignee set to Sage Weil