Bug #1520

closed

osd crash during rados api tests

Added by Josh Durgin over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Category: OSD
% Done: 0%

Description

Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-09-08/25

2011-09-08 01:01:02.376702 7fe3dda52700 osd1 191 pg[82.6( empty n=0 ec=191 les/c 190/191 191/191/191) [1,0] r=0 mlcod 0'0 active+clean] watch: ctx->obc=0x1e27000 cookie=1 oi.version=0 ctx->at_version=191'1
2011-09-08 01:01:02.376730 7fe3dda52700 osd1 191 pg[82.6( empty n=0 ec=191 les/c 190/191 191/191/191) [1,0] r=0 mlcod 0'0 active+clean] watch: oi.user_version=0
2011-09-08 01:01:02.376750 7fe3dda52700 osd1 191 pg[82.6( empty n=0 ec=191 les/c 190/191 191/191/191) [1,0] r=0 mlcod 0'0 active+clean] dump_watchers foo/head foo/head(0'0 unknown0.0:0 wrlock_by=unknown0.0:0)
2011-09-08 01:01:03.029764 7fe3e0459700 osd1 192 OSD::ms_handle_reset()
2011-09-08 01:01:03.029783 7fe3e0459700 osd1 192 OSD::ms_handle_reset() s=0x22e6120
2011-09-08 01:01:03.029792 7fe3e0459700 osd1 192 obc=0x1e27000
2011-09-08 01:01:03.029803 7fe3e0459700 osd1 192 removing watching session entity_name= from foo/head(191'1 unknown0.0:0 wrlock_by=unknown0.0:0)
./common/Mutex.h: In function 'void Mutex::Lock(bool)', in thread '0x7fe3e0459700'
./common/Mutex.h: 110: FAILED assert(r == 0)
 ceph version 0.34-468-g1a44500 (commit:1a44500f79b156af76688b92ff670b6ac3fb9dbb)
 1: /tmp/cephtest/binary/usr/local/bin/cosd() [0x59ba1b]
 2: (OSD::ms_dispatch(Message*)+0x2c) [0x59264c]
 3: (SimpleMessenger::dispatch_entry()+0x9d2) [0x6144c2]
 4: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a37bc]
 5: (Thread::_entry_func(void*)+0x12) [0x6094e2]
 6: (()+0x7971) [0x7fe3ebedd971]
 7: (clone()+0x6d) [0x7fe3ea76d92d]
 ceph version 0.34-468-g1a44500 (commit:1a44500f79b156af76688b92ff670b6ac3fb9dbb)
 1: /tmp/cephtest/binary/usr/local/bin/cosd() [0x59ba1b]
 2: (OSD::ms_dispatch(Message*)+0x2c) [0x59264c]
 3: (SimpleMessenger::dispatch_entry()+0x9d2) [0x6144c2]
 4: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a37bc]
 5: (Thread::_entry_func(void*)+0x12) [0x6094e2]
 6: (()+0x7971) [0x7fe3ebedd971]
 7: (clone()+0x6d) [0x7fe3ea76d92d]
*** Caught signal (Aborted) **
 in thread 0x7fe3e0459700
 ceph version 0.34-468-g1a44500 (commit:1a44500f79b156af76688b92ff670b6ac3fb9dbb)
 1: /tmp/cephtest/binary/usr/local/bin/cosd() [0x677004]
 2: (()+0xfb40) [0x7fe3ebee5b40]
 3: (gsignal()+0x35) [0x7fe3ea6baba5]
 4: (abort()+0x180) [0x7fe3ea6be6b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe3eaf5e6bd]
 6: (()+0xb9906) [0x7fe3eaf5c906]
 7: (()+0xb9933) [0x7fe3eaf5c933]
 8: (()+0xb9a3e) [0x7fe3eaf5ca3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x52f) [0x60a91f]
 10: /tmp/cephtest/binary/usr/local/bin/cosd() [0x59ba1b]
 11: (OSD::ms_dispatch(Message*)+0x2c) [0x59264c]
 12: (SimpleMessenger::dispatch_entry()+0x9d2) [0x6144c2]
 13: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a37bc]
 14: (Thread::_entry_func(void*)+0x12) [0x6094e2]
 15: (()+0x7971) [0x7fe3ebedd971]
 16: (clone()+0x6d) [0x7fe3ea76d92d]

There are also errors in the central log:

2011-09-08 01:00:55.503603 7f018f79c700 log [ERR] : got 79.0 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503641 7f018f79c700 log [ERR] : got 79.1 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503658 7f018f79c700 log [ERR] : got 79.2 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503673 7f018f79c700 log [ERR] : got 79.3 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503688 7f018f79c700 log [ERR] : got 79.4 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503705 7f018f79c700 log [ERR] : got 79.5 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503720 7f018f79c700 log [ERR] : got 79.6 pg_stat from osd0 but dne in pg_map
2011-09-08 01:00:55.503735 7f018f79c700 log [ERR] : got 79.7 pg_stat from osd0 but dne in pg_map

Actions #1

Updated by Josh Durgin over 12 years ago

This happened again with today's run (teuthology:~teuthworker/archive/nightly_coverage_2011-09-09/110/)

Actions #3

Updated by Sage Weil over 12 years ago

This assertion usually indicates a use-after-free (or unlock-after-free). We probably just need to reproduce this with full logs to tell what is going on.
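For context on how `FAILED assert(r == 0)` in `Mutex::Lock` can fire at all: the assertion presumably checks the return code of the underlying pthread call. The sketch below is a hypothetical stand-in, not the actual Ceph `Mutex` class. The names `CheckedMutex`, `lock_code` and `unlock_code` are made up for illustration. It does not reproduce a use-after-free, which is undefined behavior. It only shows, using an error-checking pthread mutex, that misuse surfaces as a nonzero return code, which an `assert(r == 0)` would then turn into an abort like the one in the backtrace.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical wrapper for illustration only; NOT the Ceph Mutex class.
class CheckedMutex {
public:
  CheckedMutex() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // An error-checking mutex reports misuse through return codes.
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m_, &attr);
    pthread_mutexattr_destroy(&attr);
  }
  ~CheckedMutex() { pthread_mutex_destroy(&m_); }
  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  // Return the raw pthread result so callers can inspect it.
  int lock_code() { return pthread_mutex_lock(&m_); }
  int unlock_code() { return pthread_mutex_unlock(&m_); }

  // Same pattern as the failed assertion in the report:
  // any nonzero return code aborts the process.
  void Lock() {
    int r = pthread_mutex_lock(&m_);
    assert(r == 0);
  }
  void Unlock() {
    int r = pthread_mutex_unlock(&m_);
    assert(r == 0);
  }

private:
  pthread_mutex_t m_;
};
```

With this wrapper, locking twice from the same thread makes the second `lock_code()` return `EDEADLK`, and unlocking a mutex that is not held makes `unlock_code()` return `EPERM`. A mutex whose memory has been freed or reused can fail in similar ways, which is consistent with the comment above, but full logs are still needed to confirm the actual cause.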

Actions #4

Updated by Sage Weil over 12 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil