Bug #2211
closed
osd: entity_inst_t OSDMap::get_inst(int) const
% Done: 0%
Source: Development
Description
While trying out the new heartbeat code I encountered this crash:
Core was generated by `/usr/bin/ceph-osd -i 8 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007f28b4ff7f2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f28b4ff7f2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000006fb75d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:95
#3 <signal handler called>
#4 0x00007f28b35753a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f28b3578b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f28b3e33d7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f28b3e31f26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f28b3e31f53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f28b3e3204e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000690540 in ceph::__ceph_assert_fail (assertion=0x804237 "is_up(osd)", file=0x811a8a "./osd/OSDMap.h", line=360, func=0x8156e0 "entity_inst_t OSDMap::get_inst(int) const") at common/assert.cc:75
#11 0x00000000005b210f in get_inst (this=<optimized out>, osd=<optimized out>) at ./osd/OSDMap.h:360
#12 get_inst (osd=0, this=<optimized out>) at osd/OSD.cc:2128
#13 OSD::send_failures (this=0x27eb000) at osd/OSD.cc:2137
#14 0x00000000005b5abb in OSD::do_mon_report (this=0x27eb000) at osd/OSD.cc:1867
#15 0x00000000005cb5c2 in OSD::tick (this=0x27eb000) at osd/OSD.cc:1736
#16 0x0000000000686ad9 in SafeTimer::timer_thread (this=0x27eb050) at common/Timer.cc:102
#17 0x00000000006876ad in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#18 0x00007f28b4fefefc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007f28b362089d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000000000000000 in ?? ()
The last log lines I got were:
2012-03-26 14:23:30.643006 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.4 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6814/1652 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9c40 con 0x7e47dc0
2012-03-26 14:23:30.643237 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.7 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6813/3697 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9540 con 0xa776140
2012-03-26 14:23:30.643389 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.12 [2a00:f10:11b:cef0:225:90ff:fe33:49b0]:6801/4725 1 ==== osd_ping(ping_reply e7983 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (2111163181 0 0) 0x1a0b9380 con 0xa776280
2012-03-26 14:23:30.643449 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.18 [2a00:f10:11b:cef0:225:90ff:fe33:49cc]:6803/14961 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9700 con 0xa776c80
2012-03-26 14:23:30.643530 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.22 [2a00:f10:11b:cef0:225:90ff:fe33:497c]:6804/22189 1 ==== osd_ping(ping_reply e7977 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1810562802 0 0) 0x1a0b9e00 con 0xa776780
2012-03-26 14:23:30.643676 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.3 [2a00:f10:11b:cef0:225:90ff:fe33:49fe]:6807/5705 1 ==== osd_ping(ping_reply e7981 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (2358970935 0 0) 0x1a0b9a80 con 0x1a188780
2012-03-26 14:23:30.643887 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.6 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6821/1802 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9000 con 0x2aac280
2012-03-26 14:23:30.643975 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.11 [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6812/23622 1 ==== osd_ping(ping_reply e7968 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (3628204292 0 0) 0x1a0b98c0 con 0x14d63a00
2012-03-26 14:23:30.644182 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.33 [2a00:f10:11b:cef0:225:90ff:fe33:498e]:6802/6908 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x126b1380 con 0x102be500
2012-03-26 14:23:30.644397 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.32 [2a00:f10:11b:cef0:225:90ff:fe33:498e]:6813/6840 1 ==== osd_ping(ping_reply e7987 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (3260606612 0 0) 0x126b1000 con 0x1a188dc0
2012-03-26 14:23:30.644590 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.34 [2a00:f10:11b:cef0:225:90ff:fe33:498e]:6807/1953 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x126b11c0 con 0xa776b40
2012-03-26 14:23:30.644806 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.16 [2a00:f10:11b:cef0:225:90ff:fe33:49cc]:6808/14817 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x16cc1a80 con 0xa776a00
2012-03-26 14:23:30.645407 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.21 [2a00:f10:11b:cef0:225:90ff:fe33:497c]:6819/22110 1 ==== osd_ping(ping_reply e7971 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1370202259 0 0) 0x16cc1e00 con 0xa776000
2012-03-26 14:23:30.823044 7f28a830f700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6808/24071 <== mon.2 [2a00:f10:11b:cef0:230:48ff:fed3:b086]:6789/0 10 ==== mon_check_map_ack(handle=61 version=7987) v2 ==== 24+0+0 (719601101 0 0) 0x16cc1540 con 0x1a188a00
2012-03-26 14:23:30.823189 7f28a5b0a700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6808/24071 --> [2a00:f10:11b:cef0:230:48ff:fed3:b086]:6789/0 -- osd_boot(osd.8 v7984) v2 -- ?+0 0x7a46000 con 0x1a188a00
./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int) const' thread 7f28afb1e700 time 2012-03-26 14:23:31.273426
./osd/OSDMap.h: 360: FAILED assert(is_up(osd))
ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
1: (OSD::send_failures()+0x11f) [0x5b210f]
2: (OSD::do_mon_report()+0x5b) [0x5b5abb]
3: (OSD::tick()+0x842) [0x5cb5c2]
4: (SafeTimer::timer_thread()+0x339) [0x686ad9]
5: (SafeTimerThread::entry()+0xd) [0x6876ad]
6: (()+0x7efc) [0x7f28b4fefefc]
7: (clone()+0x6d) [0x7f28b362089d]
ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
1: (OSD::send_failures()+0x11f) [0x5b210f]
2: (OSD::do_mon_report()+0x5b) [0x5b5abb]
3: (OSD::tick()+0x842) [0x5cb5c2]
4: (SafeTimer::timer_thread()+0x339) [0x686ad9]
5: (SafeTimerThread::entry()+0xd) [0x6876ad]
6: (()+0x7efc) [0x7f28b4fefefc]
7: (clone()+0x6d) [0x7f28b362089d]
*** Caught signal (Aborted) **
in thread 7f28afb1e700
ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
1: /usr/bin/ceph-osd() [0x6fb6e6]
2: (()+0x10060) [0x7f28b4ff8060]
3: (gsignal()+0x35) [0x7f28b35753a5]
4: (abort()+0x17b) [0x7f28b3578b0b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f28b3e33d7d]
6: (()+0xb9f26) [0x7f28b3e31f26]
7: (()+0xb9f53) [0x7f28b3e31f53]
8: (()+0xba04e) [0x7f28b3e3204e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x200) [0x690540]
10: (OSD::send_failures()+0x11f) [0x5b210f]
11: (OSD::do_mon_report()+0x5b) [0x5b5abb]
12: (OSD::tick()+0x842) [0x5cb5c2]
13: (SafeTimer::timer_thread()+0x339) [0x686ad9]
14: (SafeTimerThread::entry()+0xd) [0x6876ad]
15: (()+0x7efc) [0x7f28b4fefefc]
16: (clone()+0x6d) [0x7f28b362089d]
This happened while my OSDs were still bouncing a bit, since I had restarted them shortly after one another.
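For readers of the trace: frame #12 shows get_inst being called with osd=0 from OSD::send_failures, and the failed assert says that OSD was not marked up in the map at that moment. The following is only a minimal sketch of that pattern, assuming a queue of pending failure reports that can outlive the peer's "up" state. It is not the Ceph code and not necessarily how the bug was fixed; ToyOsdMap, up_addrs and reportable_failures are hypothetical names made up for illustration, and the exception stands in for the assert.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an OSD map (NOT Ceph's OSDMap):
// it only records the address of each OSD currently marked up.
struct ToyOsdMap {
    std::map<int, std::string> up_addrs;  // osd id -> address, up OSDs only

    bool is_up(int osd) const { return up_addrs.count(osd) > 0; }

    // Same precondition as in the report: the caller must only ask for an up OSD.
    // A logic_error is thrown here where the real code asserts.
    const std::string& get_inst(int osd) const {
        if (!is_up(osd)) throw std::logic_error("is_up(osd)");
        return up_addrs.at(osd);
    }
};

// Assumed shape of a guard: keep only the queued failure reports whose
// target is still up, so get_inst() is never called on a down OSD.
std::vector<int> reportable_failures(const ToyOsdMap& map,
                                     const std::set<int>& failure_queue) {
    std::vector<int> out;
    for (int osd : failure_queue) {
        if (map.is_up(osd)) out.push_back(osd);
    }
    return out;
}
```

In this toy model, a queue holding OSDs 0 and 4 against a map where only OSD 4 is up yields just OSD 4, while calling get_inst(0) directly trips the precondition, which is the situation the backtrace suggests.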