Bug #2211

closed

osd: entity_inst_t OSDMap::get_inst(int) const

Added by Wido den Hollander about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version: -
% Done: 0%
Spent time:
Source: Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While trying out the new heartbeat code I encountered this crash:

Core was generated by `/usr/bin/ceph-osd -i 8 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f28b4ff7f2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f28b4ff7f2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000006fb75d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:95
#3  <signal handler called>
#4  0x00007f28b35753a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f28b3578b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f28b3e33d7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f28b3e31f26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f28b3e31f53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f28b3e3204e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000690540 in ceph::__ceph_assert_fail (assertion=0x804237 "is_up(osd)", file=0x811a8a "./osd/OSDMap.h", line=360, func=0x8156e0 "entity_inst_t OSDMap::get_inst(int) const") at common/assert.cc:75
#11 0x00000000005b210f in get_inst (this=<optimized out>, osd=<optimized out>) at ./osd/OSDMap.h:360
#12 get_inst (osd=0, this=<optimized out>) at osd/OSD.cc:2128
#13 OSD::send_failures (this=0x27eb000) at osd/OSD.cc:2137
#14 0x00000000005b5abb in OSD::do_mon_report (this=0x27eb000) at osd/OSD.cc:1867
#15 0x00000000005cb5c2 in OSD::tick (this=0x27eb000) at osd/OSD.cc:1736
#16 0x0000000000686ad9 in SafeTimer::timer_thread (this=0x27eb050) at common/Timer.cc:102
#17 0x00000000006876ad in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#18 0x00007f28b4fefefc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007f28b362089d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000000000000000 in ?? ()

The last log lines I got were:

2012-03-26 14:23:30.643006 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.4 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6814/1652 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9c40 con 0x7e47dc0
2012-03-26 14:23:30.643237 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.7 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6813/3697 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9540 con 0xa776140
2012-03-26 14:23:30.643389 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.12 [2a00:f10:11b:cef0:225:90ff:fe33:49b0]:6801/4725 1 ==== osd_ping(ping_reply e7983 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (2111163181 0 0) 0x1a0b9380 con 0xa776280
2012-03-26 14:23:30.643449 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.18 [2a00:f10:11b:cef0:225:90ff:fe33:49cc]:6803/14961 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9700 con 0xa776c80
2012-03-26 14:23:30.643530 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.22 [2a00:f10:11b:cef0:225:90ff:fe33:497c]:6804/22189 1 ==== osd_ping(ping_reply e7977 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1810562802 0 0) 0x1a0b9e00 con 0xa776780
2012-03-26 14:23:30.643676 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.3 [2a00:f10:11b:cef0:225:90ff:fe33:49fe]:6807/5705 1 ==== osd_ping(ping_reply e7981 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (2358970935 0 0) 0x1a0b9a80 con 0x1a188780
2012-03-26 14:23:30.643887 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.6 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6821/1802 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x1a0b9000 con 0x2aac280
2012-03-26 14:23:30.643975 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.11 [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6812/23622 1 ==== osd_ping(ping_reply e7968 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (3628204292 0 0) 0x1a0b98c0 con 0x14d63a00
2012-03-26 14:23:30.644182 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.33 [2a00:f10:11b:cef0:225:90ff:fe33:498e]:6802/6908 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x126b1380 con 0x102be500
2012-03-26 14:23:30.644397 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.32 [2a00:f10:11b:cef0:225:90ff:fe33:498e]:6813/6840 1 ==== osd_ping(ping_reply e7987 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (3260606612 0 0) 0x126b1000 con 0x1a188dc0
2012-03-26 14:23:30.644590 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.34 [2a00:f10:11b:cef0:225:90ff:fe33:498e]:6807/1953 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x126b11c0 con 0xa776b40
2012-03-26 14:23:30.644806 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.16 [2a00:f10:11b:cef0:225:90ff:fe33:49cc]:6808/14817 1 ==== osd_ping(ping_reply e0 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1642745581 0 0) 0x16cc1a80 con 0xa776a00
2012-03-26 14:23:30.645407 7f28a730d700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:0/24071 <== osd.21 [2a00:f10:11b:cef0:225:90ff:fe33:497c]:6819/22110 1 ==== osd_ping(ping_reply e7971 stamp 2012-03-26 14:23:30.640723) v2 ==== 47+0+0 (1370202259 0 0) 0x16cc1e00 con 0xa776000
2012-03-26 14:23:30.823044 7f28a830f700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6808/24071 <== mon.2 [2a00:f10:11b:cef0:230:48ff:fed3:b086]:6789/0 10 ==== mon_check_map_ack(handle=61 version=7987) v2 ==== 24+0+0 (719601101 0 0) 0x16cc1540 con 0x1a188a00
2012-03-26 14:23:30.823189 7f28a5b0a700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49f2]:6808/24071 --> [2a00:f10:11b:cef0:230:48ff:fed3:b086]:6789/0 -- osd_boot(osd.8 v7984) v2 -- ?+0 0x7a46000 con 0x1a188a00
./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int) const' thread 7f28afb1e700 time 2012-03-26 14:23:31.273426
./osd/OSDMap.h: 360: FAILED assert(is_up(osd))
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: (OSD::send_failures()+0x11f) [0x5b210f]
 2: (OSD::do_mon_report()+0x5b) [0x5b5abb]
 3: (OSD::tick()+0x842) [0x5cb5c2]
 4: (SafeTimer::timer_thread()+0x339) [0x686ad9]
 5: (SafeTimerThread::entry()+0xd) [0x6876ad]
 6: (()+0x7efc) [0x7f28b4fefefc]
 7: (clone()+0x6d) [0x7f28b362089d]
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: (OSD::send_failures()+0x11f) [0x5b210f]
 2: (OSD::do_mon_report()+0x5b) [0x5b5abb]
 3: (OSD::tick()+0x842) [0x5cb5c2]
 4: (SafeTimer::timer_thread()+0x339) [0x686ad9]
 5: (SafeTimerThread::entry()+0xd) [0x6876ad]
 6: (()+0x7efc) [0x7f28b4fefefc]
 7: (clone()+0x6d) [0x7f28b362089d]
*** Caught signal (Aborted) **
 in thread 7f28afb1e700
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: /usr/bin/ceph-osd() [0x6fb6e6]
 2: (()+0x10060) [0x7f28b4ff8060]
 3: (gsignal()+0x35) [0x7f28b35753a5]
 4: (abort()+0x17b) [0x7f28b3578b0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f28b3e33d7d]
 6: (()+0xb9f26) [0x7f28b3e31f26]
 7: (()+0xb9f53) [0x7f28b3e31f53]
 8: (()+0xba04e) [0x7f28b3e3204e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x200) [0x690540]
 10: (OSD::send_failures()+0x11f) [0x5b210f]
 11: (OSD::do_mon_report()+0x5b) [0x5b5abb]
 12: (OSD::tick()+0x842) [0x5cb5c2]
 13: (SafeTimer::timer_thread()+0x339) [0x686ad9]
 14: (SafeTimerThread::entry()+0xd) [0x6876ad]
 15: (()+0x7efc) [0x7f28b4fefefc]
 16: (clone()+0x6d) [0x7f28b362089d]

This happened while my OSDs were still bouncing a bit, since I had restarted them shortly after one another.
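
To illustrate the failure path in the backtrace: send_failures() reaches get_inst() for osd=0, and get_inst() asserts is_up(osd), so the assert fires whenever an OSD that is queued for a failure report has already been marked down in the current map. The following is only a toy sketch of that precondition and of one plausible guard. ToyOSDMap, collect_failure_targets and failure_queue are hypothetical names made up for illustration; this is not the real OSDMap/OSD code and not necessarily the fix that was applied.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for OSDMap (hypothetical): an OSD id is present in
// up_addrs only while the map considers that OSD up.
struct ToyOSDMap {
  std::map<int, std::string> up_addrs;  // osd id -> address string

  bool is_up(int osd) const { return up_addrs.count(osd) > 0; }

  // Same precondition as the assert in the backtrace: the caller
  // must make sure the OSD is up before asking for its instance.
  const std::string& get_inst(int osd) const {
    assert(is_up(osd));
    return up_addrs.at(osd);
  }
};

// Hypothetical report loop: collect addresses only for queued OSDs
// that the current map still marks up, skipping the rest.
std::vector<std::string> collect_failure_targets(
    const ToyOSDMap& osdmap, const std::set<int>& failure_queue) {
  std::vector<std::string> targets;
  for (int osd : failure_queue) {
    if (!osdmap.is_up(osd))
      continue;  // already marked down; get_inst() would assert here
    targets.push_back(osdmap.get_inst(osd));
  }
  return targets;
}
```

With a queue containing osd.0 and osd.4 and a map in which only osd.4 is up, the loop returns a single address and never calls get_inst() for osd.0. Without the is_up() check, the same input aborts in get_inst(), which matches the crash above.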


Files

ceph_w.log (434 KB) Wido den Hollander, 03/29/2012 01:40 AM