
Bug #2212

osd: FAILED assert(msgr->lock.is_locked())

Added by Wido den Hollander about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Regression: No
Severity: 3 - minor

Description

With the new heartbeat code, I noticed a couple of OSDs going down with:

Core was generated by `/usr/bin/ceph-osd -i 5 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f891158bf2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f891158bf2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000006fb75d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:95
#3  <signal handler called>
#4  0x00007f890fb093a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f890fb0cb0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f89103c7d7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f89103c5f26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f89103c5f53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f89103c604e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000690540 in ceph::__ceph_assert_fail (assertion=0x81ce39 "msgr->lock.is_locked()", file=0x81c912 "msg/SimpleMessenger.cc", line=1367, func=0x81ee60 "void SimpleMessenger::Pipe::unregister_pipe()")
    at common/assert.cc:75
#11 0x00000000006719b4 in SimpleMessenger::Pipe::unregister_pipe (this=0x15969780) at msg/SimpleMessenger.cc:1367
#12 0x000000000067ffe1 in SimpleMessenger::submit_message (this=0xf7fb00, m=0x15fef540, pipe=0x15969780) at msg/SimpleMessenger.cc:2481
#13 0x0000000000680366 in SimpleMessenger::send_message (this=0xf7fb00, m=0x15fef540, con=0x1582aa00) at msg/SimpleMessenger.cc:462
#14 0x00000000005ab8ed in OSD::heartbeat (this=0x100b000) at osd/OSD.cc:1669
#15 0x00000000005ac715 in OSD::heartbeat_entry (this=0x100b000) at osd/OSD.cc:1589
#16 0x00000000005e29fd in OSD::T_Heartbeat::entry (this=<optimized out>) at osd/OSD.h:286
#17 0x00007f8911583efc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f890fbb489d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()
(gdb)

The last log lines before it went down:

2012-03-26 14:07:44.685855 7f89040a2700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6822/1720 --> osd.38 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6822/4886 -- osd_map(7049..7049 src has 434..7049) v3 -- ?+0 0xd078400
2012-03-26 14:07:44.685979 7f89040a2700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6822/1720 --> osd.38 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6822/4886 -- pg_notify(0.704,1.703,2.702,2.b7,0.562,2.560,1.561,0.52e,2.52c,1.52d,0.874,1.873,2.872 epoch 7049 query_epoch 7049) v2 -- ?+0 0x157dbc40
2012-03-26 14:07:44.686071 7f89040a2700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6822/1720 --> osd.39 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6808/28915 -- pg_notify(0.1d5,1.1d4,2.1d3,0.778,1.777,2.776,0.119,1.118,2.117,0.6d8,1.6d7,2.6d6,0.5cf,1.5ce,2.5cd,1.596,0.597,2.595,1.908,0.909,2.907,0.8be,2.8bc,1.8bd epoch 7049 query_epoch 7049) v2 -- ?+0 0x6ec0c40
2012-03-26 14:07:44.725113 7f89030a0700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6823/1720 <== osd.4 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:0/1652 6 ==== osd_ping(ping e0 stamp 2012-03-26 14:07:44.723618) v2 ==== 47+0+0 (1046436517 0 0) 0x202a6000 con 0x20771640
msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::unregister_pipe()' thread 7f88fef97700 time 2012-03-26 14:07:44.729334
msg/SimpleMessenger.cc: 1367: FAILED assert(msgr->lock.is_locked())
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: (SimpleMessenger::Pipe::unregister_pipe()+0x294) [0x6719b4]
 2: (SimpleMessenger::submit_message(Message*, SimpleMessenger::Pipe*)+0x3d1) [0x67ffe1]
 3: (SimpleMessenger::send_message(Message*, Connection*)+0x1c6) [0x680366]
 4: (OSD::heartbeat()+0x16d) [0x5ab8ed]
 5: (OSD::heartbeat_entry()+0x45) [0x5ac715]
 6: (OSD::T_Heartbeat::entry()+0xd) [0x5e29fd]
 7: (()+0x7efc) [0x7f8911583efc]
 8: (clone()+0x6d) [0x7f890fbb489d]
*** Caught signal (Aborted) **
 in thread 7f88fef97700
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: /usr/bin/ceph-osd() [0x6fb6e6]
 2: (()+0x10060) [0x7f891158c060]
 3: (gsignal()+0x35) [0x7f890fb093a5]
 4: (abort()+0x17b) [0x7f890fb0cb0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f89103c7d7d]
 6: (()+0xb9f26) [0x7f89103c5f26]
 7: (()+0xb9f53) [0x7f89103c5f53]
 8: (()+0xba04e) [0x7f89103c604e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x200) [0x690540]
 10: (SimpleMessenger::Pipe::unregister_pipe()+0x294) [0x6719b4]
 11: (SimpleMessenger::submit_message(Message*, SimpleMessenger::Pipe*)+0x3d1) [0x67ffe1]
 12: (SimpleMessenger::send_message(Message*, Connection*)+0x1c6) [0x680366]
 13: (OSD::heartbeat()+0x16d) [0x5ab8ed]
 14: (OSD::heartbeat_entry()+0x45) [0x5ac715]
 15: (OSD::T_Heartbeat::entry()+0xd) [0x5e29fd]
 16: (()+0x7efc) [0x7f8911583efc]
 17: (clone()+0x6d) [0x7f890fbb489d]

The debug level wasn't high enough, though; I'll try increasing it a bit.

Associated revisions

Revision fe5f0331 (diff)
Added by Sage Weil almost 12 years ago

osd: send pings from hbin

Fixes: #2212
Signed-off-by: Sage Weil <>

History

#1 Updated by Sage Weil about 12 years ago

  • Status changed from New to Resolved

Ah, I was using the wrong msgr; fixing!
