Bug #2212
osd: FAILED assert(msgr->lock.is_locked())
Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
With the new heartbeat code, I noticed a couple of OSDs going down with:
Core was generated by `/usr/bin/ceph-osd -i 5 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007f891158bf2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f891158bf2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000006fb75d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:95
#3 <signal handler called>
#4 0x00007f890fb093a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f890fb0cb0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f89103c7d7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f89103c5f26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f89103c5f53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f89103c604e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000690540 in ceph::__ceph_assert_fail (assertion=0x81ce39 "msgr->lock.is_locked()", file=0x81c912 "msg/SimpleMessenger.cc", line=1367, func=0x81ee60 "void SimpleMessenger::Pipe::unregister_pipe()") at common/assert.cc:75
#11 0x00000000006719b4 in SimpleMessenger::Pipe::unregister_pipe (this=0x15969780) at msg/SimpleMessenger.cc:1367
#12 0x000000000067ffe1 in SimpleMessenger::submit_message (this=0xf7fb00, m=0x15fef540, pipe=0x15969780) at msg/SimpleMessenger.cc:2481
#13 0x0000000000680366 in SimpleMessenger::send_message (this=0xf7fb00, m=0x15fef540, con=0x1582aa00) at msg/SimpleMessenger.cc:462
#14 0x00000000005ab8ed in OSD::heartbeat (this=0x100b000) at osd/OSD.cc:1669
#15 0x00000000005ac715 in OSD::heartbeat_entry (this=0x100b000) at osd/OSD.cc:1589
#16 0x00000000005e29fd in OSD::T_Heartbeat::entry (this=<optimized out>) at osd/OSD.h:286
#17 0x00007f8911583efc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f890fbb489d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()
(gdb)
The last log lines before it went down:
2012-03-26 14:07:44.685855 7f89040a2700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6822/1720 --> osd.38 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6822/4886 -- osd_map(7049..7049 src has 434..7049) v3 -- ?+0 0xd078400
2012-03-26 14:07:44.685979 7f89040a2700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6822/1720 --> osd.38 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6822/4886 -- pg_notify(0.704,1.703,2.702,2.b7,0.562,2.560,1.561,0.52e,2.52c,1.52d,0.874,1.873,2.872 epoch 7049 query_epoch 7049) v2 -- ?+0 0x157dbc40
2012-03-26 14:07:44.686071 7f89040a2700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6822/1720 --> osd.39 [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6808/28915 -- pg_notify(0.1d5,1.1d4,2.1d3,0.778,1.777,2.776,0.119,1.118,2.117,0.6d8,1.6d7,2.6d6,0.5cf,1.5ce,2.5cd,1.596,0.597,2.595,1.908,0.909,2.907,0.8be,2.8bc,1.8bd epoch 7049 query_epoch 7049) v2 -- ?+0 0x6ec0c40
2012-03-26 14:07:44.725113 7f89030a0700 -- [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6823/1720 <== osd.4 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:0/1652 6 ==== osd_ping(ping e0 stamp 2012-03-26 14:07:44.723618) v2 ==== 47+0+0 (1046436517 0 0) 0x202a6000 con 0x20771640
msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::unregister_pipe()' thread 7f88fef97700 time 2012-03-26 14:07:44.729334
msg/SimpleMessenger.cc: 1367: FAILED assert(msgr->lock.is_locked())
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: (SimpleMessenger::Pipe::unregister_pipe()+0x294) [0x6719b4]
 2: (SimpleMessenger::submit_message(Message*, SimpleMessenger::Pipe*)+0x3d1) [0x67ffe1]
 3: (SimpleMessenger::send_message(Message*, Connection*)+0x1c6) [0x680366]
 4: (OSD::heartbeat()+0x16d) [0x5ab8ed]
 5: (OSD::heartbeat_entry()+0x45) [0x5ac715]
 6: (OSD::T_Heartbeat::entry()+0xd) [0x5e29fd]
 7: (()+0x7efc) [0x7f8911583efc]
 8: (clone()+0x6d) [0x7f890fbb489d]
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: (SimpleMessenger::Pipe::unregister_pipe()+0x294) [0x6719b4]
 2: (SimpleMessenger::submit_message(Message*, SimpleMessenger::Pipe*)+0x3d1) [0x67ffe1]
 3: (SimpleMessenger::send_message(Message*, Connection*)+0x1c6) [0x680366]
 4: (OSD::heartbeat()+0x16d) [0x5ab8ed]
 5: (OSD::heartbeat_entry()+0x45) [0x5ac715]
 6: (OSD::T_Heartbeat::entry()+0xd) [0x5e29fd]
 7: (()+0x7efc) [0x7f8911583efc]
 8: (clone()+0x6d) [0x7f890fbb489d]
*** Caught signal (Aborted) **
 in thread 7f88fef97700
 ceph version 0.44-59-g955b1cc (commit:955b1ccd0ddda378ed752a9f2731495b235209cf)
 1: /usr/bin/ceph-osd() [0x6fb6e6]
 2: (()+0x10060) [0x7f891158c060]
 3: (gsignal()+0x35) [0x7f890fb093a5]
 4: (abort()+0x17b) [0x7f890fb0cb0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f89103c7d7d]
 6: (()+0xb9f26) [0x7f89103c5f26]
 7: (()+0xb9f53) [0x7f89103c5f53]
 8: (()+0xba04e) [0x7f89103c604e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x200) [0x690540]
 10: (SimpleMessenger::Pipe::unregister_pipe()+0x294) [0x6719b4]
 11: (SimpleMessenger::submit_message(Message*, SimpleMessenger::Pipe*)+0x3d1) [0x67ffe1]
 12: (SimpleMessenger::send_message(Message*, Connection*)+0x1c6) [0x680366]
 13: (OSD::heartbeat()+0x16d) [0x5ab8ed]
 14: (OSD::heartbeat_entry()+0x45) [0x5ac715]
 15: (OSD::T_Heartbeat::entry()+0xd) [0x5e29fd]
 16: (()+0x7efc) [0x7f8911583efc]
 17: (clone()+0x6d) [0x7f890fbb489d]
The debug level wasn't high enough, though; I'll try increasing it a bit.
Associated revisions
osd: send pings from hbin
Fixes: #2212
Signed-off-by: Sage Weil <sage@newdream.net>
History
#1 Updated by Sage Weil about 12 years ago
- Status changed from New to Resolved
Ah, I was using the wrong msgr; fixing!
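A minimal sketch of one plausible reading of the failure, assuming (from the backtrace and the "wrong msgr" comment) that the heartbeat thread handed a pipe owned by one messenger to a different messenger's submit_message(). That call holds only its own lock, while Pipe::unregister_pipe() checks the lock of the pipe's owning messenger. The types and helpers below (TrackedMutex, Messenger, Pipe, try_submit) are hypothetical simplifications, not Ceph's SimpleMessenger code, and the assertion is modeled as a thrown std::logic_error so the failure can be observed instead of aborting the process.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical, simplified model; NOT Ceph's actual SimpleMessenger code.

// A mutex that records whether some thread currently holds it, so code can
// check a "caller must hold the lock" invariant.
class TrackedMutex {
 public:
  void lock() {
    m_.lock();
    locked_ = true;
  }
  void unlock() {
    assert(locked_);  // unlocking a mutex nobody holds would be a bug
    locked_ = false;
    m_.unlock();
  }
  bool is_locked() const { return locked_; }

 private:
  std::mutex m_;
  std::atomic<bool> locked_{false};
};

struct Messenger;

struct Pipe {
  Messenger* msgr;  // the messenger that owns (registered) this pipe
  explicit Pipe(Messenger* owner) : msgr(owner) {}
  void unregister_pipe();
};

struct Messenger {
  TrackedMutex lock;
  std::set<Pipe*> pipes;  // pipes registered with this messenger

  // Takes *this* messenger's lock, then tears the pipe down. Simplification:
  // teardown always happens here, standing in for the closed-pipe path.
  void submit_message(Pipe* pipe) {
    std::lock_guard<TrackedMutex> guard(lock);
    pipe->unregister_pipe();
  }
};

void Pipe::unregister_pipe() {
  // Invariant: the *owning* messenger's lock must be held. Modeled as an
  // exception instead of an abort so the failure is observable.
  if (!msgr->lock.is_locked())
    throw std::logic_error("FAILED assert(msgr->lock.is_locked())");
  msgr->pipes.erase(this);
}

// Returns "ok" on success, or the failed-invariant message.
std::string try_submit(Messenger& via, Pipe* pipe) {
  try {
    via.submit_message(pipe);
    return "ok";
  } catch (const std::logic_error& e) {
    return e.what();
  }
}
```

With two messengers, cluster and hb, and a Pipe owned by hb, try_submit(hb, &pipe) returns "ok", while try_submit(cluster, &pipe) returns the assertion text, because only cluster.lock is held when the pipe checks its owner's lock. Sending through the messenger that owns the connection, which is what the "send pings from hbin" revision appears to do, keeps the invariant intact.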