Bug #1187
OSD: OSDMap::decode (closed)
Description
While working on #1186 I noticed one OSD crash. The backtrace gave me:
#0 0x00007fb865dc87bb in raise () from /lib/libpthread.so.0
#1 0x000000000063bd53 in reraise_fatal (signum=3484) at common/signal.cc:61
#2 0x000000000063ce6b in handle_fatal_signal (signum=6) at common/signal.cc:108
#3 <signal handler called>
#4 0x00007fb864998a75 in raise () from /lib/libc.so.6
#5 0x00007fb86499c5c0 in abort () from /lib/libc.so.6
#6 0x00007fb86524e8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7 0x00007fb86524cd16 in ?? () from /usr/lib/libstdc++.so.6
#8 0x00007fb86524cd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#9 0x00007fb86524ce3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#10 0x0000000000491686 in ceph::buffer::list::iterator::advance (this=0x7fb85a735230, len=2, dest=0x7fb85a7352ec "\001") at ./include/buffer.h:334
#11 ceph::buffer::list::iterator::copy (this=0x7fb85a735230, len=2, dest=0x7fb85a7352ec "\001") at ./include/buffer.h:388
#12 0x000000000054bd62 in OSDMap::decode(ceph::buffer::list&) ()
#13 0x0000000000513494 in OSD::get_map (this=0x2872000, epoch=25743) at osd/OSD.cc:3588
#14 0x000000000057447a in PG::generate_past_intervals (this=0x18df4000) at osd/PG.cc:930
#15 0x000000000058cf65 in GetInfo (this=0x32b2e600, ctx=<value optimized out>) at osd/PG.cc:4368
#16 0x00000000005a38ed in boost::statechart::state<PG::RecoveryState::Peering, PG::RecoveryState::Primary, PG::RecoveryState::GetInfo, (boost::statechart::history_mode)0>::deep_construct(boost::intrusive_ptr<PG::RecoveryState::Primary> const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&) ()
#17 0x00000000005a3c3a in boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, PG::RecoveryState::Primary, boost::mpl::l_end>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator> >::construct(boost::intrusive_ptr<PG::RecoveryState::Started> const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&) ()
#18 0x00000000005a3d6f in boost::statechart::simple_state<PG::RecoveryState::Start, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) ()
#19 0x000000000059b26f in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&) ()
#20 0x000000000055f10f in PG::RecoveryState::handle_activate_map (this=0x18df4660, rctx=0x7fb85a7360b0) at osd/PG.cc:4749
#21 0x000000000050d678 in PG::handle_activate_map (this=0x2872000, t=<value optimized out>, tfin=<value optimized out>) at ./osd/PG.h:1599
#22 OSD::activate_map (this=0x2872000, t=<value optimized out>, tfin=<value optimized out>) at osd/OSD.cc:3463
#23 0x0000000000519634 in OSD::handle_osd_map (this=0x2872000, m=0x1cb08800) at osd/OSD.cc:3201
#24 0x0000000000528e38 in OSD::_dispatch (this=0x2872000, m=0x1cb08800) at osd/OSD.cc:2693
#25 0x0000000000529af7 in OSD::ms_dispatch (this=0x2872000, m=0x1cb08800) at osd/OSD.cc:2567
#26 0x0000000000615783 in Messenger::ms_deliver_dispatch (this=0x286d000) at msg/Messenger.h:100
#27 SimpleMessenger::dispatch_entry (this=0x286d000) at msg/SimpleMessenger.cc:353
#28 0x0000000000490aec in SimpleMessenger::DispatchThread::entry (this=0x286d490) at msg/SimpleMessenger.h:546
#29 0x00007fb865dbf9ca in start_thread () from /lib/libpthread.so.0
#30 0x00007fb864a4b70d in clone () from /lib/libc.so.6
#31 0x0000000000000000 in ?? ()
I collected the full log and stored it at ''noisy.ceph.widodh.nl:/var/log/remote''; the logfile is named ''osd.17.crash.log''. The core dump can be found at ''/var/log/remote/core-dump'' and is called ''core.atom4.3484''.
The core is also still present on atom4.
The version I'm running is d2b7e291f21928f9f0a3e23fb32c94c9cbbc8984
Updated by Sage Weil almost 13 years ago
- Status changed from New to Can't reproduce
It looks like the cluster has been rebuilt since then? Epoch 25743 (the one it couldn't get) is >> the current 912. Or there is a problem with generate_past_intervals... Unfortunately the core file no longer matches the installed binary either. Sorry I took so long to get to this; the trail has gone cold!
Updated by Wido den Hollander almost 13 years ago
Oh yes, I had to rebuild since I screwed up my single monitor...
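For anyone reading the backtrace later: frames #8 to #13 show an exception thrown by the buffer iterator during a 2-byte copy in OSDMap::decode, which nothing caught before std::terminate() aborted the process. The sketch below is only an illustration of that failure mode, not Ceph code. `Cursor`, `EndOfBuffer`, `decode_version` and `try_decode_version` are hypothetical names, and the idea that the map buffer for the requested epoch was empty or too short is an assumption that the ticket does not confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the exception a bounds-checked buffer iterator
// throws when asked to read past the end of its data.
struct EndOfBuffer : std::runtime_error {
    EndOfBuffer() : std::runtime_error("end of buffer") {}
};

// Hypothetical cursor over a byte vector; the caller must keep the vector
// alive for as long as the cursor is used.
class Cursor {
public:
    explicit Cursor(const std::vector<std::uint8_t>& data) : data_(data), pos_(0) {}

    // Copy len bytes into dest, throwing if fewer than len bytes remain.
    void copy(std::size_t len, void* dest) {
        if (len > data_.size() - pos_) {
            throw EndOfBuffer();
        }
        std::memcpy(dest, data_.data() + pos_, len);
        pos_ += len;
    }

private:
    const std::vector<std::uint8_t>& data_;
    std::size_t pos_;
};

// Unguarded decode: reads a 2-byte field (compare len=2 in frames #10/#11).
// On a short buffer the exception propagates to the caller.
std::uint16_t decode_version(const std::vector<std::uint8_t>& encoded) {
    Cursor c(encoded);
    std::uint16_t version = 0;
    c.copy(sizeof(version), &version);
    return version;
}

// Guarded decode: reports failure to the caller instead of letting the
// exception travel up to std::terminate().
bool try_decode_version(const std::vector<std::uint8_t>& encoded, std::uint16_t& out) {
    try {
        out = decode_version(encoded);
        return true;
    } catch (const EndOfBuffer&) {
        return false;
    }
}
```

Calling `decode_version` on an empty vector from a thread with no surrounding try/catch would end in std::terminate(), much like the frames above. `try_decode_version` returns false instead, which would let a caller log the missing epoch and carry on.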