Bug #1187

OSD: OSDMap::decode

Added by Wido den Hollander about 8 years ago. Updated almost 8 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version:
Start date: 06/14/2011
Due date:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

While working on #1186 I noticed one OSD crash. The backtrace was:

#0  0x00007fb865dc87bb in raise () from /lib/libpthread.so.0
#1  0x000000000063bd53 in reraise_fatal (signum=3484) at common/signal.cc:61
#2  0x000000000063ce6b in handle_fatal_signal (signum=6) at common/signal.cc:108
#3  <signal handler called>
#4  0x00007fb864998a75 in raise () from /lib/libc.so.6
#5  0x00007fb86499c5c0 in abort () from /lib/libc.so.6
#6  0x00007fb86524e8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7  0x00007fb86524cd16 in ?? () from /usr/lib/libstdc++.so.6
#8  0x00007fb86524cd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#9  0x00007fb86524ce3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#10 0x0000000000491686 in ceph::buffer::list::iterator::advance (this=0x7fb85a735230, len=2, dest=0x7fb85a7352ec "\001") at ./include/buffer.h:334
#11 ceph::buffer::list::iterator::copy (this=0x7fb85a735230, len=2, dest=0x7fb85a7352ec "\001") at ./include/buffer.h:388
#12 0x000000000054bd62 in OSDMap::decode(ceph::buffer::list&) ()
#13 0x0000000000513494 in OSD::get_map (this=0x2872000, epoch=25743) at osd/OSD.cc:3588
#14 0x000000000057447a in PG::generate_past_intervals (this=0x18df4000) at osd/PG.cc:930
#15 0x000000000058cf65 in GetInfo (this=0x32b2e600, ctx=<value optimized out>) at osd/PG.cc:4368
#16 0x00000000005a38ed in boost::statechart::state<PG::RecoveryState::Peering, PG::RecoveryState::Primary, PG::RecoveryState::GetInfo, (boost::statechart::history_mode)0>::deep_construct(boost::intrusive_ptr<PG::RecoveryState::Primary> const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&) ()
#17 0x00000000005a3c3a in boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, PG::RecoveryState::Primary, boost::mpl::l_end>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator> >::construct(boost::intrusive_ptr<PG::RecoveryState::Started> const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&) ()
#18 0x00000000005a3d6f in boost::statechart::simple_state<PG::RecoveryState::Start, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) ()
#19 0x000000000059b26f in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&) ()
#20 0x000000000055f10f in PG::RecoveryState::handle_activate_map (this=0x18df4660, rctx=0x7fb85a7360b0) at osd/PG.cc:4749
#21 0x000000000050d678 in PG::handle_activate_map (this=0x2872000, t=<value optimized out>, tfin=<value optimized out>) at ./osd/PG.h:1599
#22 OSD::activate_map (this=0x2872000, t=<value optimized out>, tfin=<value optimized out>) at osd/OSD.cc:3463
#23 0x0000000000519634 in OSD::handle_osd_map (this=0x2872000, m=0x1cb08800) at osd/OSD.cc:3201
#24 0x0000000000528e38 in OSD::_dispatch (this=0x2872000, m=0x1cb08800) at osd/OSD.cc:2693
#25 0x0000000000529af7 in OSD::ms_dispatch (this=0x2872000, m=0x1cb08800) at osd/OSD.cc:2567
#26 0x0000000000615783 in Messenger::ms_deliver_dispatch (this=0x286d000) at msg/Messenger.h:100
#27 SimpleMessenger::dispatch_entry (this=0x286d000) at msg/SimpleMessenger.cc:353
#28 0x0000000000490aec in SimpleMessenger::DispatchThread::entry (this=0x286d490) at msg/SimpleMessenger.h:546
#29 0x00007fb865dbf9ca in start_thread () from /lib/libpthread.so.0
#30 0x00007fb864a4b70d in clone () from /lib/libc.so.6
#31 0x0000000000000000 in ?? ()
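
Reading the trace from the bottom up: frames #10 to #12 show a 2-byte copy inside OSDMap::decode throwing from the buffer iterator, and frames #5 to #9 show that exception reaching std::terminate() and abort() because nothing caught it. The sketch below illustrates that failure mode in isolation. It is not Ceph's code: BufferCursor, end_of_buffer, decode_version and try_decode are hypothetical stand-ins, and it assumes the throw was caused by the buffer holding fewer bytes than the decoder asked for (for example an empty or truncated map).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical exception type, standing in for whatever the real
// buffer iterator throws when asked to read past the end.
struct end_of_buffer : std::runtime_error {
    end_of_buffer() : std::runtime_error("end of buffer") {}
};

// Hypothetical read cursor over a byte buffer (not Ceph's class).
class BufferCursor {
public:
    explicit BufferCursor(const std::vector<std::uint8_t>& data)
        : data_(data) {}

    // Copy len bytes into dest and advance the cursor.
    // Throws end_of_buffer if fewer than len bytes remain.
    void copy(std::size_t len, void* dest) {
        if (len > data_.size() - pos_) {
            throw end_of_buffer();
        }
        std::memcpy(dest, data_.data() + pos_, len);
        pos_ += len;
    }

private:
    const std::vector<std::uint8_t>& data_;
    std::size_t pos_ = 0;
};

// Hypothetical decoder: reads a 2-byte field from the start of the
// buffer. With an empty or 1-byte buffer the copy throws, and the
// exception propagates to the caller.
std::uint16_t decode_version(const std::vector<std::uint8_t>& data) {
    BufferCursor cur(data);
    std::uint16_t v = 0;
    cur.copy(sizeof(v), &v);
    return v;
}

// A caller that catches the exception turns a short buffer into an
// ordinary failure. If no caller catches it, the C++ runtime calls
// std::terminate(), which aborts the process.
bool try_decode(const std::vector<std::uint8_t>& data, std::uint16_t& out) {
    try {
        out = decode_version(data);
        return true;
    } catch (const end_of_buffer&) {
        return false;
    }
}
```

Calling try_decode with an empty vector returns false, while calling decode_version directly on the same input from code with no try/catch around it would end the process in the same terminate/abort sequence as in the trace.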

I collected the full log and stored it at ''noisy.ceph.widodh.nl:/var/log/remote''; the logfile is named ''osd.17.crash.log''. The core dump can be found at ''/var/log/remote/core-dump'' and is called ''core.atom4.3484''.

The core is also still present on atom4.

The version I'm running is d2b7e291f21928f9f0a3e23fb32c94c9cbbc8984

History

#1 Updated by Sage Weil almost 8 years ago

  • Target version set to v0.31

#2 Updated by Sage Weil almost 8 years ago

  • Status changed from New to Can't reproduce

It looks like the cluster has been rebuilt since then? Epoch 25743 (the one it couldn't get) is far beyond the current epoch, 912. Or there is a problem with generate_past_intervals... Unfortunately the core file no longer matches the installed binary either. Sorry I took so long to get to this; the trail has gone cold!

#3 Updated by Wido den Hollander almost 8 years ago

Oh yes, I had to rebuild since I screwed up my single monitor...
