Bug #4572

closed

osd crash with: 0 == "we got a bad state machine event"

Added by Wido den Hollander about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This seems like #4042 but the backtrace seems different.

After resolving #4556 I tried to recover the cluster, but in the end 18 out of the 40 OSDs survived and were running.

The other 22 seem to have crashed with nearly identical backtraces.

For example osd.2:

(gdb) bt
#0  0x00007f2fa5229b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000078910e in reraise_fatal (signum=6) at global/signal_handler.cc:58
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x00007f2fa3be8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f2fa3bebb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2fa453a69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f2fa4538846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f2fa4538873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f2fa453896e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000008343af in ceph::__ceph_assert_fail (assertion=0x9123b0 "0 == \"we got a bad state machine event\"", file=<optimized out>, line=5250, 
    func=0x916ca0 "PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)") at common/assert.cc:77
#11 0x000000000068866b in PG::RecoveryState::Crashed::Crashed (this=0x37b59b0, ctx=...) at osd/PG.cc:5250
#12 0x00000000006b4496 in shallow_construct (outermostContextBase=..., pContext=<optimized out>) at /usr/include/boost/statechart/state.hpp:89
#13 deep_construct (outermostContextBase=..., pContext=<optimized out>) at /usr/include/boost/statechart/state.hpp:79
#14 boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, PG::RecoveryState::Crashed, boost::mpl::l_end>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator> >::construct (outermostContextBase=..., pContext=<optimized out>)
    at /usr/include/boost/statechart/detail/constructor.hpp:93
#15 0x00000000006d2643 in transit_impl<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::statechart::detail::no_transition_function> (this=0x37b59b0, transitionAction=...)
    at /usr/include/boost/statechart/simple_state.hpp:798
#16 transit<PG::RecoveryState::Crashed> (this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:314
#17 react_without_action (stt=...) at /usr/include/boost/statechart/transition.hpp:38
#18 react (stt=...) at /usr/include/boost/statechart/detail/reaction_dispatcher.hpp:47
#19 react (stt=..., evt=...) at /usr/include/boost/statechart/detail/reaction_dispatcher.hpp:68
#20 react (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/detail/reaction_dispatcher.hpp:109
#21 react<PG::RecoveryState::Reset, boost::statechart::event_base, void const*> (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/transition.hpp:59
#22 local_react_impl<boost::mpl::list1<boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:816
#23 local_react<boost::mpl::list1<boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (this=0x37b59b0, evt=..., eventType=<optimized out>)
    at /usr/include/boost/statechart/simple_state.hpp:851
#24 boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list2<boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=<optimized out>) at /usr/include/boost/statechart/simple_state.hpp:820
#25 0x00000000006d2794 in local_react<boost::mpl::list2<boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (
    eventType=0xc2b5c0, evt=..., this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#26 local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (eventType=0xc2b5c0, evt=..., 
    stt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#27 local_react<boost::mpl::list3<boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#28 local_react_impl<boost::mpl::list4<boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., eventType=0xc2b5c0, evt=...) at /usr/include/boost/statechart/simple_state.hpp:820
#29 local_react<boost::mpl::list4<boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#30 boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., 
    eventType=0xc2b5c0) at /usr/include/boost/statechart/simple_state.hpp:820
#31 0x00000000006d28ce in local_react<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., this=0x37b59b0)
    at /usr/include/boost/statechart/simple_state.hpp:851
#32 local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., eventType=0xc2b5c0, evt=...)
    at /usr/include/boost/statechart/simple_state.hpp:820
#33 local_react<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed> > > (eventType=0xc2b5c0, evt=..., 
    this=0x37b59b0) at /usr/include/boost/statechart/simple_state.hpp:851
#34 boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (this=0x37b59b0, evt=..., eventType=0xc2b5c0)
    at /usr/include/boost/statechart/simple_state.hpp:489
#35 0x00000000006bb03b in operator() (this=<synthetic pointer>) at /usr/include/boost/statechart/state_machine.hpp:87
#36 operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, const void*>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial>::exception_event_handler> (action=..., this=<optimized out>)
    at /usr/include/boost/statechart/null_exception_translator.hpp:33
#37 boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (this=0x37c6768, 
    evt=...) at /usr/include/boost/statechart/state_machine.hpp:885
#38 0x00000000006bb311 in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event (this=0x37c6768, evt=...) at /usr/include/boost/statechart/state_machine.hpp:275

#39 0x000000000067b3f7 in handle_event (rctx=0x7f2f93c008e0, evt=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x37c6768) at osd/PG.h:1717
#40 PG::handle_peering_event (this=0x37c5400, evt=..., rctx=0x7f2f93c008e0) at osd/PG.cc:5114
#41 0x0000000000625118 in OSD::process_peering_events (this=0x2ffc000, pgs=..., handle=...) at osd/OSD.cc:6230
#42 0x000000000065b5d0 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:748
#43 0x00000000008297e6 in ThreadPool::worker (this=0x2ffc458, wt=0x9e67c20) at common/WorkQueue.cc:119
#44 0x000000000082b610 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:316
#45 0x00007f2fa5221e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#46 0x00007f2fa3ca5cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#47 0x0000000000000000 in ?? ()
(gdb)
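
For readers unfamiliar with boost::statechart, this is how I read frames #11 and #21-#33: the Reset state lists explicit reactions for QueryState, AdvMap, ActMap, NullEvt and FlushedEvt, followed by a catch-all transition on event_base into Crashed, whose constructor asserts. The backtrace does not show which event actually arrived (evt=...). Below is a minimal sketch of that pattern in plain C++. It is not Ceph code and does not use Boost. The names Event, Other, BadStateMachineEvent, react and crashes_in_reset are hypothetical, and an exception stands in for the assert so the behaviour can be checked.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical event tags, named after the event types visible in the
// backtrace. "Other" stands for any event Reset has no explicit reaction for.
enum class Event { QueryState, AdvMap, ActMap, NullEvt, FlushedEvt, Other };

// Stand-in for the failed assert; the real code aborts the process instead.
struct BadStateMachineEvent : std::logic_error {
    BadStateMachineEvent()
        : std::logic_error("we got a bad state machine event") {}
};

// Stand-in for the Crashed state: merely entering it is a fatal error.
struct Crashed {
    Crashed() { throw BadStateMachineEvent(); }
};

// Stand-in for the Reset state and its reaction list.
struct Reset {
    // Returns true when an explicit reaction handled the event.
    bool react(Event e) {
        switch (e) {
            case Event::QueryState:
            case Event::AdvMap:
            case Event::ActMap:
            case Event::NullEvt:
            case Event::FlushedEvt:
                return true;  // explicit reaction exists
            default: {
                // Catch-all: transition into Crashed, which never returns.
                Crashed crashed;
                (void)crashed;
                return false;
            }
        }
    }
};

// Helper for experimenting: does this event crash a machine sitting in Reset?
bool crashes_in_reset(Event e) {
    Reset reset;
    try {
        reset.react(e);
        return false;
    } catch (const BadStateMachineEvent&) {
        return true;
    }
}
```

In this model, crashes_in_reset(Event::AdvMap) is false and crashes_in_reset(Event::Other) is true. If the reading above is right, the question for the OSD logs is which peering event reached the PG while it was in Reset.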

I'm running 0.56.4, and the fix for #4202 appears to have gone into the bobtail branch, so that shouldn't be the cause here.

I'll upload the logs of osd.2 and osd.7 to the cephdrop sftp account since they are both quite big (~150MB each).

I tried to start the OSDs and some of them survived. By starting them host by host I'm now at 21/40 and I'll continue to start OSDs.
