Bug #1058

closed

pg_refactor: OSD crash when marking several out

Added by Greg Farnum almost 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version: -
% Done: 0%

Description

To reproduce:
1) Start up with 4 OSDs (using vstart)
2) Mark two of the OSDs out
3) Wait a bit
4) Notice that the OSDs that should have remained up have crashed with this backtrace:

*** Caught signal (Aborted) **
 in thread 0x7f813b477710
 ceph version 0.27-260-g228e857 (commit:228e857eea18ee13bfc3024aaef5f79d0598d4bc)
 1: ./cosd() [0x64b132]
 2: (()+0xef60) [0x7f8145aebf60]
 3: (gsignal()+0x35) [0x7f81446d4165]
 4: (abort()+0x180) [0x7f81446d6f70]
 5: (__assert_fail()+0xf1) [0x7f81446cd2b1]
 6: (PG::RecoveryState::Active::Active(boost::statechart::state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x3bd) [0x58d0ed]
 7: (boost::statechart::detail::safe_reaction_result boost::statechart::simple_state<PG::RecoveryState::Peering, PG::RecoveryState::Primary, PG::RecoveryState::GetInfo, (boost::statechart::history_mode)0>::transit_impl<PG::RecoveryState::Active, PG::RecoveryState::RecoveryMachine, boost::statechart::detail::no_transition_function>(boost::statechart::detail::no_transition_function const&)+0xc7) [0x5a4517]
 8: (boost::statechart::simple_state<PG::RecoveryState::Peering, PG::RecoveryState::Primary, PG::RecoveryState::GetInfo, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xca) [0x5a48ca]
 9: (boost::statechart::simple_state<PG::RecoveryState::GetMissing, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x62) [0x5a3b12]
 10: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x16b) [0x5a268b]
 11: (PG::RecoveryState::handle_create(PG::RecoveryCtx*)+0x3a) [0x58165a]
 12: (OSD::get_or_create_pg(PG::Info const&, unsigned int, int, int&, bool, ObjectStore::Transaction**, C_Contexts**)+0x4d2) [0x527382]
 13: (OSD::handle_pg_notify(MOSDPGNotify*)+0x18c) [0x52816c]
 14: (OSD::_dispatch(Message*)+0x44f) [0x53038f]
 15: (OSD::ms_dispatch(Message*)+0xc1) [0x530e61]
 16: (SimpleMessenger::dispatch_entry()+0x7da) [0x493b3a]
 17: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x488a1c]
 18: (()+0x68ba) [0x7f8145ae38ba]
 19: (clone()+0x6d) [0x7f814477102d]
Actions #1

Updated by Greg Farnum almost 13 years ago

The problem is that OSD::get_or_create_pg calls pg->handle_create with an rctx that is empty apart from a query_map, which is itself empty. The state machine then transitions all the way to PG::RecoveryState::Active::Active, whose constructor expects the rctx to be fully populated and hits an assert when it is not (frames 5 and 6 of the backtrace).
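
To illustrate the failure mode described above, here is a minimal sketch, not the actual Ceph code. RecoveryCtx here is a simplified, hypothetical stand-in: only query_map is named in this report, and the other fields, the helper names, and the value types are assumptions made for the example. Where the real constructor asserts, the sketch returns false so it can be run without aborting.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the rctx discussed in this report.
// Only query_map is named in the report; the other fields are assumptions.
struct RecoveryCtx {
  std::map<int, std::string>* query_map = nullptr;
  std::map<int, std::string>* info_map = nullptr;     // assumed field
  std::vector<std::string>* pending_work = nullptr;   // assumed field
};

// True only if every slot the final state wants to write to is present.
bool fully_populated(const RecoveryCtx& ctx) {
  return ctx.query_map != nullptr && ctx.info_map != nullptr &&
         ctx.pending_work != nullptr;
}

// Mimics the creation path: a context carrying nothing but a query_map.
RecoveryCtx make_create_ctx(std::map<int, std::string>& queries) {
  RecoveryCtx ctx;
  ctx.query_map = &queries;
  return ctx;
}

// Stand-in for entering the Active state. Per the backtrace, the real
// constructor asserts; this sketch reports failure instead of aborting.
bool try_enter_active(const RecoveryCtx& ctx) {
  if (!fully_populated(ctx)) {
    return false;  // corresponds to the failed assertion in the real code
  }
  ctx.info_map->emplace(0, "activated");
  ctx.pending_work->push_back("start recovery");
  return true;
}
```

Calling try_enter_active on the result of make_create_ctx fails, because the state machine reached a state that needs more context than the caller supplied. Under these assumptions, a fix would either hand the creation path a fully populated context or keep the state from using slots that are absent.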

I pushed a patch in 4d10126b20522f596d6fe30c0908adab2abebf31 which I think works, but I can't be certain: with or without my patch, I can no longer get past #1062.

Actions #2

Updated by Greg Farnum almost 13 years ago

  • Status changed from In Progress to Resolved

Haven't seen a recurrence of this.
