Bug #1958

closed

osd: crash during peering due to receiving an info msg in WaitActingChange

Added by Josh Durgin over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: High
Assignee: Sage Weil
Category: OSD
Target version: -
% Done: 0%

Description

This happened during a teuthology run with thrashing and reads/writes/deletes.
Logs are in vit:~joshd/bug_1958

2012-01-20 15:18:26.757104 11c4f700 osd.1 91 pg[0.f( v 21'319 (21'118,21'319] n=4 ec=1 les/c 65/65 81/85/3) [1,2]/[1,2,0] r=0 lpr=85 bft=2 mlcod 0'0 active] log audit: log(21'118,21'319]  handle_info 0.f( v 21'319 (21'119,21'319] n=4 ec=1 les/c 85/65 81/85/3) from osd.0
2012-01-20 15:18:26.759722 11c4f700 osd.1 91 pg[0.f( v 21'319 (21'118,21'319] n=4 ec=1 les/c 65/65 81/85/3) [1,2]/[1,2,0] r=0 lpr=85 bft=2 mlcod 0'0 active] log audit: log(21'118,21'319]  exit Started/Primary/WaitActingChange 0.008219 1 0.011454
2012-01-20 15:18:26.760536 11c4f700 osd.1 91 pg[0.f( v 21'319 (21'118,21'319] n=4 ec=1 les/c 65/65 81/85/3) [1,2]/[1,2,0] r=0 lpr=85 bft=2 mlcod 0'0 active] log audit: log(21'118,21'319]  exit Started/Primary 0.009844 0 0.000000
2012-01-20 15:18:26.761300 11c4f700 osd.1 91 pg[0.f( v 21'319 (21'118,21'319] n=4 ec=1 les/c 65/65 81/85/3) [1,2]/[1,2,0] r=0 lpr=85 bft=2 mlcod 0'0 active] log audit: log(21'118,21'319]  exit Started 35.573870 0 0.000000
2012-01-20 15:18:26.764141 11c4f700 osd.1 91 pg[0.f( v 21'319 (21'118,21'319] n=4 ec=1 les/c 65/65 81/85/3) [1,2]/[1,2,0] r=0 lpr=85 bft=2 mlcod 0'0 active] log audit: log(21'118,21'319]  enter Crashed
osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0u>::my_context)', in thread '11c4f700'
osd/PG.cc: 3744: FAILED assert(0 == "we got a bad state machine event")
 ceph version 0.40-185-g75004db (commit:75004dbe4063baf8211b41e2da45d8bb7861e1f6)
 1: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0xfd) [0x662b1d]
 2: (boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, PG::RecoveryState::Crashed, boost::mpl::l_end>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator> >::construct(boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>* const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&)+0x26) [0x6a2236]
 3: (boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xd4) [0x6a2eb4]
 4: (boost::statechart::simple_state<PG::RecoveryState::Primary, PG::RecoveryState::Started, PG::RecoveryState::Peering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xa1) [0x6a9631]
 5: (boost::statechart::simple_state<PG::RecoveryState::WaitActingChange, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x66) [0x6a24d6]
 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x6a3e3b]
 7: (PG::RecoveryState::handle_info(int, PG::Info&, PG::RecoveryCtx*)+0x157) [0x676c27]
 8: (OSD::handle_pg_info(MOSDPGInfo*)+0x468) [0x55b308]
 9: (OSD::_dispatch(Message*)+0x5fd) [0x56b1fd]
 10: (OSD::ms_dispatch(Message*)+0x19f) [0x56c03f]
 11: (SimpleMessenger::dispatch_entry()+0x883) [0x5b7863]
 12: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4a31dc]
 13: (()+0x7971) [0x4e35971]
 14: (clone()+0x6d) [0x659d92d]
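Reading the backtrace from frame 7 upward: `handle_info()` hands the info event to the state machine, the innermost active state (WaitActingChange) has no reaction for it, so it is forwarded outward to Primary and then to Started, and Started's reaction constructs Crashed, whose constructor fires the assert. The C++ sketch below models only that outward dispatch, without Boost.Statechart. The names `InfoEvent`, `State`, `dispatch` and `deliver_info_in_wait_acting_change` are hypothetical; the per-state reaction flags are assumptions read off the backtrace (in particular, that Started has a catch-all reaction leading to Crashed); and the "with reaction" variant is just one plausible shape of a fix, not necessarily what the fix commit actually changed.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the event built by RecoveryState::handle_info();
// the real Ceph event type and its fields are not reproduced here.
struct InfoEvent {
  int from;  // id of the OSD that sent the info message
};

// One node of a hierarchical state machine. A state with no reaction for an
// event forwards it to its parent, mirroring the backtrace's walk from
// WaitActingChange to Primary to Started.
struct State {
  std::string name;
  const State* parent;   // enclosing state, nullptr for the outermost one
  bool reacts_to_info;   // assumption: does this state react to InfoEvent?
  bool catch_all_crash;  // assumption: unhandled events here go to Crashed
};

// Walks outward from the innermost active state and returns the name of the
// state that consumed the event, or "Crashed" if a catch-all fired first.
std::string dispatch(const State& innermost, const InfoEvent& ev) {
  for (const State* s = &innermost; s != nullptr; s = s->parent) {
    if (s->reacts_to_info) {
      std::cout << s->name << " handled info from osd." << ev.from << "\n";
      return s->name;
    }
    if (s->catch_all_crash) {
      // In the real OSD, constructing Crashed hits
      // assert(0 == "we got a bad state machine event").
      std::cout << "unhandled event reached " << s->name
                << "; entering Crashed\n";
      return "Crashed";
    }
  }
  throw std::logic_error("event was not handled by any state");
}

// Builds Started -> Primary -> WaitActingChange and delivers one info event
// from osd.0 (the sender shown in the log). wait_has_reaction models a
// hypothetical fix that gives WaitActingChange its own reaction to the event.
std::string deliver_info_in_wait_acting_change(bool wait_has_reaction) {
  const State started{"Started", nullptr, false, true};
  const State primary{"Primary", &started, false, false};
  const State wait{"WaitActingChange", &primary, wait_has_reaction, false};
  return dispatch(wait, InfoEvent{0});
}
```

For example, `deliver_info_in_wait_acting_change(false)` returns `"Crashed"`, the path seen in the log above, while `deliver_info_in_wait_acting_change(true)` returns `"WaitActingChange"` because the event is consumed before it can bubble up to Started's catch-all.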
#1

Updated by Sage Weil over 12 years ago

  • Priority changed from Normal to High
#2

Updated by Sage Weil over 12 years ago

  • Status changed from New to 4
  • Assignee set to Sage Weil

fix pushed to commit:2f6205e57c7b8a21da72f0af8f1edd38a5989149

#3

Updated by Sage Weil over 12 years ago

  • Status changed from 4 to Resolved