Bug #2462
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
Description
2012-05-23 06:16:37.080317 7f18f6012700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f18f6012700 time 2012-05-23 06:16:36.911866
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
ceph version 0.47.1 (f5a9404445e2ed5ec2ee828aa53d73d4a002f7a5)
1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1987) [0x614437]
2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x14d) [0x61d9ad]
3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x11e) [0x64f91e]
4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x64450b]
5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x644631]
6: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x19e) [0x61cdce]
7: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x5e7) [0x5dafa7]
8: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x220) [0x5ddda0]
9: (OSD::_dispatch(Message*)+0x191) [0x5e40b1]
10: (OSD::ms_dispatch(Message*)+0x153) [0x5e4a03]
11: (SimpleMessenger::dispatch_entry()+0x85b) [0x783c4b]
12: (SimpleMessenger::DispatchThread::entry()+0xd) [0x74b52d]
13: (()+0x8ec6) [0x7f190367fec6]
14: (clone()+0x6d) [0x7f190232c51d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
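For readers unfamiliar with the assert, the condition can be read as "the local log and the incoming log must cover overlapping version ranges". The following is a minimal sketch of that check only, not the actual Ceph code: `Version`, `LogBounds`, and `logs_overlap` are hypothetical names introduced for illustration, and the (epoch, version) ordering is an assumption about how the log bounds compare.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for the version type used for log bounds.
// Assumption: versions order by epoch first, then by version number.
struct Version {
    uint32_t epoch;
    uint64_t version;
};

inline bool operator<(const Version& a, const Version& b) {
    return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

inline bool operator>=(const Version& a, const Version& b) {
    return !(a < b);
}

// Hypothetical stand-in for pg_log_t, keeping only the two bounds
// that the failed assert reads.
struct LogBounds {
    Version tail;  // oldest version covered by the log
    Version head;  // newest version covered by the log
};

// Same condition as the assert at osd/PG.cc:402: each log must reach
// at least as far forward as the other log's tail. If either half
// fails, there is a gap between the two logs.
inline bool logs_overlap(const LogBounds& log, const LogBounds& olog) {
    return log.head >= olog.tail && olog.head >= log.tail;
}
```

With this reading, the crash means the stray received a log from the primary whose range did not touch its own in at least one direction.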
History
#1 Updated by Sage Weil over 11 years ago
- Priority changed from Normal to High
#2 Updated by Sage Weil over 11 years ago
- Priority changed from High to Urgent
#3 Updated by Samuel Just over 11 years ago
- Status changed from New to Need More Info
f822c0257e4c7fad181332cd149205ad15a8b9db
See the commit description. Unfortunately, I don't really have evidence that that mechanism actually caused this crash. If it happens again, osd logging from the crashing osd and, more importantly, the primary would allow me to confirm it.
#4 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Resolved
I'm going to optimistically call this resolved. If we see this crash again, though, we'll need to reopen, and hopefully gather more evidence to come up with another theory! The above commit fixed a bug, at least...
#5 Updated by Sage Weil about 11 years ago
- Status changed from Resolved to 12
Just saw this on congress during a huge CRUSH restructure:
2012-07-27 18:32:04.212775 7fb0e8a40700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7fb0e8a40700 time 2012-07-27 18:32:04.145215
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
ceph version 0.48argonaut-54-g9db7809 (commit:9db78090451e609e3520ac3e57a5f53da03f9ee2)
1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1941) [0x6119a1]
2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x140) [0x6329f0]
3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x134) [0x64d244]
4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x64208b]
5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x642121]
6: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x19e) [0x61973e]
7: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x63e) [0x5d60ee]
8: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x220) [0x5d8f60]
9: (OSD::_dispatch(Message*)+0x191) [0x5e14e1]
10: (OSD::ms_dispatch(Message*)+0x153) [0x5e1de3]
11: (SimpleMessenger::dispatch_entry()+0x92b) [0x78ff5b]
12: (SimpleMessenger::DispatchThread::entry()+0xd) [0x74fdad]
13: (()+0x7e9a) [0x7fb0f67afe9a]
14: (clone()+0x6d) [0x7fb0f4d644bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
osd.116
#6 Updated by Sage Weil about 11 years ago
- Priority changed from Urgent to High
#7 Updated by Samuel Just almost 11 years ago
- Status changed from 12 to Resolved