Bug #2462

osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

Added by Eric Dold almost 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2012-05-23 06:16:37.080317 7f18f6012700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f18f6012700 time 2012-05-23 06:16:36.911866
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

ceph version 0.47.1 (f5a9404445e2ed5ec2ee828aa53d73d4a002f7a5)
1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1987) [0x614437]
2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x14d) [0x61d9ad]
3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x11e) [0x64f91e]
4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x64450b]
5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x644631]
6: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x19e) [0x61cdce]
7: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x5e7) [0x5dafa7]
8: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x220) [0x5ddda0]
9: (OSD::_dispatch(Message*)+0x191) [0x5e40b1]
10: (OSD::ms_dispatch(Message*)+0x153) [0x5e4a03]
11: (SimpleMessenger::dispatch_entry()+0x85b) [0x783c4b]
12: (SimpleMessenger::DispatchThread::entry()+0xd) [0x74b52d]
13: (()+0x8ec6) [0x7f190367fec6]
14: (clone()+0x6d) [0x7f190232c51d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
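
For readers unfamiliar with the failed condition: it has the shape of a standard interval-overlap test between the local log and the log received from the peer. The following is only a minimal sketch of that check, not the code in osd/PG.cc. The `Version` and `LogRange` types and the `logs_overlap` and `require_overlap` helpers are hypothetical names introduced for illustration, and the sketch assumes log positions are ordered by epoch first and version second.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a log position, ordered by epoch, then version.
struct Version {
    std::uint32_t epoch;
    std::uint64_t version;
};

inline bool operator>=(const Version& a, const Version& b) {
    return std::tie(a.epoch, a.version) >= std::tie(b.epoch, b.version);
}

// Hypothetical stand-in for a PG log, reduced to the range it covers.
struct LogRange {
    Version tail;  // oldest position covered
    Version head;  // newest position covered
};

// Same shape as the failed condition: neither log may end before the
// other one begins, otherwise there is a gap between them.
inline bool logs_overlap(const LogRange& log, const LogRange& olog) {
    return log.head >= olog.tail && olog.head >= log.tail;
}

// Mirrors the assert: aborts when the two ranges are disjoint.
inline void require_overlap(const LogRange& log, const LogRange& olog) {
    assert(logs_overlap(log, olog));
}
```

Under these assumptions, the crash corresponds to `require_overlap` being called with two ranges that share no position, that is, one log's head is older than the other log's tail.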

osd.9.log (29.5 KB) Eric Dold, 05/23/2012 01:50 AM

History

#1 Updated by Sage Weil almost 12 years ago

  • Priority changed from Normal to High

#2 Updated by Sage Weil almost 12 years ago

  • Priority changed from High to Urgent

#3 Updated by Samuel Just almost 12 years ago

  • Status changed from New to Need More Info

f822c0257e4c7fad181332cd149205ad15a8b9db

See the commit description. Unfortunately, I don't really have evidence that that mechanism actually caused this crash. If it happens again, osd logging from the crashing osd and, more importantly, the primary would allow me to confirm it.

#4 Updated by Sage Weil almost 12 years ago

  • Status changed from Need More Info to Resolved

I'm going to optimistically call this resolved. If we see this crash again, though, we'll need to reopen, and hopefully gather more evidence to come up with another theory! The above commit fixed a bug, at least...

#5 Updated by Sage Weil over 11 years ago

  • Status changed from Resolved to 12

Just saw this on congress during a huge CRUSH restructure:

2012-07-27 18:32:04.212775 7fb0e8a40700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7fb0e8a40700 time 2012-07-27 18:32:04.145215
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

 ceph version 0.48argonaut-54-g9db7809 (commit:9db78090451e609e3520ac3e57a5f53da03f9ee2)
 1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1941) [0x6119a1]
 2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x140) [0x6329f0]
 3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x134) [0x64d244]
 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x64208b]
 5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x642121]
 6: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x19e) [0x61973e]
 7: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x63e) [0x5d60ee]
 8: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x220) [0x5d8f60]
 9: (OSD::_dispatch(Message*)+0x191) [0x5e14e1]
 10: (OSD::ms_dispatch(Message*)+0x153) [0x5e1de3]
 11: (SimpleMessenger::dispatch_entry()+0x92b) [0x78ff5b]
 12: (SimpleMessenger::DispatchThread::entry()+0xd) [0x74fdad]
 13: (()+0x7e9a) [0x7fb0f67afe9a]
 14: (clone()+0x6d) [0x7fb0f4d644bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

osd.116

#6 Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High

#7 Updated by Samuel Just over 11 years ago

  • Status changed from 12 to Resolved
