Bug #2462

closed

osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

Added by Eric Dold almost 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)

Description

2012-05-23 06:16:37.080317 7f18f6012700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f18f6012700 time 2012-05-23 06:16:36.911866
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

ceph version 0.47.1 (f5a9404445e2ed5ec2ee828aa53d73d4a002f7a5)
1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1987) [0x614437]
2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x14d) [0x61d9ad]
3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x11e) [0x64f91e]
4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x64450b]
5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x644631]
6: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x19e) [0x61cdce]
7: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x5e7) [0x5dafa7]
8: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x220) [0x5ddda0]
9: (OSD::_dispatch(Message*)+0x191) [0x5e40b1]
10: (OSD::ms_dispatch(Message*)+0x153) [0x5e4a03]
11: (SimpleMessenger::dispatch_entry()+0x85b) [0x783c4b]
12: (SimpleMessenger::DispatchThread::entry()+0xd) [0x74b52d]
13: (()+0x8ec6) [0x7f190367fec6]
14: (clone()+0x6d) [0x7f190232c51d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
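
For readers unfamiliar with the failed assert: the condition requires that the local log and the incoming log (olog) cover overlapping version ranges, which merge_log expects before it combines them. A minimal sketch of that check follows. The Version and LogRange types and both function names are hypothetical, simplified stand-ins and not Ceph's actual eversion_t or pg_log_t; only the comparison itself is taken from the assert message above.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical, simplified stand-in for a log version: an (epoch, version)
// pair ordered by epoch first, then by version.
struct Version {
    uint32_t epoch;
    uint64_t version;
};

inline bool operator<(const Version& a, const Version& b) {
    return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

inline bool operator>=(const Version& a, const Version& b) {
    return !(a < b);
}

// Hypothetical, simplified log: only the two ends of the version range.
struct LogRange {
    Version tail;  // oldest end of the range
    Version head;  // newest end of the range
};

// Same condition as the failed assert: each log's head must reach at least
// the other log's tail, i.e. the two ranges must not be disjoint.
inline bool logs_overlap(const LogRange& log, const LogRange& olog) {
    return log.head >= olog.tail && olog.head >= log.tail;
}

// Mirrors the precondition in the trace: aborts when the ranges are disjoint.
inline void require_overlap(const LogRange& log, const LogRange& olog) {
    assert(logs_overlap(log, olog));
}
```

In this sketch, a local log spanning (1,10) to (1,20) overlaps an incoming log spanning (1,15) to (1,30), but not one spanning (2,5) to (2,9); the second case is the kind of input that would trip the assert.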


Files

osd.9.log (29.5 KB) Eric Dold, 05/23/2012 01:50 AM
Actions #1

Updated by Sage Weil almost 12 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Sage Weil almost 12 years ago

  • Priority changed from High to Urgent
Actions #3

Updated by Samuel Just almost 12 years ago

  • Status changed from New to Need More Info

f822c0257e4c7fad181332cd149205ad15a8b9db

See the commit description. Unfortunately, I don't really have evidence that that mechanism actually caused this crash. If it happens again, osd logging from the crashing osd and, more importantly, the primary would allow me to confirm it.

Actions #4

Updated by Sage Weil almost 12 years ago

  • Status changed from Need More Info to Resolved

I'm going to optimistically call this resolved. If we see this crash again, though, we'll need to reopen, and hopefully gather more evidence to come up with another theory! The above commit fixed a bug, at least...

Actions #5

Updated by Sage Weil over 11 years ago

  • Status changed from Resolved to 12

Just saw this on congress during a huge CRUSH restructure:

2012-07-27 18:32:04.212775 7fb0e8a40700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7fb0e8a40700 time 2012-07-27 18:32:04.145215
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

 ceph version 0.48argonaut-54-g9db7809 (commit:9db78090451e609e3520ac3e57a5f53da03f9ee2)
 1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1941) [0x6119a1]
 2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x140) [0x6329f0]
 3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x134) [0x64d244]
 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x64208b]
 5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x642121]
 6: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x19e) [0x61973e]
 7: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x63e) [0x5d60ee]
 8: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x220) [0x5d8f60]
 9: (OSD::_dispatch(Message*)+0x191) [0x5e14e1]
 10: (OSD::ms_dispatch(Message*)+0x153) [0x5e1de3]
 11: (SimpleMessenger::dispatch_entry()+0x92b) [0x78ff5b]
 12: (SimpleMessenger::DispatchThread::entry()+0xd) [0x74fdad]
 13: (()+0x7e9a) [0x7fb0f67afe9a]
 14: (clone()+0x6d) [0x7fb0f4d644bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

osd.116
Actions #6

Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High
Actions #7

Updated by Samuel Just over 11 years ago

  • Status changed from 12 to Resolved