Project

General

Profile

Actions

Bug #7881

closed

osd/PGLog.cc: 430: FAILED assert(to != olog.log.end() || (olog.head == info.last_update))

Added by Sage Weil about 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

mixed dumpling/firefly cluster:

  -485> 2014-03-26 20:45:26.337944 7f9bdc0f4700  1 -- 10.214.131.31:6806/6124 <== osd.2 10.214.132.20:6806/24629 132 ==== pg_log(1.47 epoch 394 query_epoch 394) v4 ==== 736+0+0 (4240938073 0 0) 0x42e3580 con 0x3db49a0
  -476> 2014-03-26 20:45:26.338036 7f9bdc0f4700  7 osd.5 394 handle_pg_log pg_log(1.47 epoch 394 query_epoch 394) v4 from osd.2
  -475> 2014-03-26 20:45:26.338042 7f9bdc0f4700 15 osd.5 394 require_same_or_newer_map 394 (i am 394) 0x42e3580
  -470> 2014-03-26 20:45:26.338660 7f9bd78eb700 10 osd.5 pg_epoch: 394 pg[1.47( v 6'2 lc 6'3 (6'2,6'2] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=1 lpr=393 pi=342-392/2 lcod 0'0 inactive NOTIFY] handle_peering_event: epoch_sent: 394 epoch_requested: 394 MLogRec from 2
  -468> 2014-03-26 20:45:26.338688 7f9bd78eb700 10 osd.5 pg_epoch: 394 pg[1.47( v 6'2 lc 6'3 (6'2,6'2] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=1 lpr=393 pi=342-392/2 lcod 0'0 inactive NOTIFY] state<Started/Stray>: got info+log from osd.2 1.47( v 6'3 (0'0,6'3] local-les=394 n=3 ec=1 les/c 357/357 356/393/356) log(0'0,6'3]
  -466> 2014-03-26 20:45:26.338718 7f9bd78eb700 10 merge_log log(0'0,6'3] from osd.2 into log(6'2,6'2]
  -465> 2014-03-26 20:45:26.338724 7f9bd78eb700 10 merge_log extending tail to 0'0
   -61> 2014-03-26 20:45:26.345061 7f9bd78eb700 -1 osd/PGLog.cc: In function 'void PGLog::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int, pg_info_t&, std::list<hobject_t>&, bool&, bool&)' thread 7f9bd78eb700 time 2014-03-26 20:45:26.338729
osd/PGLog.cc: 430: FAILED assert(to != olog.log.end() || (olog.head == info.last_update))

 ceph version 0.67.7-39-ga0a560a (a0a560a9f04311306a9784fa3c6ea2586d637f56)
 1: (PGLog::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int, pg_info_t&, std::list<hobject_t, std::allocator<hobject_t> >&, bool&, bool&)+0x531) [0x75ab41]
 2: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x51) [0x6f4a51]
 3: (PG::RecoveryState::Stray::react(PG::MLogRec const&)+0x138) [0x72da28]
 4: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x166) [0x750536]
 5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x73971b]
 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x739a31]
 7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x313) [0x6fb123]
 8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2cb) [0x690ddb]
 9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6ce8a2]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b6536]
 11: (ThreadPool::WorkThread::entry()+0x10) [0x8b8340]
 12: (()+0x7e9a) [0x7f9bea9efe9a]
 13: (clone()+0x6d) [0x7f9be8b3b3fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

the peer is 0.78+ :

2014-03-26 20:45:26.337580 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 peering]  peer osd.(5,255) now 1.47( v 6'2 (6'2,6'2] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) missing(0)
2014-03-26 20:45:26.337599 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetMissing>: Got last missing, don't need missing posting CheckRepops
2014-03-26 20:45:26.337612 7fb92066b700  5 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetMissing 0.003727 2 0.000399
2014-03-26 20:45:26.337626 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering>: Leaving Peering
2014-03-26 20:45:26.337637 7fb92066b700  5 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering 0.021395 0 0.000000
2014-03-26 20:45:26.337651 7fb92066b700  5 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary/Active
2014-03-26 20:45:26.337662 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=357 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 inactive] state<Started/Primary/Active>: In Active, about to call activate
2014-03-26 20:45:26.337677 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=394 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 inactive] activate - snap_trimq []
2014-03-26 20:45:26.337688 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=394 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 inactive] activate - no missing, moving last_complete 6'3 -> 6'3
2014-03-26 20:45:26.337699 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=394 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 inactive] activate peer osd.(5,255) 1.47( v 6'2 (6'2,6'2] local-les=357 n=3 ec=1 les/c 357/357 356/393/356)
2014-03-26 20:45:26.337714 7fb92066b700 10 osd.2 pg_epoch: 394 pg[1.47( v 6'3 (0'0,6'3] local-les=394 n=3 ec=1 les/c 357/357 356/393/356) [2,5] r=0 lpr=393 pi=356-392/1 crt=6'3 lcod 0'0 mlcod 0'0 inactive] activate peer osd.(5,255) sending log((0'0,6'3], crt=6'3)
2014-03-26 20:45:26.337726 7fb92066b700  1 -- 10.214.132.20:6806/24629 --> osd.5 10.214.131.31:6806/6124 -- pg_log(1.47 epoch 394 log log((0'0,6'3], crt=6'3) query_epoch 394) v4 -- ?+0 0x2849600


ubuntu@teuthology:/a/teuthology-2014-03-25_22:35:27-upgrade:dumpling-x:stress-split-firefly---basic-plana/146654

Actions #1

Updated by Ian Colle about 10 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-03-27_22:35:20-upgrade:dumpling-x:stress-split-firefly---basic-plana/149495

Actions #3

Updated by Samuel Just about 10 years ago

  • Status changed from 12 to In Progress
Actions #4

Updated by Samuel Just about 10 years ago

  • Status changed from In Progress to 7
Actions #5

Updated by Sage Weil about 10 years ago

  • Status changed from 7 to Resolved
Actions

Also available in: Atom PDF