Bug #12179

closed

osd: PG.cc: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

Added by Joao Eduardo Luis almost 9 years ago. Updated over 8 years ago.

Status: Duplicate
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

--- begin dump of recent events ---
  -399> 2015-06-25 19:59:10.323481 7efe9bc4c700 -1 osd/PG.cc: In function 'void PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)' thread 7efe9bc4c700 time 2015-06-25 19:59:10.294765
osd/PG.cc: 292: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

 ceph version 9.0.1-1156-gb9c72eb (b9c72eb699cce494bb786646f5183c25fcf57545)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xafea2f]
 2: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)+0x4fa) [0x79ffda]
 3: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0x2ad) [0x7c23bd]
 4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::GotLog>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::transition<PG::RecoveryState::IsIncomplete, PG::RecoveryState::Incomplete, boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>, &(boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>::no_function(PG::RecoveryState::IsIncomplete const&))> >, boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x104) [0x7f7a64]
 5: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x8d) [0x7f7bad]
 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x51) [0x7d5c21]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x2b) [0x7d5dfb]
 8: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x78aa73]
 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x260) [0x672280]
 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cada2]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaed90e]
 12: (ThreadPool::WorkThread::entry()+0x10) [0xaf0770]
 13: (()+0x7e9a) [0x7efeb51a5e9a]
 14: (clone()+0x6d) [0x7efeb394e3fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

full log at teuthology:/home/ubuntu/joao/joao-2015-06-25_17:58:/ceph-osd.3.log.gz


Related issues 1 (1 open, 0 closed)

Is duplicate of RADOS - Bug #12687: osd thrashing + pg import/export can cause maybe_went_rw intervals to be missed (New)

Actions #1

Updated by Samuel Just almost 9 years ago

osd.5 had a history.les of 810 on 1.5b, which it got from a ceph_objectstore_tool transfer from osd.3. Peering didn't find anything with that les because the tool removed the copy from osd.3, which had been the only OSD active at that time. Bottom line: the objectstore tool violated the peering rules here, and that unsurprisingly caused a faulty peering process. Will look more on Monday.
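
To make the failed condition concrete, here is a minimal sketch of the invariant from the assert. It is not the Ceph code: `SketchInfo`, `SketchHistory`, `make_info`, `les_invariant_holds`, and `check_like_proc_master_log` are hypothetical, heavily simplified stand-ins, and only the two fields named in the assert are modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Ceph's pg_info_t and pg_history_t.
// Only the two fields that appear in the failed assert are modeled.
using epoch_t = std::uint32_t;

struct SketchHistory {
  epoch_t last_epoch_started = 0;  // "history.les" in the comment above
};

struct SketchInfo {
  epoch_t last_epoch_started = 0;  // "les" of the info being checked
  SketchHistory history;
};

// The condition from the failed assert, written as a plain predicate.
inline bool les_invariant_holds(const SketchInfo& info) {
  return info.last_epoch_started >= info.history.last_epoch_started;
}

// Convenience constructor for the sketch (not a Ceph function).
inline SketchInfo make_info(epoch_t les, epoch_t history_les) {
  SketchInfo info;
  info.last_epoch_started = les;
  info.history.last_epoch_started = history_les;
  return info;
}

// Aborts when the invariant is violated, as the OSD did in this report.
inline void check_like_proc_master_log(const SketchInfo& info) {
  assert(les_invariant_holds(info));
}
```

With the history.les of 810 from this report and an assumed les of 805 for the best copy that peering could still find (805 is an illustrative value, not taken from the log), `les_invariant_holds(make_info(805, 810))` returns false, which is the state that trips the assert.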

Actions #2

Updated by Samuel Just almost 9 years ago

  • Priority changed from Urgent to High
Actions #3

Updated by Samuel Just over 8 years ago

  • Status changed from New to Duplicate