Bug #12179

osd: PG.cc: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

Added by Joao Eduardo Luis over 7 years ago. Updated over 7 years ago.

Status:
Duplicate
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

--- begin dump of recent events ---
  -399> 2015-06-25 19:59:10.323481 7efe9bc4c700 -1 osd/PG.cc: In function 'void PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)' thread 7efe9bc4c700 time 2015-06-25 19:59:10.294765
osd/PG.cc: 292: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

 ceph version 9.0.1-1156-gb9c72eb (b9c72eb699cce494bb786646f5183c25fcf57545)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xafea2f]
 2: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)+0x4fa) [0x79ffda]
 3: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0x2ad) [0x7c23bd]
 4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::GotLog>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::transition<PG::RecoveryState::IsIncomplete, PG::RecoveryState::Incomplete, boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>, &(boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>::no_function(PG::RecoveryState::IsIncomplete const&))> >, boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x104) [0x7f7a64]
 5: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x8d) [0x7f7bad]
 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x51) [0x7d5c21]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x2b) [0x7d5dfb]
 8: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x78aa73]
 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x260) [0x672280]
 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cada2]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaed90e]
 12: (ThreadPool::WorkThread::entry()+0x10) [0xaf0770]
 13: (()+0x7e9a) [0x7efeb51a5e9a]
 14: (clone()+0x6d) [0x7efeb394e3fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

full log at teuthology:/home/ubuntu/joao/joao-2015-06-25_17:58:/ceph-osd.3.log.gz


Related issues

Duplicates RADOS - Bug #12687: osd thrashing + pg import/export can cause maybe_went_rw intervals to be missed New

History

#1 Updated by Samuel Just over 7 years ago

osd.5 had a history.les (history.last_epoch_started) of 810 on pg 1.5b, which it got from a ceph_objectstore_tool transfer from osd.3. Peering couldn't find anything with that les, since the tool had removed the copy from osd.3, which had been the only OSD active at that time. Bottom line: the objectstore tool violated the peering rules here, and that unsurprisingly caused a faulty peering process. Will look into it more on Monday.

#2 Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to High

#3 Updated by Samuel Just over 7 years ago

  • Status changed from New to Duplicate
