Bug #12179
Closed
osd: PG.cc: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
--- begin dump of recent events ---
-399> 2015-06-25 19:59:10.323481 7efe9bc4c700 -1 osd/PG.cc: In function 'void PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)' thread 7efe9bc4c700 time 2015-06-25 19:59:10.294765
osd/PG.cc: 292: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)
ceph version 9.0.1-1156-gb9c72eb (b9c72eb699cce494bb786646f5183c25fcf57545)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xafea2f]
2: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)+0x4fa) [0x79ffda]
3: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0x2ad) [0x7c23bd]
4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::GotLog>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::transition<PG::RecoveryState::IsIncomplete, PG::RecoveryState::Incomplete, boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>, &(boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>::no_function(PG::RecoveryState::IsIncomplete const&))> >, boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x104) [0x7f7a64]
5: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x8d) [0x7f7bad]
6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x51) [0x7d5c21]
7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x2b) [0x7d5dfb]
8: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x78aa73]
9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x260) [0x672280]
10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cada2]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaed90e]
12: (ThreadPool::WorkThread::entry()+0x10) [0xaf0770]
13: (()+0x7e9a) [0x7efeb51a5e9a]
14: (clone()+0x6d) [0x7efeb394e3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
full log at teuthology:/home/ubuntu/joao/joao-2015-06-25_17:58:29-rados-wip-10507-2---basic-multi/949109/ubuntu@plana83.front.sepia.ceph.com/ceph-osd.3.log.gz
Updated by Samuel Just almost 9 years ago
osd.5 had a history.les (last_epoch_started) of 810 on PG 1.5b, which it got from a ceph_objectstore_tool transfer from osd.3. Peering didn't find anything with that les because the tool removed the copy from osd.3, which had been the only OSD active at that time. Bottom line: the objectstore tool violated the peering rules here, and that unsurprisingly caused a faulty peering process. Will look more on Monday.
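The mismatch described above can be sketched with a toy model. This is not Ceph's peering code: the `Info` and `History` classes, the `simulate_peering` and `invariant_holds` helpers, and the epoch value 800 are all hypothetical, chosen only to mirror the two fields named in the failed assert. It assumes, as a simplification, that peering ends up with the newest `last_epoch_started` found on any surviving copy and the newest `history.last_epoch_started` reported by any peer.

```python
from dataclasses import dataclass, field


@dataclass
class History:
    # Toy stand-in for the history.last_epoch_started field in the assert.
    last_epoch_started: int = 0


@dataclass
class Info:
    # Toy stand-in for the info.last_epoch_started field in the assert.
    last_epoch_started: int = 0
    history: History = field(default_factory=History)


def simulate_peering(peer_infos):
    """Return (best_les, history_les) over all peers (simplified assumption)."""
    best_les = max(p.last_epoch_started for p in peer_infos)
    history_les = max(p.history.last_epoch_started for p in peer_infos)
    return best_les, history_les


def invariant_holds(les, history_les):
    """Mirror of the asserted condition: les >= history.les."""
    return les >= history_les


# Hypothetical state after the tool transfer: one peer carries history.les 810,
# but no surviving copy actually has les 810 (800 is an invented value).
after_transfer = [Info(800, History(810)), Info(800, History(800))]

# Hypothetical state if the only copy active at epoch 810 were still present.
with_original_copy = after_transfer + [Info(810, History(810))]
```

In this model `invariant_holds(*simulate_peering(after_transfer))` is false, while the same check on `with_original_copy` is true, which matches the comment's point that removing the only copy with that les is what broke peering.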