Bug #12179
osd: PG.cc: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
--- begin dump of recent events ---
-399> 2015-06-25 19:59:10.323481 7efe9bc4c700 -1 osd/PG.cc: In function 'void PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)' thread 7efe9bc4c700 time 2015-06-25 19:59:10.294765
osd/PG.cc: 292: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)
ceph version 9.0.1-1156-gb9c72eb (b9c72eb699cce494bb786646f5183c25fcf57545)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xafea2f]
2: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_t&, pg_shard_t)+0x4fa) [0x79ffda]
3: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0x2ad) [0x7c23bd]
4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::GotLog>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::transition<PG::RecoveryState::IsIncomplete, PG::RecoveryState::Incomplete, boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>, &(boost::statechart::detail::no_context<PG::RecoveryState::IsIncomplete>::no_function(PG::RecoveryState::IsIncomplete const&))> >, boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x104) [0x7f7a64]
5: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x8d) [0x7f7bad]
6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0x51) [0x7d5c21]
7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x2b) [0x7d5dfb]
8: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x78aa73]
9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x260) [0x672280]
10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cada2]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaed90e]
12: (ThreadPool::WorkThread::entry()+0x10) [0xaf0770]
13: (()+0x7e9a) [0x7efeb51a5e9a]
14: (clone()+0x6d) [0x7efeb394e3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
full log at teuthology:/home/ubuntu/joao/joao-2015-06-25_17:58:29-rados-wip-10507-2---basic-multi/949109/ubuntu@plana83.front.sepia.ceph.com/ceph-osd.3.log.gz
Related issues
History
#1 Updated by Samuel Just over 7 years ago
osd.5 had a history.les (last_epoch_started) of 810 on pg 1.5b, which it got from a ceph_objectstore_tool transfer from osd.3. Peering didn't find anything with that les because the tool had removed the copy from osd.3, which had been the only OSD active at that time. Bottom line: the objectstore tool violated the peering rules here, and that unsurprisingly caused a faulty peering process. Will look more on Monday.
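The condition that tripped can be modeled with a small C++ sketch. This is only an illustration under stated assumptions, not the code in osd/PG.cc: `ToyInfo`, `ToyHistory`, `les_invariant_holds`, and `any_copy_reaches` are hypothetical names invented here, and only the value 810 comes from this report.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the PG info and its history.
// The real Ceph structures carry far more state than this.
using epoch_t = std::uint32_t;

struct ToyHistory {
  epoch_t last_epoch_started = 0;  // newest epoch at which the PG is known to have gone active
};

struct ToyInfo {
  epoch_t last_epoch_started = 0;  // epoch at which this particular copy last went active
  ToyHistory history;
};

// Same comparison as the failed assert in this report:
//   info.last_epoch_started >= info.history.last_epoch_started
bool les_invariant_holds(const ToyInfo& info) {
  return info.last_epoch_started >= info.history.last_epoch_started;
}

// Returns true if at least one surviving copy was active at or after
// history_les. When the only such copy has been removed, no candidate can
// satisfy the invariant above.
bool any_copy_reaches(const std::vector<epoch_t>& surviving_les,
                      epoch_t history_les) {
  for (epoch_t les : surviving_les) {
    if (les >= history_les) {
      return true;
    }
  }
  return false;
}
```

With `history.last_epoch_started` set to 810 and an info whose own `last_epoch_started` is any smaller, made-up value, `les_invariant_holds` returns false. That corresponds to the situation described above, where the history refers to an epoch that no remaining copy reached.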
#2 Updated by Samuel Just over 7 years ago
- Priority changed from Urgent to High
#3 Updated by Samuel Just over 7 years ago
- Status changed from New to Duplicate