Bug #12387

osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so))

Added by David Zafman over 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/dzafman-2015-07-16_21:07:50-rados-wip-12000-12200---basic-multi/976605

2015-07-17 04:33:42.333590 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1206/1208 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1260 epoch_requested: 1260 MNotifyRec from 4 notify (query_epoch:1260, epoch_sent:1260, info:0.1fc( empty local-les=1252 n=0 ec=1 les/c 1252/1252 1254/1259/1259)) features: 0x7ffffffffffff
2015-07-17 04:33:42.333605 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1206/1208 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering]  got osd.4 0.1fc( empty local-les=1252 n=0 ec=1 les/c 1252/1252 1254/1259/1259)
2015-07-17 04:33:42.333625 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] update_heartbeat_peers 1,3,4 unchanged
2015-07-17 04:33:42.333638 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>:  last_epoch_started moved forward, rebuilding prior
2015-07-17 04:33:42.333650 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering]  PriorSet: build_prior interval(1259-1246 up [1,4](1) acting [1,4](1))
2015-07-17 04:33:42.333663 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering]  PriorSet: build_prior final: probe 3,4 down  blocked_by {}
2015-07-17 04:33:42.333676 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] up_thru 1264 >= same_since 1259, all is well
2015-07-17 04:33:42.333687 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>:  dropping osd.5 from info_requested, no longer in probe set
2015-07-17 04:33:42.333700 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>:  have osd.4 info 0.1fc( empty local-les=1252 n=0 ec=1 les/c 1252/1252 1254/1259/1259)
2015-07-17 04:33:42.333715 7f49063ac700 15 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] publish_stats_to_osd 1268:3
2015-07-17 04:33:42.333725 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 4 peer features: 7ffffffffffff
2015-07-17 04:33:42.333738 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 4 acting features: 7ffffffffffff
2015-07-17 04:33:42.333748 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 4 upacting features: 7ffffffffffff
2015-07-17 04:33:42.333759 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>:  last maybe_went_rw interval was interval(1254-1258 up [3,4](3) acting [4,5](4) maybe_went_rw)
2015-07-17 04:33:42.370086 7f49063ac700 -1 osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::GetInfo::react(const PG::MNotifyRec&)' thread 7f49063ac700 time 2015-07-17 04:33:42.333786
osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so))

 ceph version 9.0.1-1469-g45d41ec (45d41ec68981f795e032b2cdb4174cf0553f49b3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xafee5f]
 2: (PG::RecoveryState::GetInfo::react(PG::MNotifyRec const&)+0x10a7) [0x7afb87]
 3: (boost::statechart::simple_state<PG::RecoveryState::GetInfo, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x166) [0x7ef956]
 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7d243b]
 5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x7d273e]
 6: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x7883b3]
 7: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x260) [0x6730f0]
 8: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cb4c2]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaedd3e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xaf0ba0]
 11: (()+0x7e9a) [0x7f491f210e9a]
 12: (clone()+0x6d) [0x7f491d9d28bd]
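The log suggests how the assert was reached. After "last_epoch_started moved forward, rebuilding prior", the rebuilt probe set contained only osd.3 and osd.4, so osd.5 was dropped from info_requested. However, the last maybe_went_rw interval (1254-1258) had acting set [4,5]. The sketch below models that invariant in isolation. It is not Ceph code: `PastInterval`, `PeerInfoMap`, `missing_peer_info`, and `assert_peer_info_complete` are hypothetical. The rule that only up, non-self acting members need peer info is an assumption about the real check.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in for one entry of past_intervals.
struct PastInterval {
  unsigned first;           // first epoch of the interval
  unsigned last;            // last epoch of the interval
  std::vector<int> acting;  // acting set (OSD ids) during the interval
  bool maybe_went_rw;       // whether the PG may have accepted writes
};

// Hypothetical stand-in for peer_info: OSD id -> that peer's last_epoch_started.
using PeerInfoMap = std::map<int, unsigned>;

// Return the acting members of a maybe_went_rw interval that are up, are not
// this OSD, and have no peer_info entry. Empty means the invariant holds.
std::vector<int> missing_peer_info(const PastInterval& interval,
                                   const PeerInfoMap& peer_info,
                                   const std::set<int>& up_osds,
                                   int whoami) {
  std::vector<int> missing;
  if (!interval.maybe_went_rw) {
    return missing;  // intervals that cannot have gone read/write are skipped
  }
  for (int osd : interval.acting) {
    if (osd == whoami || up_osds.count(osd) == 0) {
      continue;  // in this model, our own info is local and down OSDs need none
    }
    if (peer_info.count(osd) == 0) {
      missing.push_back(osd);
    }
  }
  return missing;
}

// Same shape as the failing check: abort if any required peer is missing.
void assert_peer_info_complete(const PastInterval& interval,
                               const PeerInfoMap& peer_info,
                               const std::set<int>& up_osds, int whoami) {
  assert(missing_peer_info(interval, peer_info, up_osds, whoami).empty());
}
```

With the values from the log (interval 1254-1258 with acting [4,5], peer info only from osd.4, `whoami` = 3), and assuming osd.5 is still up, `missing_peer_info` returns `{5}`. In that case, `assert_peer_info_complete` would abort in the same way the OSD did.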

Related issues

Related to Ceph - Bug #17176: "FAILED assert(pg->peer_info.count(so))" in upgrade:infernalis-x-master-distro-basic-vps Won't Fix 08/30/2016
Copied to Ceph - Backport #13039: hammer: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) Resolved

Associated revisions

Revision 65dcc2da (diff)
Added by David Zafman over 8 years ago

osd: When generating past intervals due to an import end at pg epoch

Add assert() to make sure same_interval_since isn't too far forward

Fixes: #12387

Signed-off-by: David Zafman <>

Revision fce79027 (diff)
Added by David Zafman almost 8 years ago

osd: When generating past intervals due to an import end at pg epoch

Add assert() to make sure same_interval_since isn't too far forward

Fixes: #12387

Signed-off-by: David Zafman <>
(cherry picked from commit 65dcc2da76750d0b6dd2cf0031c44f32749f33e5)
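The revision message describes the fix only in outline. Past-interval generation for an imported PG should end at the PG's epoch, and a new assert guards against `same_interval_since` being too far forward. What follows is a minimal sketch of that idea, not the actual patch. `epoch_t` is a local alias, and `PgHistory`, `past_intervals_end`, and `interval_well_formed` are hypothetical. Treating `same_interval_since == 0` as the imported case is also an assumption. `interval_well_formed` reflects the log in the description, where build_prior reported `interval(1259-1246 ...)`, an interval whose first epoch lies after its last.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;  // stand-in alias for an OSD map epoch

// Hypothetical subset of a PG's history.
struct PgHistory {
  epoch_t same_interval_since;  // assumption: 0 when unknown, e.g. after an import
};

// Pick the epoch at which past-interval generation stops. Following the
// commit title, an imported PG (no same_interval_since) ends at the PG's
// own epoch rather than at some newer map epoch.
epoch_t past_intervals_end(const PgHistory& history, epoch_t pg_epoch) {
  // The added safety check: same_interval_since must not be ahead of the PG.
  assert(history.same_interval_since <= pg_epoch);
  return history.same_interval_since != 0 ? history.same_interval_since
                                          : pg_epoch;
}

// A generated interval is only meaningful if it does not end before it starts.
bool interval_well_formed(epoch_t first, epoch_t last) {
  return first <= last;
}
```

For instance, `past_intervals_end(PgHistory{0}, 1268)` returns 1268, and `interval_well_formed(1259, 1246)` is false.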

History

#1 Updated by David Zafman over 8 years ago

  • Status changed from New to 7

#2 Updated by Sage Weil over 8 years ago

  • Status changed from 7 to Pending Backport
  • Backport set to hammer

#3 Updated by David Zafman over 8 years ago

65dcc2da76750d0b6dd2cf0031c44f32749f33e5

#5 Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved

#6 Updated by Yuri Weinstein over 7 years ago

  • Related to Bug #17176: "FAILED assert(pg->peer_info.count(so))" in upgrade:infernalis-x-master-distro-basic-vps added
