Bug #12387
osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so))
Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
ubuntu@teuthology:/a/dzafman-2015-07-16_21:07:50-rados-wip-12000-12200---basic-multi/976605
2015-07-17 04:33:42.333590 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1206/1208 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1260 epoch_requested: 1260 MNotifyRec from 4 notify (query_epoch:1260, epoch_sent:1260, info:0.1fc( empty local-les=1252 n=0 ec=1 les/c 1252/1252 1254/1259/1259)) features: 0x7ffffffffffff
2015-07-17 04:33:42.333605 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1206/1208 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] got osd.4 0.1fc( empty local-les=1252 n=0 ec=1 les/c 1252/1252 1254/1259/1259)
2015-07-17 04:33:42.333625 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] update_heartbeat_peers 1,3,4 unchanged
2015-07-17 04:33:42.333638 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: last_epoch_started moved forward, rebuilding prior
2015-07-17 04:33:42.333650 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior interval(1259-1246 up [1,4](1) acting [1,4](1))
2015-07-17 04:33:42.333663 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] PriorSet: build_prior final: probe 3,4 down blocked_by {}
2015-07-17 04:33:42.333676 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] up_thru 1264 >= same_since 1259, all is well
2015-07-17 04:33:42.333687 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: dropping osd.5 from info_requested, no longer in probe set
2015-07-17 04:33:42.333700 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: have osd.4 info 0.1fc( empty local-les=1252 n=0 ec=1 les/c 1252/1252 1254/1259/1259)
2015-07-17 04:33:42.333715 7f49063ac700 15 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] publish_stats_to_osd 1268:3
2015-07-17 04:33:42.333725 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 4 peer features: 7ffffffffffff
2015-07-17 04:33:42.333738 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 4 acting features: 7ffffffffffff
2015-07-17 04:33:42.333748 7f49063ac700 20 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: Adding osd: 4 upacting features: 7ffffffffffff
2015-07-17 04:33:42.333759 7f49063ac700 10 osd.3 pg_epoch: 1268 pg[0.1fc( empty local-les=0 n=0 ec=1 les/c 1252/1252 1254/1259/1259) [3,4] r=0 lpr=1260 pi=1208-1246/8 crt=0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetInfo>: last maybe_went_rw interval was interval(1254-1258 up [3,4](3) acting [4,5](4) maybe_went_rw)
2015-07-17 04:33:42.370086 7f49063ac700 -1 osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::GetInfo::react(const PG::MNotifyRec&)' thread 7f49063ac700 time 2015-07-17 04:33:42.333786
osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so))

ceph version 9.0.1-1469-g45d41ec (45d41ec68981f795e032b2cdb4174cf0553f49b3)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xafee5f]
2: (PG::RecoveryState::GetInfo::react(PG::MNotifyRec const&)+0x10a7) [0x7afb87]
3: (boost::statechart::simple_state<PG::RecoveryState::GetInfo, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x166) [0x7ef956]
4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7d243b]
5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x7d273e]
6: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x7883b3]
7: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x260) [0x6730f0]
8: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6cb4c2]
9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xaedd3e]
10: (ThreadPool::WorkThread::entry()+0x10) [0xaf0ba0]
11: (()+0x7e9a) [0x7f491f210e9a]
12: (clone()+0x6d) [0x7f491d9d28bd]
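One entry in the log above stands out: the PriorSet: build_prior line reports interval(1259-1246 ...), a past interval whose first epoch (1259) is later than its last epoch (1246), so it appears malformed. As a minimal sketch for spotting such entries when reading OSD logs (not part of Ceph; the function name find_inverted_intervals is hypothetical, and it assumes the "interval(FIRST-LAST ..." text format seen in this report), a regex scan could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: find every "interval(FIRST-LAST" token in a log excerpt
// and return the (first, last) epoch pairs where first > last.
// Assumes the "interval(first-last ..." format shown in this bug's log.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
find_inverted_intervals(const std::string& log) {
  static const std::regex re(R"(interval\((\d+)-(\d+))");
  std::vector<std::pair<std::uint64_t, std::uint64_t>> bad;
  for (std::sregex_iterator it(log.begin(), log.end(), re), end; it != end; ++it) {
    std::uint64_t first = std::stoull((*it)[1].str());
    std::uint64_t last = std::stoull((*it)[2].str());
    if (first > last)
      bad.emplace_back(first, last);  // an interval cannot end before it starts
  }
  return bad;
}
```

Run over the excerpt above, this would flag only the build_prior entry (1259, 1246); the later interval(1254-1258 ...) entry is well-formed and is not reported.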
Related issues
Associated revisions
osd: When generating past intervals due to an import end at pg epoch
Add assert() to make sure same_interval_since isn't too far forward
Fixes: #12387
Signed-off-by: David Zafman <dzafman@redhat.com>
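The fix description implies a consistency condition on a PG's interval history: past intervals generated for an imported PG should stop at the PG's epoch, and same_interval_since should not run ahead of it. The sketch below is a simplified, hypothetical model of such a check, not the committed code: PastInterval, past_intervals_consistent, the 32-bit epoch type and the exact pg_epoch bound are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: epochs modeled as 32-bit unsigned integers.
using epoch_t = std::uint32_t;

// Hypothetical, simplified stand-in for one past interval [first, last].
struct PastInterval {
  epoch_t first;
  epoch_t last;
};

// Sketch of a sanity check. Returns false if same_interval_since is ahead of
// pg_epoch, if any interval is inverted (first > last), if intervals overlap
// or are out of order, or if the last one does not end before the current
// interval begins.
bool past_intervals_consistent(const std::vector<PastInterval>& pi,
                               epoch_t same_interval_since,
                               epoch_t pg_epoch) {
  if (same_interval_since > pg_epoch)
    return false;  // current interval "too far forward" (assumed bound)
  for (std::size_t i = 0; i < pi.size(); ++i) {
    if (pi[i].first > pi[i].last)
      return false;  // inverted, e.g. interval(1259-1246)
    if (i > 0 && pi[i].first <= pi[i - 1].last)
      return false;  // overlaps or precedes the previous interval
  }
  if (!pi.empty() && pi.back().last >= same_interval_since)
    return false;  // past intervals must end before the current one begins
  return true;
}
```

A caller in an assert-based build could write assert(past_intervals_consistent(pi, same_interval_since, pg_epoch)); the inverted interval 1259-1246 from the log above would fail the per-interval check.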
osd: When generating past intervals due to an import end at pg epoch
Add assert() to make sure same_interval_since isn't too far forward
Fixes: #12387
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 65dcc2da76750d0b6dd2cf0031c44f32749f33e5)
History
#1 Updated by David Zafman over 8 years ago
- Status changed from New to 7
#2 Updated by Sage Weil over 8 years ago
- Status changed from 7 to Pending Backport
- Backport set to hammer
#3 Updated by David Zafman over 8 years ago
65dcc2da76750d0b6dd2cf0031c44f32749f33e5
#4 Updated by Loïc Dachary over 8 years ago
#5 Updated by Loïc Dachary over 7 years ago
- Status changed from Pending Backport to Resolved
#6 Updated by Yuri Weinstein over 7 years ago
- Related to Bug #17176: "FAILED assert(pg->peer_info.count(so))" in upgrade:infernalis-x-master-distro-basic-vps added