Bug #17176
"FAILED assert(pg->peer_info.count(so))" in upgrade:infernalis-x-master-distro-basic-vps
Status: Closed (Won't Fix)
Description
This is the jewel point release 10.2.3 validation.
Run: http://pulpito.front.sepia.ceph.com/yuriw-2016-08-30_16:14:21-upgrade:infernalis-x-master-distro-basic-vps/
Job: 392488
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-08-30_16:14:21-upgrade:infernalis-x-master-distro-basic-vps/392488/teuthology.log
2016-08-30T18:37:00.149 INFO:tasks.ceph.osd.5.vpm177.stderr:2016-08-30 18:07:23.712063 7fadf425d940 -1 osd.5 2367 log_to_monitors {default=true}
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr:osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::GetInfo::react(const PG::MNotifyRec&)' thread 7fadd6e94700 time 2016-08-30 18:36:54.590540
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr:osd/PG.cc: 7014: FAILED assert(pg->peer_info.count(so))
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: ceph version 9.2.1-29-gcfdea3e (cfdea3e0c83de071ae82d7bd95c93a8a43c11eac)
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fadf3d5a33b]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 2: (PG::RecoveryState::GetInfo::react(PG::MNotifyRec const&)+0xe99) [0x7fadf390a679]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 3: (boost::statechart::simple_state<PG::RecoveryState::GetInfo, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x168) [0x7fadf394a648]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7fadf3934d0b]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 5: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7fadf38e200e]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x267) [0x7fadf37e5397]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x7fadf382b148]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7fadf3d4bdb6]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 9: (ThreadPool::WorkThread::entry()+0x10) [0x7fadf3d4cc80]
2016-08-30T18:37:00.151 INFO:tasks.ceph.osd.5.vpm177.stderr: 10: (()+0x8184) [0x7fadf2378184]
2016-08-30T18:37:00.151 INFO:tasks.ceph.osd.5.vpm177.stderr: 11: (clone()+0x6d) [0x7fadf06bd37d]
2016-08-30T18:37:00.151 INFO:tasks.ceph.osd.5.vpm177.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein over 7 years ago
- Related to Bug #12387: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) added
Updated by Yuri Weinstein over 7 years ago
- Related to Backport #13039: hammer: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) added
Updated by Loïc Dachary over 7 years ago
- Status changed from New to In Progress
filter="upgrade:infernalis-x/stress-split-erasure-code/{0-cluster/{openstack.yaml start.yaml} 1-infernalis-install/infernalis.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/ec-rados-default.yaml 6-next-mon/monb.yaml 8-next-mon/monc.yaml 9-workload/ec-rados-plugin=jerasure-k=3-m=1.yaml distros/ubuntu_14.04.yaml}"
teuthology-suite --dry-run --verbose --suite-branch jewel --ceph jewel --suite upgrade/infernalis-x --filter "$filter" --machine-type vps --email loic@dachary.org --priority 50
- fail http://pulpito.front.sepia.ceph.com:80/loic-2016-09-09_08:51:07-upgrade:infernalis-x-jewel---basic-vps/
- One out of ten failed
Updated by Loïc Dachary over 7 years ago
@Yuri you are correct, https://github.com/ceph/ceph/pull/5780/commits/65dcc2da76750d0b6dd2cf0031c44f32749f33e5 was backported to hammer but not to infernalis, and the symptoms look exactly the same. I'm running the test 10 times to verify whether the error is transient.
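Whether a fix reached a given stable branch can be checked directly with git. A minimal sketch, assuming a local clone of ceph.git with the release branches fetched (the path is a placeholder):

```shell
# List the remote branches that already contain the fix commit from PR #5780.
# If "infernalis" does not appear in the output, the backport is missing there.
git -C /path/to/ceph branch -r --contains 65dcc2da76750d0b6dd2cf0031c44f32749f33e5
```

`git branch --contains <commit>` walks each branch's history and prints only the branches from which the commit is reachable, so this distinguishes "merged to hammer" from "merged to infernalis" without reading the PR threads.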
Updated by Loïc Dachary over 7 years ago
@Yuri I believe this missing backport is not a blocker in the context of the 10.2.3 release validation.
Updated by Loïc Dachary over 7 years ago
- Status changed from In Progress to Won't Fix
Since fixing this would require https://github.com/ceph/ceph/pull/5780/commits/65dcc2da76750d0b6dd2cf0031c44f32749f33e5 to be backported to infernalis, which is end-of-life, I'm closing this as Won't Fix. Feel free to re-open if you disagree.