Bug #17176 (closed)

"FAILED assert(pg->peer_info.count(so))" in upgrade:infernalis-x-master-distro-basic-vps

Added by Yuri Weinstein over 7 years ago. Updated over 7 years ago.

Status: Won't Fix
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/infernalis-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is the jewel point release 10.2.3 validation run.
Run: http://pulpito.front.sepia.ceph.com/yuriw-2016-08-30_16:14:21-upgrade:infernalis-x-master-distro-basic-vps/
Job: 392488
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-08-30_16:14:21-upgrade:infernalis-x-master-distro-basic-vps/392488/teuthology.log

2016-08-30T18:37:00.149 INFO:tasks.ceph.osd.5.vpm177.stderr:2016-08-30 18:07:23.712063 7fadf425d940 -1 osd.5 2367 log_to_monitors {default=true}
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr:osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::GetInfo::react(const PG::MNotifyRec&)' thread 7fadd6e94700 time 2016-08-30 18:36:54.590540
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr:osd/PG.cc: 7014: FAILED assert(pg->peer_info.count(so))
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: ceph version 9.2.1-29-gcfdea3e (cfdea3e0c83de071ae82d7bd95c93a8a43c11eac)
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fadf3d5a33b]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 2: (PG::RecoveryState::GetInfo::react(PG::MNotifyRec const&)+0xe99) [0x7fadf390a679]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 3: (boost::statechart::simple_state<PG::RecoveryState::GetInfo, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x168) [0x7fadf394a648]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7fadf3934d0b]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 5: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7fadf38e200e]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x267) [0x7fadf37e5397]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x7fadf382b148]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7fadf3d4bdb6]
2016-08-30T18:37:00.150 INFO:tasks.ceph.osd.5.vpm177.stderr: 9: (ThreadPool::WorkThread::entry()+0x10) [0x7fadf3d4cc80]
2016-08-30T18:37:00.151 INFO:tasks.ceph.osd.5.vpm177.stderr: 10: (()+0x8184) [0x7fadf2378184]
2016-08-30T18:37:00.151 INFO:tasks.ceph.osd.5.vpm177.stderr: 11: (clone()+0x6d) [0x7fadf06bd37d]
2016-08-30T18:37:00.151 INFO:tasks.ceph.osd.5.vpm177.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
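
Per the NOTE on the last line, the raw frame offsets in the trace only resolve against the exact binary that crashed (ceph version 9.2.1-29-gcfdea3e here). A minimal sketch of that interpretation step, assuming the matching ceph-osd binary and its debug symbols are available locally (the path below is an assumption, adjust as needed):

# Disassemble the crashing OSD binary with source interleaved, as the
# trace's NOTE suggests; debug symbols are required for the -S output.
objdump -rdS /usr/bin/ceph-osd > ceph-osd.dis
# Then locate the faulting frame, e.g. PG::RecoveryState::GetInfo::react+0xe99:
grep -n 'GetInfo.*react' ceph-osd.dis | head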

Related issues (2): 0 open, 2 closed

Related to Ceph - Bug #12387: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) (Resolved, David Zafman, 07/17/2015)

Related to Ceph - Backport #13039: hammer: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) (Resolved, David Zafman)
#1 - Updated by Yuri Weinstein over 7 years ago

See #12387
Missing in jewel?

#2 - Updated by Yuri Weinstein over 7 years ago

  • Related to Bug #12387: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) added
#3 - Updated by Yuri Weinstein over 7 years ago

  • Related to Backport #13039: hammer: osd/PG.cc: 6879: FAILED assert(pg->peer_info.count(so)) added
#4 - Updated by Samuel Just over 7 years ago

  • Assignee set to Loïc Dachary
#5 - Updated by Loïc Dachary over 7 years ago

  • Status changed from New to In Progress
filter="upgrade:infernalis-x/stress-split-erasure-code/{0-cluster/{openstack.yaml start.yaml} 1-infernalis-install/infernalis.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/ec-rados-default.yaml 6-next-mon/monb.yaml 8-next-mon/monc.yaml 9-workload/ec-rados-plugin=jerasure-k=3-m=1.yaml distros/ubuntu_14.04.yaml}" 
teuthology-suite --dry-run --verbose --suite-branch jewel --ceph jewel --suite upgrade/infernalis-x --filter "$filter" --machine-type vps --email loic@dachary.org --priority 50
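
To schedule the same filtered job repeatedly (the transient-failure check mentioned in #6 below), teuthology-suite takes a repeat count; assuming the --num flag is available in the teuthology version in use, the run above becomes something like:

# Queue 10 repeats of the filtered job (drop --dry-run to actually schedule);
# --num is an assumption about this teuthology version's CLI.
teuthology-suite --num 10 --verbose --suite-branch jewel --ceph jewel --suite upgrade/infernalis-x --filter "$filter" --machine-type vps --email loic@dachary.org --priority 50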
#6 - Updated by Loïc Dachary over 7 years ago

@Yuri you are correct: https://github.com/ceph/ceph/pull/5780/commits/65dcc2da76750d0b6dd2cf0031c44f32749f33e5 was backported to hammer but not to infernalis, and it looks like the symptoms are exactly the same. I'm running the test 10 times to verify whether it is a transient error.
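
For anyone repeating this check, git can report which release branches contain that commit; a minimal sketch, assuming a local clone of ceph.git with the release branches fetched from origin:

# List the remote branches that contain the hammer fix; if 'infernalis'
# does not appear in the output, the backport never landed there.
git fetch origin
git branch -r --contains 65dcc2da76750d0b6dd2cf0031c44f32749f33e5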

#7 - Updated by Loïc Dachary over 7 years ago

@Yuri I believe this missing backport is not a blocker in the context of the 10.2.3 release validation.

#8 - Updated by Loïc Dachary over 7 years ago

  • Status changed from In Progress to Won't Fix

Since this would require https://github.com/ceph/ceph/pull/5780/commits/65dcc2da76750d0b6dd2cf0031c44f32749f33e5 to be backported to infernalis, which is end-of-life, I'm closing this as Won't Fix. Feel free to re-open if you disagree.
