Bug #44022

closed

mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash

Added by Neha Ojha about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1): de9b47e4c55bb576c605c46fa90d9fda24f77c381514888ba063e0fd01b350ec
Crash signature (v2):

Description

The crash happens on a mimic OSD. Telemetry crash reports show similar crashes in 14.2.4, which may or may not be related.

2020-02-06T03:33:51.194 INFO:tasks.ceph.osd.11.smithi158.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.8-57-g530fb22/rpm/el7/BUILD/ceph-13.2.8-57-g530fb22/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7f60aafd9700 time 2020-02-06 03:33:51.135848
2020-02-06T03:33:51.195 INFO:tasks.ceph.osd.11.smithi158.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.8-57-g530fb22/rpm/el7/BUILD/ceph-13.2.8-57-g530fb22/src/osd/PG.cc: 6665: FAILED assert(0 == "we got a bad state machine event")
2020-02-06T03:33:51.201 INFO:tasks.ceph.osd.11.smithi158.stderr: ceph version 13.2.8-57-g530fb22 (530fb2279cf8639e8ecff9ab4891acb72dabbf09) mimic (stable)
2020-02-06T03:33:51.201 INFO:tasks.ceph.osd.11.smithi158.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7f60cdd6ba6b]
2020-02-06T03:33:51.201 INFO:tasks.ceph.osd.11.smithi158.stderr: 2: (()+0x26fbf7) [0x7f60cdd6bbf7]
2020-02-06T03:33:51.201 INFO:tasks.ceph.osd.11.smithi158.stderr: 3: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0xa5) [0x55fe690ad6b5]
2020-02-06T03:33:51.201 INFO:tasks.ceph.osd.11.smithi158.stderr: 4: (()+0x489066) [0x55fe690f0066]
2020-02-06T03:33:51.201 INFO:tasks.ceph.osd.11.smithi158.stderr: 5: (boost::statechart::simple_state<PG::RecoveryState::Primary, PG::RecoveryState::Started, PG::RecoveryState::Peering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x343) [0x55fe6912d1f3]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 6: (boost::statechart::simple_state<PG::RecoveryState::Peering, PG::RecoveryState::Primary, PG::RecoveryState::GetInfo, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x8c) [0x55fe6912736c]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 7: (boost::statechart::simple_state<PG::RecoveryState::GetInfo, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x9a) [0x55fe6912b0ca]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x55fe69109ceb]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x143) [0x55fe690ef9f3]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xcf) [0x55fe6901e7df]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 11: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x55fe692959f0]
2020-02-06T03:33:51.202 INFO:tasks.ceph.osd.11.smithi158.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x592) [0x55fe69033802]
2020-02-06T03:33:51.203 INFO:tasks.ceph.osd.11.smithi158.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3d3) [0x7f60cdd717a3]
2020-02-06T03:33:51.203 INFO:tasks.ceph.osd.11.smithi158.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f60cdd72390]
2020-02-06T03:33:51.203 INFO:tasks.ceph.osd.11.smithi158.stderr: 15: (()+0x7e65) [0x7f60cace2e65]
2020-02-06T03:33:51.203 INFO:tasks.ceph.osd.11.smithi158.stderr: 16: (clone()+0x6d) [0x7f60c9dd288d]
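
Frames 5 through 7 of the backtrace suggest the event was offered to GetInfo, then Peering, then Primary, and that none of them handled it before the machine entered Crashed. The toy model below sketches that fall-through in Python. It is not Ceph code: the state names and the assert message come from this report, while the handler tables, the `dispatch` function, and the assumption that the outermost state acts as a catch-all are hypothetical and only for illustration.

```python
# Toy model of event fall-through in a nested state machine.
# NOT Ceph code: only the state names and the assert message come from the report.


class Crashed(Exception):
    """Stands in for the assert fired when the Crashed state is entered."""


# Innermost-to-outermost nesting, taken from the bug title
# (Started/Primary/Peering/GetInfo).
STATE_PATH = ["GetInfo", "Peering", "Primary", "Started"]

# Hypothetical handler tables: the events each state is assumed to react to.
# The event names other than MLogRec are placeholders for illustration.
HANDLERS = {
    "GetInfo": {"MNotifyRec"},
    "Peering": {"AdvMap"},
    "Primary": {"ActMap"},
    "Started": set(),  # assumed: no MLogRec handler here, only a catch-all
}


def dispatch(event):
    """Offer the event to each state, innermost first.

    Returns the name of the state that handled it. If no state on the path
    handles it, raise Crashed, mimicking the catch-all transition.
    """
    for state in STATE_PATH:
        if event in HANDLERS[state]:
            return state
    raise Crashed("we got a bad state machine event")


if __name__ == "__main__":
    print(dispatch("MNotifyRec"))
    try:
        dispatch("MLogRec")
    except Crashed as exc:
        print("crashed:", exc)
```

In this sketch, an event that GetInfo knows about is handled at the innermost level, while an MLogRec finds no handler anywhere on the path and ends in the crash branch, which mirrors the order of `react_impl` frames above.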

/a/nojha-2020-02-06_01:27:32-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4736605/

upgrade:mimic-x:stress-split/{0-cluster/{openstack.yaml start.yaml} 1-ceph-install/mimic.yaml 1.1-pg-log-overrides/short_pg_log.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-workload/{radosbench.yaml rbd-cls.yaml rbd-import-export.yaml rbd_api.yaml readwrite.yaml rgw_ragweed_prepare.yaml snaps-few-objects.yaml} 5-finish-upgrade.yaml 6-msgr2.yaml 6-nautilus.yaml 7-final-workload/{rbd-python.yaml rgw-swift-ragweed_check.yaml snaps-many-objects.yaml} debug_upgrade.yaml objectstore/bluestore-bitmap.yaml supported-all-distro/centos_latest.yaml thrashosds-health.yaml}
