Bug #20703

closed

osd/PG.cc: 5928: FAILED assert(0 == "we got a bad state machine event")

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

2017-07-20T05:48:08.757 INFO:tasks.ceph.osd.0.smithi090.stderr:/build/ceph-12.1.1-195-gb80122d/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fca72868700 time 2017-07-20 05:48:08.756089
2017-07-20T05:48:08.757 INFO:tasks.ceph.osd.0.smithi090.stderr:/build/ceph-12.1.1-195-gb80122d/src/osd/PG.cc: 5928: FAILED assert(0 == "we got a bad state machine event")
2017-07-20T05:48:08.758 INFO:tasks.ceph.osd.0.smithi090.stderr: ceph version 12.1.1-195-gb80122d (b80122de4918e6fc0df376627bde328f84d50be1) luminous (rc)
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x557e00a31c82]
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 2: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 3: (()+0x5771c6) [0x557e005321c6]
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 4: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_:
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x69) [0x557e00545ba9]
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 6: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x38d) [0x557e0050f4ad]
2017-07-20T05:48:08.759 INFO:tasks.ceph.osd.0.smithi090.stderr: 7: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x29e) [0x557e0045b64e]
2017-07-20T05:48:08.760 INFO:tasks.ceph.osd.0.smithi090.stderr: 8: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x27) [0x557e004c5ca7]
2017-07-20T05:48:08.760 INFO:tasks.ceph.osd.0.smithi090.stderr: 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xf2d) [0x557e00a38d7d]
2017-07-20T05:48:08.763 INFO:tasks.ceph.osd.0.smithi090.stderr: 10: (ThreadPool::WorkThread::entry()+0x10) [0x557e00a39ee0]
2017-07-20T05:48:08.763 INFO:tasks.ceph.osd.0.smithi090.stderr: 11: (()+0x76ba) [0x7fca8cb216ba]
2017-07-20T05:48:08.763 INFO:tasks.ceph.osd.0.smithi090.stderr: 12: (clone()+0x6d) [0x7fca8bb983dd]
2017-07-20T05:48:08.763 INFO:tasks.ceph.osd.0.smithi090.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-07-20T05:48:08.763 INFO:tasks.ceph.osd.0.smithi090.stderr:2017-07-20 05:48:08.758471 7fca72868700 -1 /build/ceph-12.1.1-195-gb80122d/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fca

Seen on an upgrade test:

/a/sage-2017-07-20_05:37:32-rados-wip-sage-testing2-distro-basic-smithi/1423990

#1

Updated by Sage Weil over 6 years ago

  • Priority changed from High to Urgent
2017-07-20T21:31:13.072 INFO:tasks.ceph.osd.2.smithi055.stderr:/build/ceph-12.1.1-237-gef10e30/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7f3e170b1700 time 2017-07-20 21:31:13.072507
2017-07-20T21:31:13.072 INFO:tasks.ceph.osd.2.smithi055.stderr:/build/ceph-12.1.1-237-gef10e30/src/osd/PG.cc: 5928: FAILED assert(0 == "we got a bad state machine event")
2017-07-20T21:31:13.080 INFO:tasks.ceph.osd.2.smithi055.stderr: ceph version 12.1.1-237-gef10e30 (ef10e30cabed8ec78a2a552535a5add236060344) luminous (rc)
2017-07-20T21:31:13.080 INFO:tasks.ceph.osd.2.smithi055.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x7f3e33f955ce]
2017-07-20T21:31:13.080 INFO:tasks.ceph.osd.2.smithi055.stderr: 2: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x135) [0x7f3e33adc6d5]
2017-07-20T21:31:13.080 INFO:tasks.ceph.osd.2.smithi055.stderr: 3: (()+0x5a6306) [0x7f3e33b1a306]
2017-07-20T21:31:13.080 INFO:tasks.ceph.osd.2.smithi055.stderr: 4: (boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x109) [0x7f3e33b57a49]
2017-07-20T21:31:13.080 INFO:tasks.ceph.osd.2.smithi055.stderr: 5: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1a9) [0x7f3e33b54b79]
2017-07-20T21:31:13.081 INFO:tasks.ceph.osd.2.smithi055.stderr: 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7f3e33b3571b]
2017-07-20T21:31:13.081 INFO:tasks.ceph.osd.2.smithi055.stderr: 7: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7f3e33b029ee]
2017-07-20T21:31:13.081 INFO:tasks.ceph.osd.2.smithi055.stderr: 8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x212) [0x7f3e33a58d52]
2017-07-20T21:31:13.081 INFO:tasks.ceph.osd.2.smithi055.stderr: 9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x17) [0x7f3e33ab49b7]
2017-07-20T21:31:13.081 INFO:tasks.ceph.osd.2.smithi055.stderr: 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb65) [0x7f3e33f9bf95]
2017-07-20T21:31:13.081 INFO:tasks.ceph.osd.2.smithi055.stderr: 11: (ThreadPool::WorkThread::entry()+0x10) [0x7f3e33f9cf60]

/a/yuriw-2017-07-20_19:48:38-rados-wip-yuri-testing3_2017_7_21-distro-basic-smithi/1425395
rados/upgrade/jewel-x-singleton/

Note this run has https://github.com/ceph/ceph/pull/16009, a suspicious change to the async messenger.
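The backtrace in this note shows the failing path: an event was delivered while the PG was in Stray (frame 5), Stray did not consume it, the enclosing Started state reacted (frame 4), and that reaction entered Crashed, whose constructor holds the "we got a bad state machine event" assert (frame 2). Below is a minimal standalone sketch of that dispatch pattern, assuming (as the frames suggest, though this report does not say so) that Started has a catch-all transition to Crashed for any event no inner state handles. It does not use boost::statechart; the `StateLevel`, `DispatchResult` and `dispatch` names and the string events are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event type; the real PG::RecoveryState events are distinct
// boost::statechart event classes, not strings.
using Event = std::string;

// One level of a hierarchical state machine: its name, the events it
// reacts to, and whether it has a catch-all transition to Crashed.
struct StateLevel {
  std::string name;
  std::vector<Event> handled;
  bool catch_all_to_crashed = false;
};

// Outcome of dispatching one event.
struct DispatchResult {
  std::string handled_by;  // state that consumed the event, or "" if none
  bool entered_crashed = false;
};

// Dispatch innermost-first: an event a substate (e.g. Stray) does not
// handle bubbles up to its outer state (e.g. Started). `active` is
// ordered from innermost to outermost.
DispatchResult dispatch(const std::vector<StateLevel>& active, const Event& ev) {
  assert(!active.empty());
  for (const auto& level : active) {
    // Explicit reactions are checked before the catch-all.
    for (const auto& h : level.handled) {
      if (h == ev) return {level.name, false};
    }
    if (level.catch_all_to_crashed) {
      // The sketch only records the transition; the real Crashed
      // constructor asserts, which is what aborted the OSD here.
      return {level.name, true};
    }
  }
  return {"", false};  // no state reacted; the event is discarded
}
```

With Stray innermost and Started outermost, an event Stray names stops at Stray, while an event neither state names falls through to Started's catch-all and ends in Crashed; in the OSD that last step is the assert in the title.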

#2

Updated by Sage Weil over 6 years ago

/a/yuriw-2017-07-20_19:48:38-rados-wip-yuri-testing3_2017_7_21-distro-basic-smithi/1425414
rados/objectstore/ceph_objectstore_tool.yaml

(Note: this test run includes a suspicious async messenger PR; see https://github.com/ceph/ceph/pull/16009.)

#3

Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Rejected

Pretty sure this was the async messenger patch's fault.
