Bug #22902

closed

src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

Added by David Zafman about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/dzafman-2018-02-01_09:46:36-rados-wip-zafman-testing-distro-basic-smithi/2138315

I think we are seeing a RecoveryDone event arrive while the PG is in RepWaitBackfillReserved, which has no reaction for it:

2018-02-01 20:06:11.492 7effa3954700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 MInfoRec from 2 info: 2.1e( v 271'608 (150'442,271'608] local-lis/les=284/285 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279)) prio 255 cost 10 e296)
2018-02-01 20:06:12.396 7effa3954700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 RemoteReservationCanceled) prio 255 cost 10 e296)
2018-02-01 20:06:12.396 7effa3954700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 RequestBackfillPrio: priority 100) prio 255 cost 10 e296)
2018-02-01 20:06:12.396 7eff912f0700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 RecoveryDone) prio 255 cost 10 e296)

2018-02-01 20:06:12.396 7eff912f0700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] exit Started/ReplicaActive/RepNotRecovering 0.000185 1 0.000143
2018-02-01 20:06:12.396 7eff912f0700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] enter Started/ReplicaActive/RepWaitBackfillReserved
2018-02-01 20:06:12.396 7eff952f8700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] exit Started/ReplicaActive 11.780731 0 0.000000
2018-02-01 20:06:12.396 7eff952f8700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] exit Started 12.781961 0 0.000000
2018-02-01 20:06:12.396 7eff952f8700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] enter Crashed
2018-02-01 20:06:12.400 7eff952f8700 -1 /build/ceph-13.0.1-1533-g2794479/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7eff952f8700 time 2018-02-01 20:06:12.401128
/build/ceph-13.0.1-1533-g2794479/src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

ceph version 13.0.1-1533-g2794479 (27944791b4f97f8a9677f3920230fd09312a23e9) mimic (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7effbac57625]
2: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0xb9) [0x5638b0e905a9]
3: (()+0x454526) [0x5638b0ed0526]
4: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x26b) [0x5638b0f0f42b]
5: (boost::statechart::simple_state<PG::RecoveryState::RepWaitBackfillReserved, PG::RecoveryState::ReplicaActive, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x99) [0x5638b0f096e9]
6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x5638b0ee320b]
7: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x198) [0x5638b0ec99e8]
8: (OSD::dequeue_peering_evt(PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x10a) [0x5638b0e0cdba]
9: (PGPeeringItem::run(OSD*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4d) [0x5638b1072b3d]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xf2a) [0x5638b0e0167a]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4f2) [0x7effbac5d0f2]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7effbac5f450]
13: (()+0x76ba) [0x7effb972c6ba]
14: (clone()+0x6d) [0x7effb8f553dd]
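
The log and backtrace above look consistent with an event being forwarded from an inner state to its outer states and, finding no reaction anywhere, ending in Crashed. A minimal sketch of that failure mode follows. It is a hand-rolled model, not Ceph's boost::statechart code: the `State` and `Event` enums, `react()`, `process()`, and the transition table are hypothetical simplifications. In particular, treating RecoveryDone and RemoteReservationCanceled as tolerated in RepNotRecovering is an assumption made only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified model of the replica-side states seen in the log.
// These are NOT Ceph's real classes or its full transition table.
enum class State { RepNotRecovering, RepWaitBackfillReserved, Crashed };
enum class Event { RemoteReservationCanceled, RequestBackfillPrio, RecoveryDone };

// Returns the next state if `s` has a reaction for `e`, otherwise nullopt.
std::optional<State> react(State s, Event e) {
  switch (s) {
  case State::RepNotRecovering:
    // Seen in the log: RequestBackfillPrio moves the PG to RepWaitBackfillReserved.
    if (e == Event::RequestBackfillPrio) return State::RepWaitBackfillReserved;
    // Assumption: the other two events are tolerated and leave the state unchanged.
    return State::RepNotRecovering;
  case State::RepWaitBackfillReserved:
    // Assumption: a canceled reservation returns to RepNotRecovering.
    if (e == Event::RemoteReservationCanceled) return State::RepNotRecovering;
    // No reaction modeled for anything else, including RecoveryDone.
    return std::nullopt;
  case State::Crashed:
    return State::Crashed;
  }
  return std::nullopt;
}

// Drains the queue in order. An event with no reaction falls through to
// Crashed, which stands in for the failed assert in this sketch.
State process(State s, const std::vector<Event>& queue) {
  for (Event e : queue) {
    s = react(s, e).value_or(State::Crashed);
    if (s == State::Crashed) break;
  }
  return s;
}
```

Feeding `process()` the queue order from the log (RemoteReservationCanceled, RequestBackfillPrio, RecoveryDone) starting in RepNotRecovering ends in Crashed in this model, while the same RecoveryDone delivered before RequestBackfillPrio is harmless. That ordering sensitivity is what the report appears to describe.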

Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #36678: luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event") (Resolved, David Zafman)