Project

General

Profile

Actions

Bug #22902

closed

src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

Added by David Zafman about 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/dzafman-2018-02-01_09:46:36-rados-wip-zafman-testing-distro-basic-smithi/2138315

I think we are seeing:
Got RecoveryDone when in RepWaitBackfillReserved

2018-02-01 20:06:11.492 7effa3954700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 MInfoRec from 2 info: 2.1e( v 271'608 (150'442,271'608] local-lis/les=284/285 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279)) prio 255 cost 10 e296)
2018-02-01 20:06:12.396 7effa3954700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 RemoteReservationCanceled) prio 255 cost 10 e296)
2018-02-01 20:06:12.396 7effa3954700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 RequestBackfillPrio: priority 100) prio 255 cost 10 e296)
2018-02-01 20:06:12.396 7eff912f0700 20 osd.6 op_wq(6) _enqueue OpQueueItem(2.1e PGPeeringEvent(epoch_sent: 296 epoch_requested: 296 RecoveryDone) prio 255 cost 10 e296)

2018-02-01 20:06:12.396 7eff912f0700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] exit Started/ReplicaActive/RepNotRecovering 0.000185 1 0.000143
2018-02-01 20:06:12.396 7eff912f0700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] enter Started/ReplicaActive/RepWaitBackfillReserved
2018-02-01 20:06:12.396 7eff952f8700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] exit Started/ReplicaActive 11.780731 0 0.000000
2018-02-01 20:06:12.396 7eff952f8700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] exit Started 12.781961 0 0.000000
2018-02-01 20:06:12.396 7eff952f8700 5 osd.6 pg_epoch: 296 pg[2.1e( v 271'608 (150'442,271'608] local-lis/les=279/280 n=14 ec=185/17 lis/c 284/260 les/c/f 285/261/0 278/284/279) [7,5,6]/[2,5,4] r=-1 lpr=284 pi=[260,284)/1 luod=0'0 crt=271'608 active+remapped ps=[c1~1,d4~1,d6~1,db~1,de~2,e1~1,ec~1]] enter Crashed
2018-02-01 20:06:12.400 7eff952f8700 -1 /build/ceph-13.0.1-1533-g2794479/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7eff952f8700 time 2018-02-01 20:06:12.401128
/build/ceph-13.0.1-1533-g2794479/src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

ceph version 13.0.1-1533-g2794479 (27944791b4f97f8a9677f3920230fd09312a23e9) mimic (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7effbac57625]
2: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state&lt;PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list&lt;mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na&gt;, (boost::statechart::history_mode)0>::my_context)+0xb9) [0x5638b0e905a9]
3: (()+0x454526) [0x5638b0ed0526]
4: (boost::statechart::simple_state&lt;PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0&gt;::react_impl(boost::statechart::event_base const&, void const*)+0x26b) [0x5638b0f0f42b]
5: (boost::statechart::simple_state&lt;PG::RecoveryState::RepWaitBackfillReserved, PG::RecoveryState::ReplicaActive, boost::mpl::list&lt;mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na&gt;, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x99) [0x5638b0f096e9]
6: (boost::statechart::state_machine&lt;PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator&lt;void&gt;, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x5638b0ee320b]
7: (PG::do_peering_event(std::shared_ptr&lt;PGPeeringEvent&gt;, PG::RecoveryCtx*)+0x198) [0x5638b0ec99e8]
8: (OSD::dequeue_peering_evt(PG*, std::shared_ptr&lt;PGPeeringEvent&gt;, ThreadPool::TPHandle&)+0x10a) [0x5638b0e0cdba]
9: (PGPeeringItem::run(OSD*, boost::intrusive_ptr&lt;PG&gt;&, ThreadPool::TPHandle&)+0x4d) [0x5638b1072b3d]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xf2a) [0x5638b0e0167a]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4f2) [0x7effbac5d0f2]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7effbac5f450]
13: (()+0x76ba) [0x7effb972c6ba]
14: (clone()+0x6d) [0x7effb8f553dd]

Related issues 1 (0 open1 closed)

Copied to RADOS - Backport #36678: luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")ResolvedDavid ZafmanActions
Actions #3

Updated by Josh Durgin about 6 years ago

  • Priority changed from Normal to Urgent
Actions #4

Updated by David Zafman about 6 years ago

OSD 4 is the primary [6,5,4]/[4,5,7] with osd.6 crashing

2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]] final new_last_backfill at MAX
2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]] start_recovery_op MAX
2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 131 start_recovery_op pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]] MAX (0/3 rops)
2018-03-09 02:04:12.718 7fc2a8f8f700 20 osd.4 131 share_map_peer 0x55889950ea20 already has epoch 131
2018-03-09 02:04:12.718 7fc2a8f8f700  1 -- 172.21.15.48:6802/13181 --> 172.21.15.48:6806/13182 -- pg_backfill(finish 2.1c e 131/131 lb MAX) v3 -- ?+0 0x55889d0fd080 con 0x55889950ea20
2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]]  peer 6 num_objects now 37 / 37
2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]]  started 0
2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 131 do_recovery started 0/1 on pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]]
2018-03-09 02:04:12.718 7fc2a8f8f700 20 osd.4 op_wq(4) _process OpQueueItem(2.1c PGPeeringEvent(epoch_sent: 131 epoch_requested: 131 RemoteReservationRevoked) prio 255 cost 10 e131) queued
2018-03-09 02:04:12.718 7fc2a8f8f700 20 osd.4 op_wq(4) _process OpQueueItem(2.1c PGPeeringEvent(epoch_sent: 131 epoch_requested: 131 RemoteReservationRevoked) prio 255 cost 10 e131) pg 0x55889cb62000
2018-03-09 02:04:12.718 7fc2a8f8f700 10 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfilling trimq=[53~3,57~1,5c~1,5f~1]] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteReservationRevoked
2018-03-09 02:04:12.718 7fc2a8f8f700  1 -- 172.21.15.48:6802/13181 --> 172.21.15.48:6806/13182 -- MBackfillReserve RELEASE  pgid: 2.1c, query_epoch: 131 v4 -- ?+0 0x558899324e00 con 0x55889950ea20
2018-03-09 02:04:12.718 7fc2a8f8f700  5 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfill_wait+backfilling trimq=[53~3,57~1,5c~1,5f~1]] exit Started/Primary/Active/Backfilling 0.000396 1 0.000154
2018-03-09 02:04:12.718 7fc2a8f8f700  5 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfill_wait trimq=[53~3,57~1,5c~1,5f~1]] enter Started/Primary/Active/WaitLocalBackfillReserved
2018-03-09 02:04:12.718 7fc2a8f8f700 10 log is not dirty
2018-03-09 02:04:12.718 7fc2a8f8f700 20 osd.4 op_wq(4) _process empty q, waiting
2018-03-09 02:04:12.718 7fc2a1780700 15 osd.4 131 enqueue_peering_evt 2.1c epoch_sent: 131 epoch_requested: 131 LocalBackfillReserved
2018-03-09 02:04:12.718 7fc2a1780700 20 osd.4 op_wq(4) _enqueue OpQueueItem(2.1c PGPeeringEvent(epoch_sent: 131 epoch_requested: 131 LocalBackfillReserved) prio 255 cost 10 e131)
2018-03-09 02:04:12.718 7fc2a4f87700 20 osd.4 op_wq(4) _process OpQueueItem(2.1c PGPeeringEvent(epoch_sent: 131 epoch_requested: 131 LocalBackfillReserved) prio 255 cost 10 e131) queued
2018-03-09 02:04:12.718 7fc2a4f87700 20 osd.4 op_wq(4) _process OpQueueItem(2.1c PGPeeringEvent(epoch_sent: 131 epoch_requested: 131 LocalBackfillReserved) prio 255 cost 10 e131) pg 0x55889cb62000
2018-03-09 02:04:12.718 7fc2a4f87700 10 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfill_wait trimq=[53~3,57~1,5c~1,5f~1]] do_peering_event: epoch_sent: 131 epoch_requested: 131 LocalBackfillReserved
2018-03-09 02:04:12.718 7fc2a4f87700  5 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfill_wait trimq=[53~3,57~1,5c~1,5f~1]] exit Started/Primary/Active/WaitLocalBackfillReserved 0.000162 1 0.000110
2018-03-09 02:04:12.718 7fc2a4f87700  5 osd.4 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=0 lpr=130 pi=[118,130)/1 rops=1 bft=6 crt=126'439 lcod 123'437 mlcod 0'0 active+remapped+backfill_wait trimq=[53~3,57~1,5c~1,5f~1]] enter Started/Primary/Active/WaitRemoteBackfillReserved
2018-03-09 02:04:12.718 7fc2a4f87700  1 -- 172.21.15.48:6802/13181 --> 172.21.15.48:6806/13182 -- MBackfillReserve REQUEST  pgid: 2.1c, query_epoch: 131, prio: 100 v4 -- ?+0 0x558899de3a00 con 0x55889950ea20

The osd.6 replica is looping between RepNotRecovering -> RepWaitBackfillReserved -> RepRecovering

2018-03-09 02:04:12.050 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 125/118 les/c/f 126/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 crt=126'439 remapped] state<Started/ReplicaActive>: Activate Finished
2018-03-09 02:04:12.066 7fbc54857700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 125/118 les/c/f 126/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 MInfoRec from 4 info: 2.1c( v 126'439 (26'100,126'439] local-lis/les=130/131 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125)
2018-03-09 02:04:12.070 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RequestBackfillPrio: priority 100
2018-03-09 02:04:12.070 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.018110 2 0.000097
2018-03-09 02:04:12.070 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepWaitBackfillReserved
2018-03-09 02:04:12.070 7fbc54857700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteBackfillReserved
2018-03-09 02:04:12.070 7fbc54857700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 0.000136 1 0.000070
2018-03-09 02:04:12.070 7fbc54857700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepRecovering
2018-03-09 02:04:12.098 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteBackfillPreempted
2018-03-09 02:04:12.098 7fbc54857700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteReservationCanceled
2018-03-09 02:04:12.098 7fbc54857700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepRecovering 0.028659 2 0.000067
2018-03-09 02:04:12.098 7fbc54857700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepNotRecovering
2018-03-09 02:04:12.098 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RequestBackfillPrio: priority 100
2018-03-09 02:04:12.098 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000867 1 0.000032
2018-03-09 02:04:12.098 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepWaitBackfillReserved
2018-03-09 02:04:12.378 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteBackfillReserved
2018-03-09 02:04:12.378 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 0.281558 1 0.000049
2018-03-09 02:04:12.378 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepRecovering
2018-03-09 02:04:12.378 7fbc54857700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteBackfillPreempted
2018-03-09 02:04:12.382 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteReservationCanceled
2018-03-09 02:04:12.382 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepRecovering 0.001055 2 0.000046
2018-03-09 02:04:12.382 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepNotRecovering
2018-03-09 02:04:12.382 7fbc54857700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RequestBackfillPrio: priority 100
2018-03-09 02:04:12.382 7fbc54857700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000264 1 0.000027
2018-03-09 02:04:12.382 7fbc54857700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepWaitBackfillReserved
2018-03-09 02:04:12.714 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteBackfillReserved
2018-03-09 02:04:12.714 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 0.334811 1 0.000021
2018-03-09 02:04:12.714 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepRecovering
2018-03-09 02:04:12.714 7fbc54857700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteBackfillPreempted
2018-03-09 02:04:12.718 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RemoteReservationCanceled
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepRecovering 0.003034 2 0.000138
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepNotRecovering
2018-03-09 02:04:12.718 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RequestBackfillPrio: priority 100
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000043 1 0.000029
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] lb 2:3f7f0524:::smithi14815728-670:head (bitwise) local-lis/les=125/126 n=39 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Started/ReplicaActive/RepWaitBackfillReserved
2018-03-09 02:04:12.718 7fbc5885f700 10 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=125/126 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] do_peering_event: epoch_sent: 131 epoch_requested: 131 RecoveryDone
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=125/126 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive/RepWaitBackfillReserved 0.000179 1 0.000030
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=125/126 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started/ReplicaActive 0.668887 0 0.000000
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=125/126 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] exit Started 1.679752 0 0.000000
2018-03-09 02:04:12.718 7fbc5885f700  5 osd.6 pg_epoch: 131 pg[2.1c( v 126'439 (26'100,126'439] local-lis/les=125/126 n=37 ec=118/18 lis/c 130/118 les/c/f 131/119/0 130/130/125) [6,5,4]/[4,5,7] r=-1 lpr=130 pi=[118,130)/1 luod=0'0 crt=126'439 active+remapped] enter Crashed
Actions #5

Updated by David Zafman about 6 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Josh Durgin about 6 years ago

  • Status changed from In Progress to Resolved
Actions #7

Updated by Neha Ojha over 5 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to luminous
Actions #8

Updated by David Zafman over 5 years ago

  • Copied to Backport #36678: luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event") added
Actions #9

Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF