Bug #23860

closed

luminous->master: luminous crashes with AllReplicasRecovered in Started/Primary/Active/NotRecovering state

Added by Sage Weil almost 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  -370> 2018-04-25 09:03:11.024956 7fe21d40d700 10 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 crt=18'4 mlcod 18'4 active+recovery_wait] handle_peering_event: epoch_sent: 1358 epoch_requested: 1358 AllReplicasRecovered
  -369> 2018-04-25 09:03:11.024969 7fe21d40d700  5 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 crt=18'4 mlcod 18'4 active+recovery_wait] exit Started/Primary/Active/NotRecovering 0.000193 1 0.000120
  -368> 2018-04-25 09:03:11.024979 7fe21d40d700  5 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 crt=18'4 mlcod 18'4 active+recovery_wait] exit Started/Primary/Active 0.311142 0 0.000000
  -367> 2018-04-25 09:03:11.024989 7fe21d40d700 20 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 crt=18'4 mlcod 18'4 active] agent_stop
  -366> 2018-04-25 09:03:11.024997 7fe21d40d700  5 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 crt=18'4 mlcod 18'4 active] exit Started/Primary 1.320618 0 0.000000
  -365> 2018-04-25 09:03:11.025004 7fe21d40d700 10 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 crt=18'4 mlcod 18'4 active] clear_primary_state
  -364> 2018-04-25 09:03:11.025019 7fe21d40d700 10 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 luod=0'0 crt=18'4 mlcod 0'0 active] release_backoffs [2:70000000::::head,2:74000000::::head)
  -363> 2018-04-25 09:03:11.025029 7fe21d40d700 20 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 luod=0'0 crt=18'4 mlcod 0'0 active] agent_stop
  -362> 2018-04-25 09:03:11.025036 7fe21d40d700  5 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 luod=0'0 crt=18'4 mlcod 0'0 active] exit Started 1.320717 0 0.000000
  -361> 2018-04-25 09:03:11.025045 7fe21d40d700  5 osd.3 pg_epoch: 1358 pg[2.e( v 18'4 (0'0,18'4] local-lis/les=1357/1358 n=1 ec=45/15 lis/c 1357/1350 les/c/f 1358/1353/0 1357/1357/1357) [3,4] r=0 lpr=1357 pi=[1350,1357)/1 luod=0'0 crt=18'4 mlcod 0'0 active] enter Crashed
     0> 2018-04-25 09:03:11.028741 7fe21d40d700 -1 /build/ceph-12.2.5/src/osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fe21d40d700 time 2018-04-25 09:03:11.025053
/build/ceph-12.2.5/src/osd/PG.cc: 6080: FAILED assert(0 == "we got a bad state machine event")

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55e335027a02]
 2: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x135) [0x55e334ac9d65]
 3: (()+0x5a6fd6) [0x55e334b0efd6]
 4: (boost::statechart::simple_state<PG::RecoveryState::Primary, PG::RecoveryState::Started, PG::RecoveryState::Peering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x16e) [0x55e334b5077e]
 5: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, PG::RecoveryState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1c9) [0x55e334b49d79]
 6: (boost::statechart::simple_state<PG::RecoveryState::NotRecovering, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xa2) [0x55e334b4b292]

/a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436768


Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #23988: luminous: luminous->master: luminous crashes with AllReplicasRecovered in Started/Primary/Active/NotRecovering state (Resolved, Prashant D)

#1

Updated by Sage Weil almost 6 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to luminous

#2

Updated by Sage Weil almost 6 years ago

  • Status changed from Fix Under Review to 7

#3

Updated by Sage Weil almost 6 years ago

  • Assignee set to Sage Weil

#4

Updated by Josh Durgin almost 6 years ago

  • Status changed from 7 to Pending Backport

#5

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #23988: luminous: luminous->master: luminous crashes with AllReplicasRecovered in Started/Primary/Active/NotRecovering state added

#6

Updated by Nathan Cutler almost 6 years ago

  • Status changed from Pending Backport to Resolved