Bug #4042

closed

osd crash in recovery state: FAILED assert(0 == "we got a bad state machine event")

Added by Wido den Hollander about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just rebooted a couple of my 0.56.2 nodes, and one of the 12 OSDs went down with:

2013-02-07 16:56:43.187856 7fb172131700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fb172131700 time 2013-02-07 16:56:43.164424
osd/PG.cc: 5198: FAILED assert(0 == "we got a bad state machine event")

I've attached the OSD's log, but debug logging was at the default levels.

I tried to restart the OSD with higher debugging, but it then recovered just fine.

What I noticed is this:

filestore(/var/lib/ceph/osd/ceph-11) waiting 51 > 50 ops || 67697 > 104857600

That line appears only in osd.11's log; it doesn't show up in any other log.
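
For anyone comparing logs across OSDs, the filestore message above can be pulled apart with a small script. This is only a sketch: it assumes the line has the shape "ops > max ops || bytes > max bytes", which is inferred from how the message looks and not checked against the Ceph source. The function name and the returned keys are made up for illustration.

```python
import re

# Assumed layout of the message: "<ops> > <max_ops> ops || <bytes> > <max_bytes>".
# The field meanings are inferred from the log line, not from the Ceph code.
THROTTLE_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\) waiting "
    r"(?P<ops>\d+) > (?P<max_ops>\d+) ops \|\| "
    r"(?P<bytes>\d+) > (?P<max_bytes>\d+)"
)


def parse_throttle_line(line):
    """Return the parsed fields of a filestore 'waiting' line, or None."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    ops = int(match.group("ops"))
    max_ops = int(match.group("max_ops"))
    nbytes = int(match.group("bytes"))
    max_bytes = int(match.group("max_bytes"))
    return {
        "path": match.group("path"),
        "ops": ops,
        "max_ops": max_ops,
        "bytes": nbytes,
        "max_bytes": max_bytes,
        # Which of the two comparisons in the message is actually true.
        "ops_exceeded": ops > max_ops,
        "bytes_exceeded": nbytes > max_bytes,
    }


if __name__ == "__main__":
    sample = ("filestore(/var/lib/ceph/osd/ceph-11) waiting "
              "51 > 50 ops || 67697 > 104857600")
    print(parse_throttle_line(sample))
```

Read this way, the quoted line says the op count (51) was over its limit (50) while the byte count (67697) was far below its limit (104857600).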


Files

ceph-osd.11.log (501 KB) Wido den Hollander, 02/07/2013 08:40 AM
Actions #1

Updated by Ian Colle about 11 years ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent
Actions #2

Updated by Sage Weil about 11 years ago

  • Status changed from New to Need More Info

Hey Wido- Do you have the core by chance?

Actions #3

Updated by Wido den Hollander about 11 years ago

Nope. I looked for one when reporting this issue, but I couldn't find a core file. I'd expected one to be in /, but there is none.
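
A missing core file is often down to the core size limit or the kernel's core pattern. The sketch below reports both for the process that runs it, using Python's standard `resource` module and the Linux file `/proc/sys/kernel/core_pattern`. It does not inspect the ceph-osd daemon itself, whose limits may differ, and the function name and returned keys are illustrative only.

```python
import resource
from pathlib import Path


def core_dump_settings():
    """Report the core size limit and core pattern seen by this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    # Linux-specific: where the kernel writes (or pipes) core dumps.
    pattern = None
    try:
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        # File missing or unreadable (e.g. non-Linux system); leave as None.
        pass

    return {
        "soft_limit": soft,
        "hard_limit": hard,
        # A soft limit of 0 means no core file is written at all.
        "cores_enabled": soft != 0,
        "core_pattern": pattern,
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```

If `cores_enabled` is false for the environment that starts the daemon, no core would be written regardless of where one looks for it.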

Actions #4

Updated by Sage Weil about 11 years ago

  • Assignee deleted (Sage Weil)
Actions #5

Updated by Ian Colle about 11 years ago

  • Status changed from Need More Info to New
  • Assignee set to Samuel Just
Actions #6

Updated by Samuel Just about 11 years ago

  • Status changed from New to Fix Under Review

I think wip_4042 should take care of it.

Actions #7

Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved