Bug #4042

closed

osd crash in recovery state: FAILED assert(0 == "we got a bad state machine event")

Added by Wido den Hollander about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just rebooted a couple of my 0.56.2 nodes and out of 12 OSDs one went down with:

2013-02-07 16:56:43.187856 7fb172131700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fb172131700 time 2013-02-07 16:56:43.164424
osd/PG.cc: 5198: FAILED assert(0 == "we got a bad state machine event")

I've attached the OSD's log, but debug logging was at the default levels.

I tried to restart the OSD with higher debugging, but it then recovered just fine.

What I noticed is this:

filestore(/var/lib/ceph/osd/ceph-11) waiting 51 > 50 ops || 67697 > 104857600

That line occurs only in osd.11's log; it doesn't show up in any other OSD's log.
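
To make the numbers in that line easier to read, here is a minimal sketch that parses it. It assumes the message is a queue throttle notice of the form "queued ops > ops limit || queued bytes > byte limit"; that reading, and the names `THROTTLE_RE` and `parse_throttle`, are my own and not taken from the Ceph source. Under that assumption, only the ops side is over its limit.

```python
import re

# Pattern for the "waiting ..." message quoted above. The field meanings
# (ops, max_ops, bytes, max_bytes) are an assumed reading of the line.
THROTTLE_RE = re.compile(
    r"waiting (?P<ops>\d+) > (?P<max_ops>\d+) ops \|\| "
    r"(?P<bytes>\d+) > (?P<max_bytes>\d+)"
)


def parse_throttle(line):
    """Report which limit the line shows as exceeded, or None if it does not match."""
    m = THROTTLE_RE.search(line)
    if m is None:
        return None
    v = {name: int(text) for name, text in m.groupdict().items()}
    return {
        "ops_exceeded": v["ops"] > v["max_ops"],
        "bytes_exceeded": v["bytes"] > v["max_bytes"],
        # Byte limit converted to MiB for readability.
        "max_bytes_mib": v["max_bytes"] / (1024 * 1024),
    }


line = "filestore(/var/lib/ceph/osd/ceph-11) waiting 51 > 50 ops || 67697 > 104857600"
result = parse_throttle(line)
print(result)
```

Run against the line above, this reports the ops comparison (51 > 50) as true and the bytes comparison (67697 > 104857600, a 100 MiB limit) as false. Whether that is related to the crash is not something the default-level log can confirm.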


Files

ceph-osd.11.log (501 KB) Wido den Hollander, 02/07/2013 08:40 AM