Bug #4042
osd crash in recovery state: FAILED assert(0 == "we got a bad state machine event") (closed)
% Done: 0%
Source: Development
Severity: 3 - minor
Description
I just rebooted a couple of my 0.56.2 nodes, and one of the 12 OSDs went down with:
2013-02-07 16:56:43.187856 7fb172131700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fb172131700 time 2013-02-07 16:56:43.164424
osd/PG.cc: 5198: FAILED assert(0 == "we got a bad state machine event")
I've attached the OSD's logs, but debug logging was at the default levels.
I tried to restart the OSD with higher debugging, but it then recovered just fine.
What I noticed is this:
filestore(/var/lib/ceph/osd/ceph-11) waiting 51 > 50 ops || 67697 > 104857600
That line occurs only in osd.11's log; it doesn't show up in any other log.
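For anyone reading the filestore line quoted above: it prints two comparisons joined by `||`, and only the first one is true here (51 > 50), so the wait appears to have been triggered by the op count rather than by the second limit. The second limit, 104857600, equals 100 * 1024 * 1024, so I'm assuming it is a byte limit, but the log line itself doesn't say so. Below is a rough Python sketch for checking such lines. The field names (`ops`, `max_ops`, `nbytes`, `max_bytes`) and the function `explain_throttle` are my own labels for illustration, not anything from Ceph.

```python
import re

# Pattern for the throttle message quoted in this report.
# The group names are hypothetical labels, not Ceph identifiers;
# treating the second pair as bytes is an assumption.
THROTTLE_RE = re.compile(
    r"waiting (?P<ops>\d+) > (?P<max_ops>\d+) ops \|\| "
    r"(?P<nbytes>\d+) > (?P<max_bytes>\d+)"
)


def explain_throttle(line):
    """Report which side(s) of the '||' in a throttle line are true.

    Returns None if the line does not contain the throttle message.
    """
    m = THROTTLE_RE.search(line)
    if m is None:
        return None
    v = {name: int(text) for name, text in m.groupdict().items()}
    return {
        "ops_exceeded": v["ops"] > v["max_ops"],
        "bytes_exceeded": v["nbytes"] > v["max_bytes"],
    }


line = ("filestore(/var/lib/ceph/osd/ceph-11) "
        "waiting 51 > 50 ops || 67697 > 104857600")
print(explain_throttle(line))
# prints {'ops_exceeded': True, 'bytes_exceeded': False}
```

This only interprets the numbers in the message; it says nothing about whether the throttle is related to the assert.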