Bug #4042
Closed
osd crash in recovery state: FAILED assert(0 == "we got a bad state machine event")
% Done: 0%
Source: Development
Severity: 3 - minor
Description
I just rebooted a couple of my 0.56.2 nodes and out of 12 OSDs one went down with:
2013-02-07 16:56:43.187856 7fb172131700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fb172131700 time 2013-02-07 16:56:43.164424
osd/PG.cc: 5198: FAILED assert(0 == "we got a bad state machine event")
I've attached the OSD's logs, but debug logging was at the default levels.
I tried to restart the OSD with higher debugging, but it then recovered just fine.
What I noticed is this:
filestore(/var/lib/ceph/osd/ceph-11) waiting 51 > 50 ops || 67697 > 104857600
That line occurs only in osd.11's log; it doesn't show up in any other log.
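For anyone reading that filestore line, here is a minimal sketch that pulls the numbers out of it and reports which of the two comparisons is true. It assumes, from the shape of the message alone, that the first pair is queued ops against an op limit and the second pair is queued bytes against a byte limit. The function name `parse_throttle_line` and the dictionary keys are made up for illustration and are not part of Ceph. It says nothing about whether the throttle is related to the crash.

```python
import re

# Matches the filestore "waiting" message quoted above.
# Assumed layout: "<ops> > <max_ops> ops || <bytes> > <max_bytes>".
THROTTLE_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\) waiting "
    r"(?P<ops>\d+) > (?P<max_ops>\d+) ops \|\| "
    r"(?P<bytes>\d+) > (?P<max_bytes>\d+)"
)


def parse_throttle_line(line):
    """Hypothetical helper: return the parsed fields, or None if no match."""
    m = THROTTLE_RE.search(line)
    if m is None:
        return None
    ops = int(m.group("ops"))
    max_ops = int(m.group("max_ops"))
    nbytes = int(m.group("bytes"))
    max_bytes = int(m.group("max_bytes"))
    return {
        "path": m.group("path"),
        "ops": ops,
        "max_ops": max_ops,
        "bytes": nbytes,
        "max_bytes": max_bytes,
        # Which side of the "||" is actually true for this line.
        "ops_over_limit": ops > max_ops,
        "bytes_over_limit": nbytes > max_bytes,
    }


line = ("filestore(/var/lib/ceph/osd/ceph-11) waiting "
        "51 > 50 ops || 67697 > 104857600")
info = parse_throttle_line(line)
print(info)
```

For the line from osd.11, only the first comparison holds (51 is greater than 50), while 67697 is far below 104857600.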
Updated by Ian Colle over 11 years ago
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
Updated by Sage Weil over 11 years ago
- Status changed from New to Need More Info
Hey Wido, do you have the core by chance?
Updated by Wido den Hollander over 11 years ago
Nope. I looked for it when reporting this issue, but I couldn't find a core file. I'd expected one to be in /, but there was none.
Updated by Ian Colle about 11 years ago
- Status changed from Need More Info to New
- Assignee set to Samuel Just
Updated by Samuel Just about 11 years ago
- Status changed from New to Fix Under Review
I think wip_4042 should take care of it.
Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved