Bug #14546
mira033 kernel panic from MCE
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor
Description
<0>[23897.753849] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 8: fe000ec00001009f
<0>[23897.762390] mce: [Hardware Error]: TSC 34504cd160cc ADDR 7cf25380 MISC e6a9590a00041283
<0>[23897.770554] mce: [Hardware Error]: PROCESSOR 0:106e5 TIME 1453862482 SOCKET 0 APIC 0 microcode 3
<0>[23897.779355] mce: [Hardware Error]: Machine check: Processor context corrupt
<0>[23897.786325] Kernel panic - not syncing: Fatal Machine check
[dumpcommon]kdb> -bt
Stack traceback for pid 27145
0xffff880165d20000 27145 25197 1 2 R 0xffff880165d204e8 *fn_anonymous
 ffff88043fc88d50 0000000000000000
Call Trace:
 <#DB> <<EOE>> <#MC>
 [<ffffffff81102c79>] ? kgdb_panic_event+0x29/0x30
 [<ffffffff8173125c>] ? notifier_call_chain+0x4c/0x70
 [<ffffffff817312ba>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff8171da17>] ? panic+0xec/0x1d7
 [<ffffffff8171e51e>] ? printk+0x67/0x69
 [<ffffffff81036e5a>] ? mce_panic+0x1fa/0x210
 [<ffffffff81038ca4>] ? do_machine_check+0xaa4/0xab0
 [<ffffffff8172d43f>] ? machine_check+0x1f/0x30
 <<EOE>>
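For anyone reading the log: the 64-bit value after `Bank 8:` is that bank's machine-check status register. Below is a minimal sketch of decoding it, assuming the standard IA32_MCi_STATUS flag layout from the Intel SDM (bits 63 down to 57 are VAL, OVER, UC, EN, MISCV, ADDRV, PCC; the low 16 bits are the MCA error code). The helper name `decode_mci_status` is hypothetical and is not part of any tool used in this ticket.

```python
# Flag bit positions in IA32_MCi_STATUS (assumed layout, per the Intel SDM).
FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status):
    """Hypothetical helper: split a raw status value into flags and codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF              # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded


if __name__ == "__main__":
    # Status value copied from the "Bank 8" line of the log above.
    for name, value in decode_mci_status(0xFE000EC00001009F).items():
        print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For the value in this ticket, PCC and UC both come out set, which is consistent with the "Processor context corrupt" message and the fatal panic. MISCV and ADDRV are also set, so the ADDR and MISC values on the second log line are meaningful.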
Updated by Dan Mick about 8 years ago
- Subject changed from mira033 kernel panic to mira033 kernel panic from MCE
Updated by David Galloway almost 8 years ago
- Status changed from New to In Progress
Looked into this today. The system hung for quite some time waiting for RAID FW to load but eventually got past it.
I flashed the latest BIOS and RAID controller firmware (V1.49 to V1.52) available for the box.
After rebooting, RAID firmware loaded after a reasonable amount of time and system booted normally.
I've nuked and released the machine but will keep the ticket open for a bit in case more MCEs occur.
Updated by David Galloway almost 8 years ago
Tested DIMMs and didn't find a bad one. If MCEs persist, will retire machine.
Updated by David Galloway over 7 years ago
- Status changed from In Progress to Resolved
Machine seems to be passing jobs w/o issue.