Project

General

Profile

Actions

Bug #5060

closed

osd: decode failure in load_pgs on 0.56.4

Added by Ivan Kudryavtsev almost 11 years ago. Updated almost 11 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On of my osd hosts crashed on high load and after rebooted it is unable to start some osds.
Error log for osd.3 is in attachment.
Before crash LA started to grow to 30 from 1.0 and IOWAIT also, may be RAID controller bug.


Files

ceph-osd.3.log (45.9 KB) ceph-osd.3.log Ivan Kudryavtsev, 05/14/2013 05:20 AM
Actions #1

Updated by Sage Weil almost 11 years ago

  • Assignee set to Samuel Just
  • Priority changed from Normal to Urgent
Actions #2

Updated by Sage Weil almost 11 years ago

  • Subject changed from Ceph crashed under high load, osd start failed after reboot to osd: decode failure in load_pgs on 0.56.4
  • Status changed from New to Need More Info

Was the ceph-osd process that originally crashed under load also 0.56.4? Or an earlier version? (Do you have the log/crash dump for that)

And, can you reproduce the start-up crash with 'debug osd = 20' and 'debug filestore = 20' and 'debug ms = 1'?

Actions #3

Updated by Ivan Kudryavtsev almost 11 years ago

Yes, it was in 0.56.4 and before.
I can not reproduce because already formatted.

Actions #4

Updated by Sage Weil almost 11 years ago

  • Priority changed from Urgent to High
Actions #5

Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to Can't reproduce

If you see this again, please capture the stacktrace of the original before recovering, and if you can, generate a complete log of the failed restart that follows. Thanks!

Actions

Also available in: Atom PDF