Bug #12427

closed

OSD dies after a couple of seconds

Added by William Kennington almost 9 years ago. Updated about 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I upgraded my cluster from 0.94.2 to 3f04a6126fdbfb93304f798da3775c0eec9b7d44 (2015-07-20), first by upgrading all of the mons, which seemed to go smoothly. Then I upgraded one of the machines hosting my OSDs. This resulted in aborts when starting any of the OSDs.


Files

bGCC.txt (22.4 KB), William Kennington, 07/21/2015 11:03 PM
AIEF.txt (163 KB), William Kennington, 07/21/2015 11:09 PM

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #12536: "FAILED assert(!log.null() || olog.tail == eversion_t())" (Resolved, 07/30/2015)

#1

Updated by William Kennington almost 9 years ago

Also, when creating a fresh OSD using the git code, I get the following abort: http://sprunge.us/OHYN

#2

Updated by William Kennington almost 9 years ago

It's also worth noting that the remaining 0.94.2 OSDs are reporting CRC errors in the osdmap.

#3

Updated by Samuel Just over 8 years ago

The CRC errors in the osdmap are actually expected: the new OSDs encode a version other than what the mons encode, since they know about new fields, so they request the actual full map from the mon.
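
A minimal Python sketch of the mechanism described above, assuming a toy map format; this is not Ceph's code. The record layout, the field names, and the `encode` and `verify_map_crc` helpers are all hypothetical, and Ceph itself checksums its own binary encoding with crc32c rather than zlib's CRC-32. The sketch only shows that two daemons encoding the same map state at different struct versions produce different bytes, hence different CRCs, and that the receiver treats the mismatch as a cue to fetch the full map rather than as corruption.

```python
import struct
import zlib

# Hypothetical, heavily simplified osdmap-like state (not Ceph's real layout).
MAP = {"epoch": 42, "max_osd": 3, "new_field": 7}

def encode(m, struct_v):
    """Encode the map at a given (hypothetical) struct version."""
    out = struct.pack("<BII", struct_v, m["epoch"], m["max_osd"])
    if struct_v >= 2:
        # A field that only the newer encoder knows about.
        out += struct.pack("<I", m["new_field"])
    return out

def verify_map_crc(m, expected_crc, local_struct_v):
    """Re-encode the map locally and compare with the sender's CRC.

    Returns "ok" on a match; otherwise signals that the full map
    should be requested from the monitor instead.
    """
    local_crc = zlib.crc32(encode(m, local_struct_v))
    return "ok" if local_crc == expected_crc else "request_full_map"

# The sender encodes at v1; a receiver with a different encoder uses v2.
mon_crc = zlib.crc32(encode(MAP, 1))
print(verify_map_crc(MAP, mon_crc, local_struct_v=1))  # ok
print(verify_map_crc(MAP, mon_crc, local_struct_v=2))  # request_full_map
```

In this sketch the map contents are identical on both sides; only the encoders differ, which is why the CRC mismatch is expected during a mixed-version upgrade and is resolved by the full-map fetch.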

The second link with the new code is an actual bug (confirmed in our testing) with current master: http://tracker.ceph.com/issues/12536. That leaves the first log, which looks like a corrupted journal. That doesn't seem obviously upgrade-related, since I don't know of anything in that area that has changed since Hammer. If you are able to reproduce the journal corruption, I'll look further.

#4

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce