Bug #12427
closed
OSD dies after a couple of seconds
Description
I upgraded my cluster from 0.94.2 to 3f04a6126fdbfb93304f798da3775c0eec9b7d44 (2015-07-20), starting with all of the mons, which seemed to go smoothly. Then I upgraded one of the machines hosting my OSDs. This resulted in aborts when starting any of the OSDs.
Updated by William Kennington almost 9 years ago
Also, when creating a fresh OSD using the git code, I get the following abort: http://sprunge.us/OHYN
Updated by William Kennington almost 9 years ago
It's also worth noting that the remaining 0.94.2 osds are reporting CRC errors in the osdmap.
Updated by Samuel Just almost 9 years ago
The CRC errors in the osdmap are actually expected. The new OSDs encode a version other than what the mons encode, since they know about new fields, so they request the actual full map from the mon.
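To illustrate the mechanism described above, here is a minimal sketch, not Ceph code. It assumes a toy map serialized as JSON and uses `zlib.crc32` as a stand-in for the real checksum; the field names, `OLD_FIELDS`, `NEW_FIELDS`, and `accept_or_refetch` are all hypothetical. It only shows why two releases that know about different fields compute different CRCs for the same map, and how a mismatch can fall back to fetching the full map.

```python
import json
import zlib

# Hypothetical field sets: the newer release knows about one extra field.
OLD_FIELDS = ("epoch", "osds")
NEW_FIELDS = ("epoch", "osds", "new_field")


def encode_map(osdmap, known_fields):
    """Serialize only the fields this (hypothetical) release knows about."""
    subset = {name: osdmap[name] for name in known_fields if name in osdmap}
    return json.dumps(subset, sort_keys=True).encode("utf-8")


def map_crc(osdmap, known_fields):
    """Checksum of the map as this release would encode it (stand-in CRC)."""
    return zlib.crc32(encode_map(osdmap, known_fields))


def accept_or_refetch(osdmap, advertised_crc, known_fields, fetch_full_map):
    """Keep the locally built map if its CRC matches the advertised one;
    otherwise fall back to fetching the full map. Returns (map, refetched)."""
    if map_crc(osdmap, known_fields) == advertised_crc:
        return osdmap, False
    return fetch_full_map(), True


if __name__ == "__main__":
    toy_map = {"epoch": 5, "osds": [0, 1, 2], "new_field": "x"}
    # One side advertises a CRC computed over the old encoding ...
    advertised = map_crc(toy_map, OLD_FIELDS)
    # ... while the other side re-encodes with the new field set and mismatches.
    _, refetched = accept_or_refetch(
        toy_map, advertised, NEW_FIELDS, fetch_full_map=lambda: dict(toy_map)
    )
    print("refetched full map:", refetched)
```

In this toy model the mismatch is harmless: it costs one extra full-map fetch, which matches the point that the reported CRC errors are not themselves a sign of corruption.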
The second link, with the new code, is an actual bug in current master (confirmed in our testing): http://tracker.ceph.com/issues/12536. That leaves the first log, which looks like a corrupted journal. That doesn't seem obviously upgrade-related, since I don't know of anything that changed in that area since hammer. If you are able to reproduce the journal corruption, I'll look further.
Updated by Sage Weil about 7 years ago
- Status changed from New to Can't reproduce