Bug #15520

closed

OSDs refuse to start, latest osdmap missing

Added by Markus Blank-Burian about 8 years ago. Updated almost 8 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We had a problem on our production cluster (running 9.2.1) which caused /proc, /dev and /sys to be unmounted. During this time, we received the following error on a large number of OSDs (for various osdmap epochs):

Apr 15 15:25:19 kaa-99 ceph-osd4167: 2016-04-15 15:25:19.457774 7f1c817fd700 0 filestore(/local/ceph/osd.43) write couldn't open meta/-1/c188e154/osdmap.276293/0: (2) No such file or directory
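
Since the error shows up on many OSDs and for various epochs, it can help to collect the affected OSD/epoch pairs from the syslog before deciding what to do. The following is a minimal sketch, assuming the log lines have the same shape as the one quoted above; the function name `missing_osdmap_epochs` and the regular expression are illustrative and not part of Ceph.

```python
import re
from collections import defaultdict

# Matches filestore errors of the form quoted in this report, e.g.
#   filestore(/local/ceph/osd.43) write couldn't open meta/-1/c188e154/osdmap.276293/0: ...
# Assumption: the OSD id is the numeric suffix of the data path.
PATTERN = re.compile(
    r"filestore\([^)]*osd\.(?P<osd>\d+)\) write couldn't open "
    r"\S*osdmap\.(?P<epoch>\d+)/"
)


def missing_osdmap_epochs(lines):
    """Return {osd_id: sorted list of osdmap epochs that failed to open}."""
    found = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found[int(match.group("osd"))].add(int(match.group("epoch")))
    return {osd: sorted(epochs) for osd, epochs in found.items()}


if __name__ == "__main__":
    sample = (
        "Apr 15 15:25:19 kaa-99 ceph-osd4167: 2016-04-15 15:25:19.457774 "
        "7f1c817fd700 0 filestore(/local/ceph/osd.43) write couldn't open "
        "meta/-1/c188e154/osdmap.276293/0: (2) No such file or directory"
    )
    print(missing_osdmap_epochs([sample]))
```

Feeding it the lines of a syslog file (for example `open(path)` for whatever log file the host uses) gives one entry per OSD, which makes it easier to see how wide the range of missing epochs is.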

After restarting the hosts, the OSDs now refuse to start with:

Apr 15 16:03:53 kaa-99 ceph-osd4211: -2> 2016-04-15 16:03:53.089842 7f8e9f840840 10 _load_class version success
Apr 15 16:03:53 kaa-99 ceph-osd4211: -1> 2016-04-15 16:03:53.089863 7f8e9f840840 20 osd.43 0 get_map 276424 - loading and decoding 0x7f8e9b841780
Apr 15 16:03:53 kaa-99 ceph-osd4211: 0> 2016-04-15 16:03:53.140754 7f8e9f840840 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e9f840840 time 2016-04-15 16:03:53.139563
osd/OSD.h: 847: FAILED assert(ret)

Inserting the map with ceph-objectstore-tool --op set-osdmap does not work and gives the following error:

osdmap (-1/c1882e94/osdmap.276507/0) does not exist.
2016-04-15 17:14:00.335751 7f4b4d75b840 1 journal close /dev/ssd/journal.43
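
For reference, the kind of invocation that was attempted can be written out explicitly. This is only a sketch of how the command line is assembled, not a fix for the error above. The data and journal paths are taken from the log lines in this report; the map file name `osdmap.276507` is a hypothetical placeholder, and the flag names should be checked against `ceph-objectstore-tool --help` for the installed release.

```python
def set_osdmap_command(data_path, journal_path, map_file):
    """Build (but do not run) a ceph-objectstore-tool set-osdmap command."""
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,        # OSD data directory
        "--journal-path", journal_path,  # OSD journal device or file
        "--op", "set-osdmap",
        "--file", map_file,              # file holding the encoded osdmap
    ]


if __name__ == "__main__":
    cmd = set_osdmap_command(
        "/local/ceph/osd.43",    # from the filestore log line
        "/dev/ssd/journal.43",   # from the "journal close" log line
        "osdmap.276507",         # hypothetical file name
    )
    print(" ".join(cmd))
```

Returning the arguments as a list keeps each path as a single argument, so the result can be passed to `subprocess.run` without shell quoting if it is ever executed.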

How can I get the OSDs running again?
