Bug #15520: OSDs refuse to start, latest osdmap missing
Status: Closed
Description
We hit a problem on our production cluster (running 9.2.1) that caused /proc, /dev and /sys to be unmounted. While they were unmounted, a large number of OSDs logged the following error (for various osdmap epochs):
Apr 15 15:25:19 kaa-99 ceph-osd[4167]: 2016-04-15 15:25:19.457774 7f1c817fd700 0 filestore(/local/ceph/osd.43) write couldn't open meta/-1/c188e154/osdmap.276293/0: (2) No such file or directory
After restarting the hosts, the OSDs now refuse to start with:
Apr 15 16:03:53 kaa-99 ceph-osd[4211]: -2> 2016-04-15 16:03:53.089842 7f8e9f840840 10 _load_class version success
Apr 15 16:03:53 kaa-99 ceph-osd[4211]: -1> 2016-04-15 16:03:53.089863 7f8e9f840840 20 osd.43 0 get_map 276424 - loading and decoding 0x7f8e9b841780
Apr 15 16:03:53 kaa-99 ceph-osd[4211]: 0> 2016-04-15 16:03:53.140754 7f8e9f840840 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e9f840840 time 2016-04-15 16:03:53.139563
osd/OSD.h: 847: FAILED assert(ret)
Inserting the map with ceph-objectstore-tool --op set-osdmap does not work; it fails with the following error:
osdmap (-1/c1882e94/osdmap.276507/0) does not exist.
2016-04-15 17:14:00.335751 7f4b4d75b840 1 journal close /dev/ssd/journal.43
How can I get the OSDs running again?
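One recovery approach sometimes suggested for missing osdmap objects is to fetch the map for the failing epoch from the monitors and write it back into the OSD's store with ceph-objectstore-tool. The sketch below is not from this report and is untested; the OSD id, paths, epoch, and systemd unit name are taken from the logs above or assumed, and on releases where set-osdmap refuses to create a map that does not exist (the error shown above), a newer ceph-objectstore-tool supporting --force may be required.

```shell
#!/bin/sh
# Hedged sketch: re-inject a missing osdmap epoch into osd.43's store.
# All values below are examples; adjust for your cluster.

EPOCH=276424                 # epoch from the failed get_map line above
OSD_PATH=/local/ceph/osd.43  # data path from the filestore log line
JOURNAL=/dev/ssd/journal.43  # journal device from the log above

# 1. Stop the OSD before touching its object store.
systemctl stop ceph-osd@43

# 2. Fetch the full osdmap for that epoch from the monitors.
ceph osd getmap "$EPOCH" -o "/tmp/osdmap.$EPOCH"

# 3. Write it into the OSD's store. If this fails with
#    "osdmap (...) does not exist", your ceph-objectstore-tool
#    cannot create missing maps and a version with --force is needed.
ceph-objectstore-tool --data-path "$OSD_PATH" \
                      --journal-path "$JOURNAL" \
                      --op set-osdmap \
                      --file "/tmp/osdmap.$EPOCH"

# 4. Repeat for every missing epoch, then restart the OSD.
systemctl start ceph-osd@43
```

Since many OSDs are affected across various epochs, the fetch-and-inject steps would have to be looped over each missing epoch on each OSD.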