Bug #19925: Bluestore OSD crashes on start (Status: Closed)
Description
I'm seeing an issue very similar to that seen in #16278 - whereby a Bluestore OSD crashes upon start with the following stack trace;
2017-05-14 01:41:22.076972 7fb7d500d8c0 -1 *** Caught signal (Aborted) **
 in thread 7fb7d500d8c0 thread_name:ceph-osd

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9770ae) [0x5584161fb0ae]
 2: (()+0x11390) [0x7fb7d3eca390]
 3: (gsignal()+0x38) [0x7fb7d1e67428]
 4: (abort()+0x16a) [0x7fb7d1e6902a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x5584162fb54b]
 6: (OSDService::get_map(unsigned int)+0x5d) [0x558415c7252d]
 7: (OSD::init()+0x1f91) [0x558415c21161]
 8: (main()+0x2ea5) [0x558415b92dc5]
 9: (__libc_start_main()+0xf0) [0x7fb7d1e52830]
 10: (_start()+0x29) [0x558415bd4459]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I suspect this could be related to data corruption, but I don't know for sure. Shortly before this, I was observing PG degradation on the pool. Currently, 2 out of my 3 OSD nodes are failing to start with this stack trace, so only a single OSD is live:
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.00027 root default
-2 0.00009     host ceph-01
 0 0.00009         osd.0        down        0          1.00000
-3 0.00009     host ceph-02
 1 0.00009         osd.1          up  1.00000          1.00000
-4 0.00009     host ceph-03
 2 0.00009         osd.2        down        0          1.00000
As a result, ceph health detail is reporting every PG as undersized+degraded+peered, with only a single OSD acting.
How can I best troubleshoot the stack trace that the OSD is outputting? I've attached the relevant log output, but I can't spot anything obviously wrong in it.
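One way to get more detail out of the failing OSD is to raise the debug levels for the relevant subsystems before restarting it. A minimal sketch, assuming a Jewel-era ceph.conf (adjust the OSD id to match your deployment):

```ini
# Hedged example: raise log verbosity for osd.0 only while reproducing
# the crash; these settings are very chatty, so remove them afterwards.
[osd.0]
debug osd = 20
debug bluestore = 20
debug bdev = 20
debug rocksdb = 10
```

After restarting, the log (e.g. /var/log/ceph/ceph-osd.0.log) should show what leads up to the assert. The failing frame, OSDService::get_map, generally indicates the OSD could not load an osdmap epoch it expected to find in its local store, which would fit the corruption theory.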
Updated by K Jarrett almost 7 years ago
I've just noticed that I'm running 10.2.7, and there have been a lot of enhancements to Bluestore since then. I thought I had installed 11.x.x, but it would appear not!
I understand that Bluestore OSDs can't be upgraded in place between 10.x and 11.x, so I may need to destroy and rebuild my current setup.