Bug #16400
closed
Ceph OSD crashes suddenly after restart when using bluestore
Description
Hi there.
We've set up a small 4-node BlueStore cluster for testing, and I noticed that an OSD sometimes fails to start after a shutdown, hitting an assertion immediately. Here's a snippet from the OSD stdout; I've also attached the OSD log:
2016-06-21 18:14:22.956887 7f376e0bd800 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2016-06-21 18:14:22.957140 7f376e0bd800 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2016-06-21 18:14:22.957229 7f376e0bd800 -1 WARNING: experimental feature 'bluestore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.
starting osd.33 at :/0 osd_data /var/lib/ceph/osd/cephsml-33 /var/lib/ceph/osd/cephsml-33/journal
2016-06-21 18:14:22.979703 7f376e0bd800 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2016-06-21 18:14:23.033441 7f376e0bd800 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.
2016-06-21 18:14:24.024998 7f376e0bd800 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.
osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f376e0bd800 time 2016-06-21 18:14:24.088885
osd/OSD.h: 885: FAILED assert(ret)
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f376eaea5b5]
2: (OSDService::get_map(unsigned int)+0x3d) [0x7f376e4c893d]
3: (OSD::init()+0x1fe2) [0x7f376e47bdb2]
4: (main()+0x2c55) [0x7f376e3dfbe5]
5: (__libc_start_main()+0xf5) [0x7f376afceb15]
6: (()+0x353009) [0x7f376e42a009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-06-21 18:14:24.090582 7f376e0bd800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f376e0bd800 time 2016-06-21 18:14:24.088885
osd/OSD.h: 885: FAILED assert(ret)
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f376eaea5b5]
2: (OSDService::get_map(unsigned int)+0x3d) [0x7f376e4c893d]
3: (OSD::init()+0x1fe2) [0x7f376e47bdb2]
4: (main()+0x2c55) [0x7f376e3dfbe5]
5: (__libc_start_main()+0xf5) [0x7f376afceb15]
6: (()+0x353009) [0x7f376e42a009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-665> 2016-06-21 18:14:22.956887 7f376e0bd800 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
-664> 2016-06-21 18:14:22.957140 7f376e0bd800 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
-660> 2016-06-21 18:14:22.957229 7f376e0bd800 -1 WARNING: experimental feature 'bluestore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.
-650> 2016-06-21 18:14:22.979703 7f376e0bd800 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
-620> 2016-06-21 18:14:23.033441 7f376e0bd800 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.
-162> 2016-06-21 18:14:24.024998 7f376e0bd800 -1 WARNING: experimental feature 'rocksdb' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.
0> 2016-06-21 18:14:24.090582 7f376e0bd800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f376e0bd800 time 2016-06-21 18:14:24.088885
osd/OSD.h: 885: FAILED assert(ret)
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f376eaea5b5]
2: (OSDService::get_map(unsigned int)+0x3d) [0x7f376e4c893d]
3: (OSD::init()+0x1fe2) [0x7f376e47bdb2]
4: (main()+0x2c55) [0x7f376e3dfbe5]
5: (__libc_start_main()+0xf5) [0x7f376afceb15]
6: (()+0x353009) [0x7f376e42a009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) **
in thread 7f376e0bd800 thread_name:ceph-osd
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x91341a) [0x7f376e9ea41a]
2: (()+0xf100) [0x7f376ca20100]
3: (gsignal()+0x37) [0x7f376afe25f7]
4: (abort()+0x148) [0x7f376afe3ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f376eaea797]
6: (OSDService::get_map(unsigned int)+0x3d) [0x7f376e4c893d]
7: (OSD::init()+0x1fe2) [0x7f376e47bdb2]
8: (main()+0x2c55) [0x7f376e3dfbe5]
9: (__libc_start_main()+0xf5) [0x7f376afceb15]
10: (()+0x353009) [0x7f376e42a009]
2016-06-21 18:14:24.095919 7f376e0bd800 -1 *** Caught signal (Aborted) **
in thread 7f376e0bd800 thread_name:ceph-osd
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x91341a) [0x7f376e9ea41a]
2: (()+0xf100) [0x7f376ca20100]
3: (gsignal()+0x37) [0x7f376afe25f7]
4: (abort()+0x148) [0x7f376afe3ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f376eaea797]
6: (OSDService::get_map(unsigned int)+0x3d) [0x7f376e4c893d]
7: (OSD::init()+0x1fe2) [0x7f376e47bdb2]
8: (main()+0x2c55) [0x7f376e3dfbe5]
9: (__libc_start_main()+0xf5) [0x7f376afceb15]
10: (()+0x353009) [0x7f376e42a009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2016-06-21 18:14:24.095919 7f376e0bd800 -1 *** Caught signal (Aborted) **
in thread 7f376e0bd800 thread_name:ceph-osd
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x91341a) [0x7f376e9ea41a]
2: (()+0xf100) [0x7f376ca20100]
3: (gsignal()+0x37) [0x7f376afe25f7]
4: (abort()+0x148) [0x7f376afe3ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f376eaea797]
6: (OSDService::get_map(unsigned int)+0x3d) [0x7f376e4c893d]
7: (OSD::init()+0x1fe2) [0x7f376e47bdb2]
8: (main()+0x2c55) [0x7f376e3dfbe5]
9: (__libc_start_main()+0xf5) [0x7f376afceb15]
10: (()+0x353009) [0x7f376e42a009]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted
Files