Bug #52464
FAILED ceph_assert(current_shard->second->valid())
Status:
open
% Done:
0%
Regression:
No
Severity:
3 - minor
Description
I've got a cephadm cluster I use for testing, and this morning one of the OSDs crashed in BlueStore code:
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: get compressor snappy = 0x55b3c18b1b90
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore(/var/lib/ceph/osd/ceph-0) _open_fm::NCB::freelist_type=null
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: freelist init
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: freelist _read_cfg
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: asok(0x55b3c09f0000) register_command bluestore allocator dump block hook 0x55b3c18b1ef0
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: asok(0x55b3c09f0000) register_command bluestore allocator score block hook 0x55b3c18b1ef0
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: asok(0x55b3c09f0000) register_command bluestore allocator fragmentation block hook 0x55b3c18b1ef0
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore::NCB::restore_allocator::file_size=0,sizeof(extent_t)=16
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore::NCB::restore_allocator::No Valid allocation info on disk (empty file)
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore(/var/lib/ceph/osd/ceph-0) _init_alloc::NCB::restore_allocator() failed!
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore(/var/lib/ceph/osd/ceph-0) _init_alloc::NCB::Run Full Recovery from ONodes (might take a while) ...
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore::NCB::read_allocation_from_drive_on_startup::Start Allocation Recovery from ONodes ...
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7195-g7e7326c4/rpm/el8/BUILD/ceph-17.0.0-7195-g7e7326c4/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7f2d60f480c0 time 2021-08-31T13:51:40.899594+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7195-g7e7326c4/rpm/el8/BUILD/ceph-17.0.0-7195-g7e7326c4/src/kv/RocksDBStore.cc: 2288: FAILED ceph_assert(current_shard->second->valid())
ceph version 17.0.0-7195-g7e7326c4 (7e7326c4231f614aff0f7bef4d72beadce6a9c75) quincy (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55b3bdcb0b50]
2: /usr/bin/ceph-osd(+0x5ced71) [0x55b3bdcb0d71]
3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x55b3be8f93db]
4: (WholeMergeIteratorImpl::next()+0x2c) [0x55b3be8f942c]
5: (BlueStore::_open_collections()+0x660) [0x55b3be2e67f0]
6: (BlueStore::read_allocation_from_drive_on_startup()+0x127) [0x55b3be2ffa97]
7: (BlueStore::_init_alloc()+0xa01) [0x55b3be300bd1]
8: (BlueStore::_open_db_and_around(bool, bool)+0x2f4) [0x55b3be3487e4]
9: (BlueStore::_mount()+0x1ae) [0x55b3be34b55e]
10: (OSD::init()+0x3ba) [0x55b3bddec0ba]
11: main()
12: __libc_start_main()
13: _start()
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: *** Caught signal (Aborted) **
in thread 7f2d60f480c0 thread_name:ceph-osd
ceph version 17.0.0-7195-g7e7326c4 (7e7326c4231f614aff0f7bef4d72beadce6a9c75) quincy (dev)
1: /lib64/libpthread.so.0(+0x12b20) [0x7f2d5eeeeb20]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x55b3bdcb0bae]
5: /usr/bin/ceph-osd(+0x5ced71) [0x55b3bdcb0d71]
6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x55b3be8f93db]
7: (WholeMergeIteratorImpl::next()+0x2c) [0x55b3be8f942c]
8: (BlueStore::_open_collections()+0x660) [0x55b3be2e67f0]
9: (BlueStore::read_allocation_from_drive_on_startup()+0x127) [0x55b3be2ffa97]
10: (BlueStore::_init_alloc()+0xa01) [0x55b3be300bd1]
11: (BlueStore::_open_db_and_around(bool, bool)+0x2f4) [0x55b3be3487e4]
12: (BlueStore::_mount()+0x1ae) [0x55b3be34b55e]
13: (OSD::init()+0x3ba) [0x55b3bddec0ba]
14: main()
15: __libc_start_main()
16: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aug 31 09:51:40 cephadm2 conmon[20474]: -5> 2021-08-31T13:51:40.897+0000 7f2d60f480c0 -1 bluestore::NCB::restore_allocator::No Valid allocation info on disk (empty file)
Aug 31 09:51:40 cephadm2 conmon[20474]: -1> 2021-08-31T13:51:40.903+0000 7f2d60f480c0 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7195-g7e7326c4/rpm/el8/BUILD/ceph-17.0.0-7195-g7e7326c4/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7f2d60f480c0 time 2021-08-31T13:51:40.899594+0000
Aug 31 09:51:40 cephadm2 conmon[20474]: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7195-g7e7326c4/rpm/el8/BUILD/ceph-17.0.0-7195-g7e7326c4/src/kv/RocksDBStore.cc: 2288: FAILED ceph_assert(current_shard->second->valid())
Aug 31 09:51:40 cephadm2 conmon[20474]:
Aug 31 09:51:40 cephadm2 conmon[20474]: ceph version 17.0.0-7195-g7e7326c4 (7e7326c4231f614aff0f7bef4d72beadce6a9c75) quincy (dev)
Aug 31 09:51:40 cephadm2 conmon[20474]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55b3bdcb0b50]
Aug 31 09:51:40 cephadm2 conmon[20474]: 2: /usr/bin/ceph-osd(+0x5ced71) [0x55b3bdcb0d71]
Aug 31 09:51:40 cephadm2 conmon[20474]: 3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x55b3be8f93db]
Aug 31 09:51:40 cephadm2 conmon[20474]: 4: (WholeMergeIteratorImpl::next()+0x2c) [0x55b3be8f942c]
Aug 31 09:51:40 cephadm2 conmon[20474]: 5: (BlueStore::_open_collections()+0x660) [0x55b3be2e67f0]
Aug 31 09:51:40 cephadm2 conmon[20474]: 6: (BlueStore::read_allocation_from_drive_on_startup()+0x127) [0x55b3be2ffa97]
Aug 31 09:51:40 cephadm2 conmon[20474]: 7: (BlueStore::_init_alloc()+0xa01) [0x55b3be300bd1]
Aug 31 09:51:40 cephadm2 conmon[20474]: 8: (BlueStore::_open_db_and_around(bool, bool)+0x2f4) [0x55b3be3487e4]
Aug 31 09:51:40 cephadm2 conmon[20474]: 9: (BlueStore::_mount()+0x1ae) [0x55b3be34b55e]
Aug 31 09:51:40 cephadm2 conmon[20474]: 10: (OSD::init()+0x3ba) [0x55b3bddec0ba]
Aug 31 09:51:40 cephadm2 conmon[20474]: 11: main()
Aug 31 09:51:40 cephadm2 conmon[20474]: 12: __libc_start_main()
Aug 31 09:51:40 cephadm2 conmon[20474]: 13: _start()
Aug 31 09:51:40 cephadm2 conmon[20474]:
Aug 31 09:51:40 cephadm2 conmon[20474]: 0> 2021-08-31T13:51:40.907+0000 7f2d60f480c0 -1 *** Caught signal (Aborted) **
Aug 31 09:51:40 cephadm2 conmon[20474]: in thread 7f2d60f480c0 thread_name:ceph-osd
Aug 31 09:51:40 cephadm2 conmon[20474]:
Aug 31 09:51:40 cephadm2 conmon[20474]: ceph version 17.0.0-7195-g7e7326c4 (7e7326c4231f614aff0f7bef4d72beadce6a9c75) quincy (dev)
Aug 31 09:51:40 cephadm2 conmon[20474]: 1: /lib64/libpthread.so.0(+0x12b20) [0x7f2d5eeeeb20]
Aug 31 09:51:40 cephadm2 conmon[20474]: 2: gsignal()
Aug 31 09:51:40 cephadm2 conmon[20474]: 3: abort()
Aug 31 09:51:40 cephadm2 conmon[20474]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x55b3bdcb0bae]
Aug 31 09:51:40 cephadm2 conmon[20474]: 5: /usr/bin/ceph-osd(+0x5ced71) [0x55b3bdcb0d71]
Aug 31 09:51:40 cephadm2 conmon[20474]: 6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x55b3be8f93db]
Aug 31 09:51:40 cephadm2 conmon[20474]: 7: (WholeMergeIteratorImpl::next()+0x2c) [0x55b3be8f942c]
Aug 31 09:51:40 cephadm2 conmon[20474]: 8: (BlueStore::_open_collections()+0x660) [0x55b3be2e67f0]
Aug 31 09:51:40 cephadm2 conmon[20474]: 9: (BlueStore::read_allocation_from_drive_on_startup()+0x127) [0x55b3be2ffa97]
Aug 31 09:51:40 cephadm2 conmon[20474]: 10: (BlueStore::_init_alloc()+0xa01) [0x55b3be300bd1]
Aug 31 09:51:40 cephadm2 conmon[20474]: 11: (BlueStore::_open_db_and_around(bool, bool)+0x2f4) [0x55b3be3487e4]
Aug 31 09:51:40 cephadm2 conmon[20474]: 12: (BlueStore::_mount()+0x1ae) [0x55b3be34b55e]
Aug 31 09:51:40 cephadm2 conmon[20474]: 13: (OSD::init()+0x3ba) [0x55b3bddec0ba]
Aug 31 09:51:40 cephadm2 conmon[20474]: 14: main()
Aug 31 09:51:40 cephadm2 conmon[20474]: 15: __libc_start_main()
Aug 31 09:51:40 cephadm2 conmon[20474]: 16: _start()
Aug 31 09:51:40 cephadm2 conmon[20474]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aug 31 09:51:40 cephadm2 conmon[20474]:
Aug 31 09:51:41 cephadm2 systemd-coredump[20743]: Process 20497 (ceph-osd) of user 167 dumped core.
Aug 31 09:51:41 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Main process exited, code=exited, status=134/n/a
Aug 31 09:51:42 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Failed with result 'exit-code'.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Service RestartSec=10s expired, scheduling restart.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Scheduled restart job, restart counter is at 6.
Aug 31 09:51:52 cephadm2 systemd[1]: Stopped Ceph osd.0 for 1d11c63a-09ac-11ec-83e1-52540031ba78.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Start request repeated too quickly.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Failed with result 'exit-code'.
Aug 31 09:51:52 cephadm2 systemd[1]: Failed to start Ceph osd.0 for 1d11c63a-09ac-11ec-83e1-52540031ba78.
The build I'm using is based on commit a49f10e760b4, with some MDS patches on top (nothing that should affect the OSDs).
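For context on what the assertion is checking: the backtrace shows the crash inside WholeMergeIteratorImpl::is_main_smaller(), reached while the NCB full-recovery path (_open_collections during read_allocation_from_drive_on_startup) walks a merged view of a "main" iterator and per-shard iterators. The C++ sketch below is a hypothetical minimal model of that merge-iterator pattern, not the actual RocksDBStore code; every type and the shard layout are invented for illustration, and only the is_main_smaller/next names come from the backtrace. It shows how a key comparison guarded by an assert on the shard iterator's valid() can fire when next() exhausts the shard side but the caller keeps selecting it for comparison.

// Hypothetical, simplified model of the merge-iterator pattern implied by
// the backtrace. NOT the actual Ceph RocksDBStore implementation.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct SimpleIter {
    std::vector<std::string> keys;
    std::size_t pos = 0;
    bool valid() const { return pos < keys.size(); }
    const std::string& key() const { return keys[pos]; }
    void next() { ++pos; }
};

// Merges a "main" iterator with per-shard iterators, yielding keys in order.
struct WholeMergeIter {
    SimpleIter main;
    std::map<int, SimpleIter> shards;          // shard id -> iterator
    std::map<int, SimpleIter>::iterator current_shard;

    // Decides which side to advance. Mirrors the failing check: both sides
    // must be valid before their keys can be compared.
    bool is_main_smaller() {
        assert(main.valid());
        assert(current_shard->second.valid()); // analogous to the ceph_assert that fired
        return main.key() < current_shard->second.key();
    }

    void next() {
        // Bug in this model: if the previous advance exhausted the current
        // shard and current_shard is not moved to a still-valid shard (or
        // the comparison short-circuited), the assert above fires next time.
        if (is_main_smaller())
            main.next();
        else
            current_shard->second.next();
    }
};

int main() {
    WholeMergeIter it;
    it.main.keys = {"a", "c", "e"};
    it.shards[0].keys = {"b"};                 // one short shard
    it.current_shard = it.shards.begin();

    it.next(); // advances main ("a" < "b")
    it.next(); // advances shard 0, which is now exhausted
    it.next(); // current_shard->second.valid() is false -> assert fires
}

If the real code follows a similar shape, the fix would presumably be to skip or re-select exhausted shard iterators (or fall back to the main side) before comparing keys, rather than asserting validity unconditionally.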