Bug #52464 (open)

FAILED ceph_assert(current_shard->second->valid())

Added by Jeff Layton over 2 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description

I've got a cephadm cluster I use for testing, and this morning one of the OSDs crashed in the BlueStore code:

Aug 31 09:51:40 cephadm2 ceph-osd[20497]: get compressor snappy = 0x55b3c18b1b90
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore(/var/lib/ceph/osd/ceph-0) _open_fm::NCB::freelist_type=null
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: freelist init
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: freelist _read_cfg
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: asok(0x55b3c09f0000) register_command bluestore allocator dump block hook 0x55b3c18b1ef0
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: asok(0x55b3c09f0000) register_command bluestore allocator score block hook 0x55b3c18b1ef0
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: asok(0x55b3c09f0000) register_command bluestore allocator fragmentation block hook 0x55b3c18b1ef0
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore::NCB::restore_allocator::file_size=0,sizeof(extent_t)=16
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore::NCB::restore_allocator::No Valid allocation info on disk (empty file)
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore(/var/lib/ceph/osd/ceph-0) _init_alloc::NCB::restore_allocator() failed!
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore(/var/lib/ceph/osd/ceph-0) _init_alloc::NCB::Run Full Recovery from ONodes (might take a while) ...
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: bluestore::NCB::read_allocation_from_drive_on_startup::Start Allocation Recovery from ONodes ...
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7195-g7e7326c4/rpm/el8/BUILD/ceph-17.0.0-7195-g7e7326c4/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7f2d60f480c0 time 2021-08-31T13:51:40.899594+0000
                                          /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7195-g7e7326c4/rpm/el8/BUILD/ceph-17.0.0-7195-g7e7326c4/src/kv/RocksDBStore.cc: 2288: FAILED ceph_assert(current_shard->second->valid())

                                           ceph version 17.0.0-7195-g7e7326c4 (7e7326c4231f614aff0f7bef4d72beadce6a9c75) quincy (dev)
                                           1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55b3bdcb0b50]
                                           2: /usr/bin/ceph-osd(+0x5ced71) [0x55b3bdcb0d71]
                                           3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x55b3be8f93db]
                                           4: (WholeMergeIteratorImpl::next()+0x2c) [0x55b3be8f942c]
                                           5: (BlueStore::_open_collections()+0x660) [0x55b3be2e67f0]
                                           6: (BlueStore::read_allocation_from_drive_on_startup()+0x127) [0x55b3be2ffa97]
                                           7: (BlueStore::_init_alloc()+0xa01) [0x55b3be300bd1]
                                           8: (BlueStore::_open_db_and_around(bool, bool)+0x2f4) [0x55b3be3487e4]
                                           9: (BlueStore::_mount()+0x1ae) [0x55b3be34b55e]
                                           10: (OSD::init()+0x3ba) [0x55b3bddec0ba]
                                           11: main()
                                           12: __libc_start_main()
                                           13: _start()
Aug 31 09:51:40 cephadm2 ceph-osd[20497]: *** Caught signal (Aborted) **
                                           in thread 7f2d60f480c0 thread_name:ceph-osd

                                           ceph version 17.0.0-7195-g7e7326c4 (7e7326c4231f614aff0f7bef4d72beadce6a9c75) quincy (dev)
                                           1: /lib64/libpthread.so.0(+0x12b20) [0x7f2d5eeeeb20]
                                           2: gsignal()
                                           3: abort()
                                           4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x55b3bdcb0bae]
                                           5: /usr/bin/ceph-osd(+0x5ced71) [0x55b3bdcb0d71]
                                           6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x55b3be8f93db]
                                           7: (WholeMergeIteratorImpl::next()+0x2c) [0x55b3be8f942c]
                                           8: (BlueStore::_open_collections()+0x660) [0x55b3be2e67f0]
                                           9: (BlueStore::read_allocation_from_drive_on_startup()+0x127) [0x55b3be2ffa97]
                                           10: (BlueStore::_init_alloc()+0xa01) [0x55b3be300bd1]
                                           11: (BlueStore::_open_db_and_around(bool, bool)+0x2f4) [0x55b3be3487e4]
                                           12: (BlueStore::_mount()+0x1ae) [0x55b3be34b55e]
                                           13: (OSD::init()+0x3ba) [0x55b3bddec0ba]
                                           14: main()
                                           15: __libc_start_main()
                                           16: _start()
                                           NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aug 31 09:51:41 cephadm2 systemd-coredump[20743]: Process 20497 (ceph-osd) of user 167 dumped core.
Aug 31 09:51:41 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Main process exited, code=exited, status=134/n/a
Aug 31 09:51:42 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Failed with result 'exit-code'.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Service RestartSec=10s expired, scheduling restart.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Scheduled restart job, restart counter is at 6.
Aug 31 09:51:52 cephadm2 systemd[1]: Stopped Ceph osd.0 for 1d11c63a-09ac-11ec-83e1-52540031ba78.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Start request repeated too quickly.
Aug 31 09:51:52 cephadm2 systemd[1]: ceph-1d11c63a-09ac-11ec-83e1-52540031ba78@osd.0.service: Failed with result 'exit-code'.
Aug 31 09:51:52 cephadm2 systemd[1]: Failed to start Ceph osd.0 for 1d11c63a-09ac-11ec-83e1-52540031ba78.

The build I'm using is based on commit a49f10e760b4, with some MDS patches on top (nothing that should affect the OSD code).
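
For context, the assertion fires in the whole-space merge iterator in src/kv/RocksDBStore.cc, which (judging by the names in the assertion and backtrace) interleaves a "main" iterator with per-shard iterators; line 2288 insists that the shard side of a key comparison is still valid() before its key is dereferenced. Below is a minimal, self-contained sketch of that invariant, not the actual Ceph source: apart from is_main_smaller(), current_shard, and valid(), every name is illustrative, and plain assert() stands in for ceph_assert().

#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy sorted-key iterator standing in for a KeyValueDB shard iterator.
struct ToyIter {
  std::vector<std::string> keys;   // must be pre-sorted
  size_t pos = 0;
  bool valid() const { return pos < keys.size(); }
  const std::string& key() const { assert(valid()); return keys[pos]; }
  void next() { ++pos; }
};

// Merges a "main" iterator with per-shard iterators, yielding keys in order.
struct ToyMergeIter {
  ToyIter main;
  std::map<std::string, std::unique_ptr<ToyIter>> shards;
  std::map<std::string, std::unique_ptr<ToyIter>>::iterator current_shard;

  explicit ToyMergeIter(ToyIter m)
    : main(std::move(m)), current_shard(shards.end()) {}

  // Skip exhausted shards so that current_shard->second->valid() holds
  // whenever keys are compared; failing to re-settle after an advance is
  // exactly the class of bug that would fire the assertion in this report.
  void settle_shard() {
    while (current_shard != shards.end() && !current_shard->second->valid())
      ++current_shard;
  }

  bool is_main_smaller() {
    // The invariant asserted at RocksDBStore.cc:2288.
    assert(current_shard->second->valid());
    return main.key() < current_shard->second->key();
  }

  bool valid() {
    return main.valid() ||
           (current_shard != shards.end() && current_shard->second->valid());
  }

  const std::string& key() {
    if (!main.valid()) return current_shard->second->key();
    if (current_shard == shards.end()) return main.key();
    return is_main_smaller() ? main.key() : current_shard->second->key();
  }

  void next() {
    if (!main.valid())                      current_shard->second->next();
    else if (current_shard == shards.end()) main.next();
    else if (is_main_smaller())             main.next();
    else                                    current_shard->second->next();
    settle_shard();  // restore the invariant before the next comparison
  }
};

int main() {
  ToyMergeIter it(ToyIter{{"a", "d"}});
  it.shards["s1"] = std::make_unique<ToyIter>(ToyIter{{"b"}});
  it.shards["s2"] = std::make_unique<ToyIter>(ToyIter{{"c", "e"}});
  it.current_shard = it.shards.begin();
  it.settle_shard();
  for (; it.valid(); it.next())
    std::cout << it.key() << '\n';   // prints: a b c d e
}

With a shape like this, any advance path that moves an iterator without re-settling onto a valid shard before the next comparison would trip exactly this assertion, which is consistent with the backtrace firing while _open_collections() walks every collection during the NCB full-recovery scan.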
