Bug #57602
Ceph OSD crashes with `ceph_assert_fail` and `segmentation fault`
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 2 - major
Description
Ceph OSDs crash with `ceph_assert_fail` and `segmentation fault`. For details, see: [[https://github.com/rook/rook/issues/10936]]
One OSD crashed with the following trace:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7fb92c9d6200 time 2022-09-08T06:52:16.279149+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: 2343: FAILED ceph_assert(current_shard->second->valid())
ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x562384b7673c]
2: ceph-osd(+0x57f956) [0x562384b76956]
3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
4: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
5: (BlueStore::_open_collections()+0x658) [0x562385188578]
6: (BlueStore::_mount()+0x226) [0x5623851e37d6]
7: (OSD::init()+0x380) [0x562384cb11d0]
8: main()
9: __libc_start_main()
10: _start()
*** Caught signal (Aborted) **
in thread 7fb92c9d6200 thread_name:ceph-osd
debug 2022-09-08T06:52:16.286+0000 7fb92c9d6200 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7fb92c9d6200 time 2022-09-08T06:52:16.279149+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: 2343: FAILED ceph_assert(current_shard->second->valid())

ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x562384b7673c]
2: ceph-osd(+0x57f956) [0x562384b76956]
3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
4: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
5: (BlueStore::_open_collections()+0x658) [0x562385188578]
6: (BlueStore::_mount()+0x226) [0x5623851e37d6]
7: (OSD::init()+0x380) [0x562384cb11d0]
8: main()
9: __libc_start_main()
10: _start()

ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
1: /lib64/libpthread.so.0(+0x12ce0) [0x7fb92a975ce0]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x562384b7678d]
5: ceph-osd(+0x57f956) [0x562384b76956]
6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
7: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
8: (BlueStore::_open_collections()+0x658) [0x562385188578]
9: (BlueStore::_mount()+0x226) [0x5623851e37d6]
10: (OSD::init()+0x380) [0x562384cb11d0]
11: main()
12: __libc_start_main()
13: _start()
debug 2022-09-08T06:52:16.294+0000 7fb92c9d6200 -1 *** Caught signal (Aborted) **
in thread 7fb92c9d6200 thread_name:ceph-osd

ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
1: /lib64/libpthread.so.0(+0x12ce0) [0x7fb92a975ce0]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x562384b7678d]
5: ceph-osd(+0x57f956) [0x562384b76956]
6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
7: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
8: (BlueStore::_open_collections()+0x658) [0x562385188578]
9: (BlueStore::_mount()+0x226) [0x5623851e37d6]
10: (OSD::init()+0x380) [0x562384cb11d0]
11: main()
12: __libc_start_main()
13: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Another OSD went down again, this time with a `segmentation fault`:
** File Read Latency Histogram By Level [P] **
*** Caught signal (Segmentation fault) **
in thread 7f68be4e0200 thread_name:ceph-osd
ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
1: /lib64/libpthread.so.0(+0x12ce0) [0x7f68bc47fce0]
2: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x91) [0x55d4040eeca1]
3: (OSD::read_superblock()+0x136) [0x55d403ba0636]
4: (OSD::init()+0x863) [0x55d403bf36b3]
5: main()
6: __libc_start_main()
7: _start()
debug 2022-09-08T09:12:10.223+0000 7f68be4e0200 -1 *** Caught signal (Segmentation fault) **
in thread 7f68be4e0200 thread_name:ceph-osd
ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
1: /lib64/libpthread.so.0(+0x12ce0) [0x7f68bc47fce0]
2: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x91) [0x55d4040eeca1]
3: (OSD::read_superblock()+0x136) [0x55d403ba0636]
4: (OSD::init()+0x863) [0x55d403bf36b3]
5: main()
6: __libc_start_main()
7: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph version:
16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific
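To compare the two backtraces above frame by frame, here is a minimal Python sketch that pulls the numbered frames out of pasted log text. It is not part of the original report or of any Ceph tooling. `FRAME_RE`, `SYMBOL_RE`, `parse_frames`, and `symbol` are hypothetical helper names, and the regular expressions assume only the frame layout visible in the logs above (an optional viewer line number, a frame index, a location, and an optional bracketed address).

```python
import re

# Assumed frame layout, taken from the logs in this report:
#   "3: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]"
#   "8: main()"
# An optional leading viewer line number ("3148 4: ...") is tolerated.
FRAME_RE = re.compile(r"^\s*(?:\d+\s+)?(\d+): (.+?)(?: \[(0x[0-9a-f]+)\])?\s*$")

# Matches "(Symbol+0xOFFSET)" and captures the symbol part.
SYMBOL_RE = re.compile(r"^\((.+)\+0x[0-9a-f]+\)$")


def parse_frames(log_text):
    """Return a list of (index, location, address) tuples, one per frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, location, address = match.groups()
            frames.append((int(index), location, address))
    return frames


def symbol(location):
    """Strip the surrounding parentheses and the +0x offset, if present."""
    match = SYMBOL_RE.match(location)
    return match.group(1) if match else location


# A few frames copied from the segmentation fault trace in this report.
SAMPLE = """\
*** Caught signal (Segmentation fault) **
1: /lib64/libpthread.so.0(+0x12ce0) [0x7f68bc47fce0]
3: (OSD::read_superblock()+0x136) [0x55d403ba0636]
4: (OSD::init()+0x863) [0x55d403bf36b3]
5: main()
"""

if __name__ == "__main__":
    for index, location, address in parse_frames(SAMPLE):
        print(index, symbol(location), address)
```

Running both traces through `parse_frames` and comparing the `symbol` values shows where the call paths diverge: the assert is hit under `BlueStore::_mount()`, while the segmentation fault is hit under `OSD::read_superblock()`.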
Updated by Liang Zheng over 1 year ago
The detailed logs are available at: [[https://github.com/rook/rook/issues/10936]]