Bug #57602

open

ceph osd crash with `ceph_assert_fail` and `segmentation fault`

Added by Liang Zheng over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph OSDs crash with `ceph_assert_fail` and a segmentation fault. For details, see the related Rook issue: [[https://github.com/rook/rook/issues/10936]]

One OSD crashes with the following trace:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7fb92c9d6200 time 2022-09-08T06:52:16.279149+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: 2343: FAILED ceph_assert(current_shard->second->valid())
 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x562384b7673c]
 2: ceph-osd(+0x57f956) [0x562384b76956]
 3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
 4: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
 5: (BlueStore::_open_collections()+0x658) [0x562385188578]
 6: (BlueStore::_mount()+0x226) [0x5623851e37d6]
 7: (OSD::init()+0x380) [0x562384cb11d0]
 8: main()
 9: __libc_start_main()
 10: _start()
*** Caught signal (Aborted) **
 in thread 7fb92c9d6200 thread_name:ceph-osd
debug 2022-09-08T06:52:16.286+0000 7fb92c9d6200 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: In function 'bool WholeMergeIteratorImpl::is_main_smaller()' thread 7fb92c9d6200 time 2022-09-08T06:52:16.279149+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/kv/RocksDBStore.cc: 2343: FAILED ceph_assert(current_shard->second->valid())

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x562384b7673c]
 2: ceph-osd(+0x57f956) [0x562384b76956]
 3: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
 4: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
 5: (BlueStore::_open_collections()+0x658) [0x562385188578]
 6: (BlueStore::_mount()+0x226) [0x5623851e37d6]
 7: (OSD::init()+0x380) [0x562384cb11d0]
 8: main()
 9: __libc_start_main()
 10: _start()

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fb92a975ce0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x562384b7678d]
 5: ceph-osd(+0x57f956) [0x562384b76956]
 6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
 7: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
 8: (BlueStore::_open_collections()+0x658) [0x562385188578]
 9: (BlueStore::_mount()+0x226) [0x5623851e37d6]
 10: (OSD::init()+0x380) [0x562384cb11d0]
 11: main()
 12: __libc_start_main()
 13: _start()
debug 2022-09-08T06:52:16.294+0000 7fb92c9d6200 -1 *** Caught signal (Aborted) **
 in thread 7fb92c9d6200 thread_name:ceph-osd

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fb92a975ce0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x562384b7678d]
 5: ceph-osd(+0x57f956) [0x562384b76956]
 6: (WholeMergeIteratorImpl::is_main_smaller()+0x13b) [0x56238570861b]
 7: (WholeMergeIteratorImpl::next()+0x2c) [0x56238570866c]
 8: (BlueStore::_open_collections()+0x658) [0x562385188578]
 9: (BlueStore::_mount()+0x226) [0x5623851e37d6]
 10: (OSD::init()+0x380) [0x562384cb11d0]
 11: main()
 12: __libc_start_main()
 13: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Another OSD then went down with a segmentation fault:

** File Read Latency Histogram By Level [P] **

*** Caught signal (Segmentation fault) **
 in thread 7f68be4e0200 thread_name:ceph-osd
 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f68bc47fce0]
 2: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x91) [0x55d4040eeca1]
 3: (OSD::read_superblock()+0x136) [0x55d403ba0636]
 4: (OSD::init()+0x863) [0x55d403bf36b3]
 5: main()
 6: __libc_start_main()
 7: _start()
debug 2022-09-08T09:12:10.223+0000 7f68be4e0200 -1 *** Caught signal (Segmentation fault) **
 in thread 7f68be4e0200 thread_name:ceph-osd

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f68bc47fce0]
 2: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x91) [0x55d4040eeca1]
 3: (OSD::read_superblock()+0x136) [0x55d403ba0636]
 4: (OSD::init()+0x863) [0x55d403bf36b3]
 5: main()
 6: __libc_start_main()
 7: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
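
When comparing backtraces like the one above, it can help to pull the frames out mechanically. The following is only a minimal sketch, not part of Ceph: it assumes the frame layout seen in this report (`N: (symbol+0xOFFSET) [0xADDRESS]`, with the offset and address optional), and the names `Frame` and `parse_frames` are hypothetical helpers introduced for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int              # frame number as printed in the log
    symbol: str             # function signature or module path
    offset: Optional[int]   # offset inside the symbol/module, if printed
    address: Optional[int]  # absolute address, if printed


# Assumed frame layout: " 3: (OSD::read_superblock()+0x136) [0x55d403ba0636]"
# The trailing "[0x...]" part is optional (e.g. " 5: main()").
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+(?P<body>.+?)"
    r"(?:\s+\[(?P<addr>0x[0-9a-fA-F]+)\])?\s*$"
)
# A "+0x..." offset directly before the closing parenthesis of the body.
OFFSET_RE = re.compile(r"\+(0x[0-9a-fA-F]+)\)$")


def parse_frames(trace: str) -> List[Frame]:
    """Return the numbered stack frames found in a pasted backtrace."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue  # version banners, "debug ..." lines, blank lines
        body = m.group("body")
        off = OFFSET_RE.search(body)
        if off:
            # Drop the wrapping "(" of "(symbol+0x..)" or the trailing
            # "(" of "module(+0x..)".
            symbol = body[: off.start()].lstrip("(").rstrip("(")
            offset = int(off.group(1), 16)
        else:
            symbol, offset = body, None
        addr = m.group("addr")
        frames.append(
            Frame(int(m.group("index")), symbol, offset,
                  int(addr, 16) if addr else None)
        )
    return frames


if __name__ == "__main__":
    sample = """\
 in thread 7f68be4e0200 thread_name:ceph-osd
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f68bc47fce0]
 3: (OSD::read_superblock()+0x136) [0x55d403ba0636]
 4: (OSD::init()+0x863) [0x55d403bf36b3]
 5: main()
"""
    for frame in parse_frames(sample):
        print(frame.index, frame.symbol, frame.offset)
```

The parsed symbols and offsets can then be compared across OSDs, or looked up in a disassembly of the matching `ceph-osd` binary as the `NOTE` line suggests.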

ceph version:
16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific

Actions #1

Updated by Liang Zheng over 1 year ago

The detailed logs are available at: [[https://github.com/rook/rook/issues/10936]]
