Bug #20557
Status: Closed
segmentation fault with rocksdb|BlueStore and jemalloc
Description
Jul 10 12:00:43 server-69 ceph-osd: ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
Jul 10 12:00:43 server-69 ceph-osd: 1: (()+0x9aa2bf) [0x7fe303c9b2bf]
Jul 10 12:00:43 server-69 ceph-osd: 2: (()+0xf100) [0x7fe3006fb100]
Jul 10 12:00:43 server-69 ceph-osd: 3: (()+0x1cdff) [0x7fe302eb5dff]
Jul 10 12:00:43 server-69 ceph-osd: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x7fe303f6d916]
Jul 10 12:00:43 server-69 ceph-osd: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x7fe303f6e9ac]
Jul 10 12:00:43 server-69 ceph-osd: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x7fe303f6ec07]
Jul 10 12:00:43 server-69 ceph-osd: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x7fe303f77229]
Jul 10 12:00:43 server-69 ceph-osd: 8: (()+0xca78f6) [0x7fe303f988f6]
Jul 10 12:00:43 server-69 ceph-osd: 9: (()+0xca7ebd) [0x7fe303f98ebd]
Jul 10 12:00:43 server-69 ceph-osd: 10: (()+0xca7ecf) [0x7fe303f98ecf]
Jul 10 12:00:43 server-69 ceph-osd: 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x7fe303f80fbe]
Jul 10 12:00:43 server-69 ceph-osd: 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x7fe304000444]
Jul 10 12:00:43 server-69 ceph-osd: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x7fe303bf4e42]
Jul 10 12:00:43 server-69 ceph-osd: 14: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::upper_bound(std::string const&, std::string const&)+0x30) [0x7fe303bf6730]
Jul 10 12:00:43 server-69 ceph-osd: 15: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0xbc2) [0x7fe303b60c42]
Jul 10 12:00:43 server-69 ceph-osd: 16: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0xe8) [0x7fe303b624c8]
Jul 10 12:00:43 server-69 ceph-osd: 17: (BlueStore::collection_list(coll_t const&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0x72) [0x7fe303b80ce2]
Jul 10 12:00:43 server-69 ceph-osd: 18: (OSD::clear_temp_objects()+0x537) [0x7fe303780357]
Jul 10 12:00:43 server-69 ceph-osd: 19: (OSD::init()+0x1f7d) [0x7fe3037b838d]
Jul 10 12:00:43 server-69 ceph-osd: 20: (main()+0x2ea7) [0x7fe3036c1547]
Jul 10 12:00:43 server-69 ceph-osd: 21: (__libc_start_main()+0xf5) [0x7fe2ff50cb15]
Jul 10 12:00:43 server-69 ceph-osd: 22: (()+0x469906) [0x7fe30375a906]
Jul 10 12:00:43 server-69 ceph-osd: 2017-07-10 12:00:43.440569 7fe3032cebc0 -1 *** Caught signal (Segmentation fault) **
Jul 10 12:00:43 server-69 ceph-osd: in thread 7fe3032cebc0 thread_name:ceph-osd
Updated by Yao Ning almost 7 years ago
I always get this segmentation fault when I use jemalloc. If I use the default tcmalloc, it is fine.
Updated by Nathan Cutler almost 7 years ago
- Subject changed from always got segmentation fault when use BlueStore backend to always got segmentation fault when use BlueStore backend with jemalloc
Updated by Sage Weil almost 7 years ago
- Status changed from New to 12
- Priority changed from High to Normal
Not really too concerned about jemalloc. I'm guessing rocksdb is still linking against tcmalloc or something?
Updated by Mikko Tanner over 6 years ago
- File osd.6-segfault.txt osd.6-segfault.txt added
- File osd.13-segfault.txt osd.13-segfault.txt added
- File osd.14-segfault.txt osd.14-segfault.txt added
I am also getting BlueStore OSD segfaults with jemalloc enabled at runtime (jemalloc preload enabled in /etc/default/ceph) on Ubuntu 16.04, with "ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)". Please see the attached kernel logs.
Cluster has 3 OSD/MON hosts, in the process of being upgraded from filestore to BlueStore OSDs. 2 other hosts with jemalloc+filestore are running without issues.
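For reports like this one, it helps to confirm which allocator the OSD binary is actually pulling in. A minimal sketch (the ceph-osd path is an assumption; adjust for your install):

```shell
#!/bin/sh
# Sketch: report which malloc implementation a binary links against.
check_allocator() {
    # ldd lists the shared libraries the dynamic linker would load;
    # grep for the two common alternative allocators. Note this does not
    # see LD_PRELOAD, which is applied only at process start.
    ldd "$1" 2>/dev/null | grep -E 'tcmalloc|jemalloc' \
        || echo "default (glibc malloc)"
}

check_allocator /bin/ls            # almost always plain glibc malloc
# On a Ceph host you would run, e.g.:
#   check_allocator /usr/bin/ceph-osd
```

Because LD_PRELOAD is invisible to ldd, a binary can report tcmalloc here while jemalloc is injected on top of it at runtime, which is exactly the mixed-allocator situation suspected in this bug.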
Updated by Sage Weil over 6 years ago
- Has duplicate Bug #20856: osd: luminous osd bluestore crashes with jemalloc enabled on debian 9 added
Updated by Sage Weil over 6 years ago
I believe the problem here is that rocksdb is more tightly bound to tcmalloc. Not exactly sure what the issue or the fix is.
Updated by Sage Weil over 6 years ago
- Related to Bug #21318: segv in rocksdb::BlockBasedTable::NewIndexIterator added
Updated by Sage Weil over 6 years ago
- Priority changed from Normal to High
See http://tracker.ceph.com/issues/21318 ... this still happens (intermittently) on current luminous.
Updated by Sage Weil over 6 years ago
- Related to Bug #21295: OSD Seg Fault on Bluestore OSD added
Updated by Sage Weil over 6 years ago
- Related to Bug #21820: Ceph OSD crash with Segfault added
Updated by Sage Weil over 6 years ago
- Subject changed from always got segmentation fault when use BlueStore backend with jemalloc to segmentation fault with BlueStore and jemalloc
Updated by Sage Weil over 6 years ago
- Subject changed from segmentation fault with BlueStore and jemalloc to segmentation fault with rocksdb|BlueStore and jemalloc
Updated by Sage Weil over 6 years ago
- Related to Bug #21834: Filestore OSD Segfault in thread 7f084dffc700 thread_name:tp_fstore_op added
Updated by Sage Weil over 6 years ago
- Related to Bug #21826: Filestore OSDs start segfaulting added
Updated by Sage Weil over 6 years ago
- Project changed from Ceph to bluestore
- Category deleted (OSD)
Updated by Alejandro Comisario over 6 years ago
Hi, this happens with RocksDB and filestore on luminous 12.2 on Ubuntu 16.04.
Commenting out the LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 line in /etc/default/ceph did not help; this is still happening.
Any news?
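One thing worth checking in a case like this: editing /etc/default/ceph only affects newly started daemons, so an OSD that was not restarted can still have jemalloc mapped. A sketch of verifying against the live process (the pgrep pattern is an assumption):

```shell
#!/bin/sh
# Sketch: check whether a running process actually has libjemalloc mapped.
# LD_PRELOAD is applied at exec time, so changing /etc/default/ceph does
# nothing for daemons that are already running.
has_jemalloc() {
    if grep -q 'libjemalloc' "/proc/$1/maps" 2>/dev/null; then
        echo "jemalloc loaded"
    else
        echo "jemalloc not loaded"
    fi
}

# For a real OSD you might use: has_jemalloc "$(pgrep -o ceph-osd)"
has_jemalloc "$$"   # demo on the current shell
```

If this still reports "jemalloc loaded" after the preload line is commented out, the daemon simply has not been restarted since the change.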
Updated by Adam Kupczyk over 6 years ago
Hi Mikko,
What architecture are you running on?
I tried to match your callstacks with binaries for x86_64 for "ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)" and it does not match. I took binaries from https://download.ceph.com/rpm-luminous/el7/x86_64/ .
Did you compile yourself?
Best regards,
Adam
Updated by Mikko Tanner over 6 years ago
The arch is x86_64. Ceph was installed from the eu.ceph.com deb repo. This issue is no longer current for me, as the cluster has since been upgraded to the newest release and I've given up on jemalloc. After going back to tcmalloc, the issue has not resurfaced.
Updated by Nikola Ciprich about 6 years ago
Hi, just wanted to report that I'm hitting the same issue on CentOS 7 with jemalloc-3.6.0-1.el7 and ceph 12.2.2.
Updated by Alex Cucu over 5 years ago
Got a similar issue on Jewel 10.2.11, as it uses rocksdb by default now. I know Jewel is near EOL; the plan was to update to the latest Jewel and then upgrade to Luminous, but this puts everything on hold.
I've disabled jemalloc for the OSDs that kept crashing and everything seems fine for now.
OS: CentOS 7
Ceph packages: http://download.ceph.com/rpm-jewel/el7/x86_64/
jemalloc: 3.6.0-1 from EPEL
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
1: (()+0x9f1c2a) [0x7f9e2c4ebc2a]
2: (()+0xf5e0) [0x7f9e29fd65e0]
3: (()+0x34484) [0x7f9e2b6b4484]
4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xef) [0x7f9e2c3eeb0f]
5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x427) [0x7f9e2c3efc67]
6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x136) [0x7f9e2c3efee6]
7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x98) [0x7f9e2c3f8848]
8: (()+0x92268e) [0x7f9e2c41c68e]
9: (()+0x922cad) [0x7f9e2c41ccad]
10: (()+0x922cbf) [0x7f9e2c41ccbf]
11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xde) [0x7f9e2c403d1e]
12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x189) [0x7f9e2c4827f9]
13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0x45) [0x7f9e2c36a1a5]
14: (DBObjectMap::DBObjectMapIteratorImpl::lower_bound(std::string const&)+0x47) [0x7f9e2c3182c7]
15: (FileStore::_omap_rmkeyrange(coll_t const&, ghobject_t const&, std::string const&, std::string const&, SequencerPosition const&)+0x217) [0x7f9e2c2467f7]
16: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x10f7) [0x7f9e2c267a27]
17: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7f9e2c26d56b]
18: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x2cd) [0x7f9e2c26d86d]
19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa7e) [0x7f9e2c5dbbae]
20: (ThreadPool::WorkThread::entry()+0x10) [0x7f9e2c5dca90]
21: (()+0x7e25) [0x7f9e29fcee25]
22: (clone()+0x6d) [0x7f9e2865934d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.