Bug #20557

segmentation fault with rocksdb|BlueStore and jemalloc

Added by Yao Ning about 2 years ago. Updated 7 months ago.

Status: Closed
Priority: Low
Assignee: -
Target version: -
Start date: 07/10/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Jul 10 12:00:43 server-69 ceph-osd: ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
Jul 10 12:00:43 server-69 ceph-osd: 1: (()+0x9aa2bf) [0x7fe303c9b2bf]
Jul 10 12:00:43 server-69 ceph-osd: 2: (()+0xf100) [0x7fe3006fb100]
Jul 10 12:00:43 server-69 ceph-osd: 3: (()+0x1cdff) [0x7fe302eb5dff]
Jul 10 12:00:43 server-69 ceph-osd: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x7fe303f6d916]
Jul 10 12:00:43 server-69 ceph-osd: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x7fe303f6e9ac]
Jul 10 12:00:43 server-69 ceph-osd: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x7fe303f6ec07]
Jul 10 12:00:43 server-69 ceph-osd: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x7fe303f77229]
Jul 10 12:00:43 server-69 ceph-osd: 8: (()+0xca78f6) [0x7fe303f988f6]
Jul 10 12:00:43 server-69 ceph-osd: 9: (()+0xca7ebd) [0x7fe303f98ebd]
Jul 10 12:00:43 server-69 ceph-osd: 10: (()+0xca7ecf) [0x7fe303f98ecf]
Jul 10 12:00:43 server-69 ceph-osd: 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x7fe303f80fbe]
Jul 10 12:00:43 server-69 ceph-osd: 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x7fe304000444]
Jul 10 12:00:43 server-69 ceph-osd: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x7fe303bf4e42]
Jul 10 12:00:43 server-69 ceph-osd: 14: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::upper_bound(std::string const&, std::string const&)+0x30) [0x7fe303bf6730]
Jul 10 12:00:43 server-69 ceph-osd: 15: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0xbc2) [0x7fe303b60c42]
Jul 10 12:00:43 server-69 ceph-osd: 16: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0xe8) [0x7fe303b624c8]
Jul 10 12:00:43 server-69 ceph-osd: 17: (BlueStore::collection_list(coll_t const&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x72) [0x7fe303b80ce2]
Jul 10 12:00:43 server-69 ceph-osd: 18: (OSD::clear_temp_objects()+0x537) [0x7fe303780357]
Jul 10 12:00:43 server-69 ceph-osd: 19: (OSD::init()+0x1f7d) [0x7fe3037b838d]
Jul 10 12:00:43 server-69 ceph-osd: 20: (main()+0x2ea7) [0x7fe3036c1547]
Jul 10 12:00:43 server-69 ceph-osd: 21: (__libc_start_main()+0xf5) [0x7fe2ff50cb15]
Jul 10 12:00:43 server-69 ceph-osd: 22: (()+0x469906) [0x7fe30375a906]
Jul 10 12:00:43 server-69 ceph-osd: 2017-07-10 12:00:43.440569 7fe3032cebc0 -1 *** Caught signal (Segmentation fault) **
Jul 10 12:00:43 server-69 ceph-osd: in thread 7fe3032cebc0 thread_name:ceph-osd

osd.6-segfault.txt - HDD (22.4 KB) Mikko Tanner, 08/27/2017 02:05 PM

osd.13-segfault.txt - SSD (30.3 KB) Mikko Tanner, 08/27/2017 02:05 PM

osd.14-segfault.txt - SSD (39.8 KB) Mikko Tanner, 08/27/2017 02:05 PM


Related issues

Related to Ceph - Bug #21318: segv in rocksdb::BlockBasedTable::NewIndexIterator Duplicate 09/08/2017
Related to Ceph - Bug #21295: OSD Seg Fault on Bluestore OSD Duplicate 09/07/2017
Related to Ceph - Bug #21820: Ceph OSD crash with Segfault Duplicate 10/17/2017
Related to Ceph - Bug #21834: Filestore OSD Segfault in thread 7f084dffc700 thread_name:tp_fstore_op Duplicate 10/18/2017
Related to Ceph - Bug #21826: Filestore OSDs start segfaulting Duplicate 10/18/2017
Duplicated by Ceph - Bug #20856: osd: luminous osd bluestore crashes with jemalloc enabled on debian 9 Duplicate 07/30/2017

History

#1 Updated by Yao Ning about 2 years ago

I always get this segmentation fault when I use jemalloc. If I use the default tcmalloc, it is fine.

#2 Updated by Nathan Cutler about 2 years ago

  • Subject changed from always got segmentation fault when use BlueStore backend to always got segmentation fault when use BlueStore backend with jemalloc

#3 Updated by Sage Weil about 2 years ago

  • Status changed from New to Verified
  • Priority changed from High to Normal

Not really too concerned about jemalloc. I'm guessing rocksdb is still linking against tcmalloc or something?

#4 Updated by Mikko Tanner almost 2 years ago

I am also getting BlueStore OSD segfaults with jemalloc enabled at runtime (jemalloc preload enabled in /etc/default/ceph) on Ubuntu 16.04, with "ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)". Please see the attached kernel logs.

The cluster has 3 OSD/MON hosts and is in the process of being upgraded from filestore to BlueStore OSDs. 2 other hosts with jemalloc+filestore are running without issues.
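
For reference, the runtime preload mechanism mentioned here is an LD_PRELOAD line in the Ceph defaults file; comment #17 below quotes it verbatim. On Ubuntu/Debian it looks roughly like this:

 LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

Commenting that line out makes the daemons fall back to the allocator they were linked against (tcmalloc by default in these builds).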

#5 Updated by Sage Weil almost 2 years ago

  • Duplicated by Bug #20856: osd: luminous osd bluestore crashes with jemalloc enabled on debian 9 added

#6 Updated by Sage Weil almost 2 years ago

I believe the problem here is that rocksdb is more tightly bound to tcmalloc. Not exactly sure what the issue or the fix is.
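
In other words, this is an allocator-mismatch hypothesis: memory allocated by one malloc implementation ends up being freed, or having its metadata interpreted, by another. A minimal, purely illustrative C++ sketch of why that corrupts the heap (hypothetical code, not from Ceph or rocksdb; the two toy allocators stand in for tcmalloc and jemalloc):

 // Toy allocator "A": prefixes every block with a 16-byte size header,
 // the way real allocators keep per-block metadata.
 #include <cstdlib>
 #include <cstdint>

 void* a_malloc(std::size_t n) {
     auto* raw = static_cast<std::uint8_t*>(std::malloc(n + 16));
     *reinterpret_cast<std::size_t*>(raw) = n;  // A's metadata
     return raw + 16;                           // pointer handed to the caller
 }
 void a_free(void* p) {
     std::free(static_cast<std::uint8_t*>(p) - 16);  // rewind A's header
 }

 // Toy allocator "B": no header, frees the pointer as-is.
 void* b_malloc(std::size_t n) { return std::malloc(n); }
 void b_free(void* p)          { std::free(p); }

 int main() {
     void* ok = a_malloc(64);
     a_free(ok);        // matched pair: fine

     void* bad = a_malloc(64);
     b_free(bad);       // mismatched pair: free() receives an address
                        // malloc() never returned -> undefined behavior;
                        // real allocators may crash much later, deep in
                        // unrelated code, like the traces in this ticket
     return 0;
 }

With LD_PRELOAD, every library in the process is supposed to resolve malloc/free to jemalloc; if rocksdb (or a statically linked tcmalloc) bypasses the preload for some allocations, you get exactly this kind of mismatch.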

#7 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21318: segv in rocksdb::BlockBasedTable::NewIndexIterator added

#8 Updated by Sage Weil almost 2 years ago

  • Priority changed from Normal to High

See http://tracker.ceph.com/issues/21318 ... this still happens (intermittently) on current luminous.

#9 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21295: OSD Seg Fault on Bluestore OSD added

#10 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21820: Ceph OSD crash with Segfault added

#11 Updated by Sage Weil almost 2 years ago

  • Subject changed from always got segmentation fault when use BlueStore backend with jemalloc to segmentation fault with BlueStore and jemalloc

#12 Updated by Sage Weil almost 2 years ago

  • Subject changed from segmentation fault with BlueStore and jemalloc to segmentation fault with rocksdb|BlueStore and jemalloc

#13 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21834: Filestore OSD Segfault in thread 7f084dffc700 thread_name:tp_fstore_op added

#14 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21826: Filestore OSDs start segfaulting added

#15 Updated by Sage Weil over 1 year ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)

#16 Updated by Sage Weil over 1 year ago

  • Assignee deleted (Sage Weil)

#17 Updated by Alejandro Comisario over 1 year ago

Hi, this happens with RocksDB and filestore on luminous 12.2 on Ubuntu 16.04.
Even after commenting out the LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 line in /etc/default/ceph, this is still happening.

Any news?

#18 Updated by Adam Kupczyk over 1 year ago

Hi Mikko,
What architecture are you running on?
I tried to match your call stacks against the x86_64 binaries for "ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)" and they do not match. I took the binaries from https://download.ceph.com/rpm-luminous/el7/x86_64/ .
Did you compile Ceph yourself?
Best regards,
Adam

#19 Updated by Mikko Tanner over 1 year ago

The arch is x86_64. Ceph was installed from the eu.ceph.com deb repo. This issue isn't current for me anymore, as the cluster has since been upgraded to the newest release and I've given up on jemalloc. After going back to tcmalloc, the issue hasn't resurfaced.

#20 Updated by Nikola Ciprich over 1 year ago

Hi, just wanted to report that I'm hitting the same issue on CentOS 7 with jemalloc-3.6.0-1.el7 and ceph 12.2.2.

#21 Updated by Sage Weil over 1 year ago

  • Priority changed from High to Low

#22 Updated by Alex Cucu about 1 year ago

Got a similar issue on Jewel 10.2.11, since it uses rocksdb by default now. I know Jewel is near EOL; the plan was to update to the latest Jewel and then upgrade to Luminous, but this puts everything on hold.

I've disabled jemalloc for the OSDs that kept crashing and everything seems fine for now.

OS: CentOS 7
Ceph packages: http://download.ceph.com/rpm-jewel/el7/x86_64/
jemalloc: 3.6.0-1 from EPEL

 ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
 1: (()+0x9f1c2a) [0x7f9e2c4ebc2a]
 2: (()+0xf5e0) [0x7f9e29fd65e0]
 3: (()+0x34484) [0x7f9e2b6b4484]
 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xef) [0x7f9e2c3eeb0f]
 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x427) [0x7f9e2c3efc67]
 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x136) [0x7f9e2c3efee6]
 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x98) [0x7f9e2c3f8848]
 8: (()+0x92268e) [0x7f9e2c41c68e]
 9: (()+0x922cad) [0x7f9e2c41ccad]
 10: (()+0x922cbf) [0x7f9e2c41ccbf]
 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xde) [0x7f9e2c403d1e]
 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x189) [0x7f9e2c4827f9]
 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0x45) [0x7f9e2c36a1a5]
 14: (DBObjectMap::DBObjectMapIteratorImpl::lower_bound(std::string const&)+0x47) [0x7f9e2c3182c7]
 15: (FileStore::_omap_rmkeyrange(coll_t const&, ghobject_t const&, std::string const&, std::string const&, SequencerPosition const&)+0x217) [0x7f9e2c2467f7]
 16: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x10f7) [0x7f9e2c267a27]
 17: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7f9e2c26d56b]
 18: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x2cd) [0x7f9e2c26d86d]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa7e) [0x7f9e2c5dbbae]
 20: (ThreadPool::WorkThread::entry()+0x10) [0x7f9e2c5dca90]
 21: (()+0x7e25) [0x7f9e29fcee25]
 22: (clone()+0x6d) [0x7f9e2865934d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
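
A generic way to check which allocator a running OSD actually has mapped (standard Linux inspection, not something from this thread) is to look for the allocator's shared object in the process map, e.g.:

 grep -E 'jemalloc|tcmalloc' /proc/<osd-pid>/maps

If the jemalloc line disappears after removing the LD_PRELOAD entry and restarting, the daemon is back on the allocator it was linked against.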

#23 Updated by Neha Ojha 7 months ago

  • Status changed from Verified to Closed
