Bug #20557

closed

segmentation fault with rocksdb|BlueStore and jemalloc

Added by Yao Ning almost 7 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Jul 10 12:00:43 server-69 ceph-osd: ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
Jul 10 12:00:43 server-69 ceph-osd: 1: (()+0x9aa2bf) [0x7fe303c9b2bf]
Jul 10 12:00:43 server-69 ceph-osd: 2: (()+0xf100) [0x7fe3006fb100]
Jul 10 12:00:43 server-69 ceph-osd: 3: (()+0x1cdff) [0x7fe302eb5dff]
Jul 10 12:00:43 server-69 ceph-osd: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xd6) [0x7fe303f6d916]
Jul 10 12:00:43 server-69 ceph-osd: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3dc) [0x7fe303f6e9ac]
Jul 10 12:00:43 server-69 ceph-osd: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x127) [0x7fe303f6ec07]
Jul 10 12:00:43 server-69 ceph-osd: 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x89) [0x7fe303f77229]
Jul 10 12:00:43 server-69 ceph-osd: 8: (()+0xca78f6) [0x7fe303f988f6]
Jul 10 12:00:43 server-69 ceph-osd: 9: (()+0xca7ebd) [0x7fe303f98ebd]
Jul 10 12:00:43 server-69 ceph-osd: 10: (()+0xca7ecf) [0x7fe303f98ecf]
Jul 10 12:00:43 server-69 ceph-osd: 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xce) [0x7fe303f80fbe]
Jul 10 12:00:43 server-69 ceph-osd: 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x174) [0x7fe304000444]
Jul 10 12:00:43 server-69 ceph-osd: 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0xa2) [0x7fe303bf4e42]
Jul 10 12:00:43 server-69 ceph-osd: 14: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::upper_bound(std::string const&, std::string const&)+0x30) [0x7fe303bf6730]
Jul 10 12:00:43 server-69 ceph-osd: 15: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0xbc2) [0x7fe303b60c42]
Jul 10 12:00:43 server-69 ceph-osd: 16: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0xe8) [0x7fe303b624c8]
Jul 10 12:00:43 server-69 ceph-osd: 17: (BlueStore::collection_list(coll_t const&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0x72) [0x7fe303b80ce2]
Jul 10 12:00:43 server-69 ceph-osd: 18: (OSD::clear_temp_objects()+0x537) [0x7fe303780357]
Jul 10 12:00:43 server-69 ceph-osd: 19: (OSD::init()+0x1f7d) [0x7fe3037b838d]
Jul 10 12:00:43 server-69 ceph-osd: 20: (main()+0x2ea7) [0x7fe3036c1547]
Jul 10 12:00:43 server-69 ceph-osd: 21: (__libc_start_main()+0xf5) [0x7fe2ff50cb15]
Jul 10 12:00:43 server-69 ceph-osd: 22: (()+0x469906) [0x7fe30375a906]
Jul 10 12:00:43 server-69 ceph-osd: 2017-07-10 12:00:43.440569 7fe3032cebc0 -1 *** Caught signal (Segmentation fault) **
Jul 10 12:00:43 server-69 ceph-osd: in thread 7fe3032cebc0 thread_name:ceph-osd


Files

osd.6-segfault.txt (22.4 KB), HDD, Mikko Tanner, 08/27/2017 02:05 PM
osd.13-segfault.txt (30.3 KB), SSD, Mikko Tanner, 08/27/2017 02:05 PM
osd.14-segfault.txt (39.8 KB), SSD, Mikko Tanner, 08/27/2017 02:05 PM

Related issues 6 (0 open, 6 closed)

Related to Ceph - Bug #21318: segv in rocksdb::BlockBasedTable::NewIndexIterator (Duplicate) 09/08/2017
Related to Ceph - Bug #21295: OSD Seg Fault on Bluestore OSD (Duplicate) 09/07/2017
Related to Ceph - Bug #21820: Ceph OSD crash with Segfault (Duplicate) 10/17/2017
Related to Ceph - Bug #21834: Filestore OSD Segfault in thread 7f084dffc700 thread_name:tp_fstore_op (Duplicate) 10/18/2017
Related to Ceph - Bug #21826: Filestore OSDs start segfaulting (Duplicate) 10/18/2017
Has duplicate Ceph - Bug #20856: osd: luminous osd bluestore crashes with jemalloc enabled on debian 9 (Duplicate) 07/30/2017
Actions #1

Updated by Yao Ning almost 7 years ago

I always get this segmentation fault when I use jemalloc. With the default tcmalloc, it is fine.

Actions #2

Updated by Nathan Cutler almost 7 years ago

  • Subject changed from always got segmentation fault when use BlueStore backend to always got segmentation fault when use BlueStore backend with jemalloc
Actions #3

Updated by Sage Weil almost 7 years ago

  • Status changed from New to 12
  • Priority changed from High to Normal

Not really too concerned about jemalloc. I'm guessing rocksdb is still linking against tcmalloc, or something?
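One way to check which allocator actually ends up in the OSD process (a sketch, not part of this ticket; the binary path and process names assume a standard packaged install):

  # Which allocator is the ceph-osd binary linked against?
  ldd /usr/bin/ceph-osd | grep -Ei 'tcmalloc|jemalloc'

  # Which allocator is mapped into a running OSD (this also catches an LD_PRELOADed jemalloc)
  pid=$(pidof ceph-osd | awk '{print $1}')
  grep -Ei 'tcmalloc|jemalloc' /proc/$pid/maps | awk '{print $6}' | sort -u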

Actions #4

Updated by Mikko Tanner over 6 years ago

I am also getting BlueStore OSD segfaults with jemalloc enabled at runtime (jemalloc preload enabled in /etc/default/ceph) on Ubuntu 16.04, with "ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)". Please see the attached kernel logs.

The cluster has 3 OSD/MON hosts, in the process of being upgraded from filestore to BlueStore OSDs. 2 other hosts with jemalloc+filestore are running without issues.
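For reference, the runtime jemalloc preload being discussed is just an LD_PRELOAD line in /etc/default/ceph; a minimal sketch (the library path is the one quoted later in this ticket, and the restart command assumes systemd-managed OSDs):

  # /etc/default/ceph
  # Preload jemalloc for the Ceph daemons; comment out to fall back to tcmalloc
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

  # The change only applies to daemons started afterwards, e.g.
  #   systemctl restart ceph-osd@<id>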

Actions #5

Updated by Sage Weil over 6 years ago

  • Has duplicate Bug #20856: osd: luminous osd bluestore crashes with jemalloc enabled on debian 9 added
Actions #6

Updated by Sage Weil over 6 years ago

I believe the problem here is that rocksdb is more tightly bound to tcmalloc. Not exactly sure what the issue or the fix is.

Actions #7

Updated by Sage Weil over 6 years ago

  • Related to Bug #21318: segv in rocksdb::BlockBasedTable::NewIndexIterator added
Actions #8

Updated by Sage Weil over 6 years ago

  • Priority changed from Normal to High

See http://tracker.ceph.com/issues/21318 ... this still happens (intermittently) on current luminous.

Actions #9

Updated by Sage Weil over 6 years ago

  • Related to Bug #21295: OSD Seg Fault on Bluestore OSD added
Actions #10

Updated by Sage Weil over 6 years ago

  • Related to Bug #21820: Ceph OSD crash with Segfault added
Actions #11

Updated by Sage Weil over 6 years ago

  • Subject changed from always got segmentation fault when use BlueStore backend with jemalloc to segmentation fault with BlueStore and jemalloc
Actions #12

Updated by Sage Weil over 6 years ago

  • Subject changed from segmentation fault with BlueStore and jemalloc to segmentation fault with rocksdb|BlueStore and jemalloc
Actions #13

Updated by Sage Weil over 6 years ago

  • Related to Bug #21834: Filestore OSD Segfault in thread 7f084dffc700 thread_name:tp_fstore_op added
Actions #14

Updated by Sage Weil over 6 years ago

  • Related to Bug #21826: Filestore OSDs start segfaulting added
Actions #15

Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)
Actions #16

Updated by Sage Weil over 6 years ago

  • Assignee deleted (Sage Weil)
Actions #17

Updated by Alejandro Comisario over 6 years ago

Hi, this happens with RocksDB and filestore on luminous 12.2 on Ubuntu 16.04.
We commented out the /etc/default/ceph line LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1, and this is still happening.

Any news?
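One thing worth double-checking here (a sketch; it assumes pidof can find the OSD processes): commenting out the LD_PRELOAD line only affects daemons started after the change, so a long-running OSD can still have jemalloc mapped.

  # 0 means jemalloc is no longer mapped into that OSD
  for pid in $(pidof ceph-osd); do
      echo -n "ceph-osd pid $pid: "
      grep -c jemalloc /proc/$pid/maps
  done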

Actions #18

Updated by Adam Kupczyk over 6 years ago

Hi Mikko,
What architecture are you running on?
I tried to match your call stacks against the x86_64 binaries for "ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)" and they do not match. I took the binaries from https://download.ceph.com/rpm-luminous/el7/x86_64/ .
Did you compile Ceph yourself?
Best regards,
Adam

Actions #19

Updated by Mikko Tanner over 6 years ago

The arch is x86_64. Ceph was installed from the eu.ceph.com deb repo. This issue isn't current for me anymore, as the cluster has been upgraded to the newest release in the meantime and I've given up on jemalloc. After going back to tcmalloc, the issue hasn't resurfaced.

Actions #20

Updated by Nikola Ciprich about 6 years ago

Hi, just wanted to report that I'm hitting the same issue on CentOS 7 with jemalloc-3.6.0-1.el7 and ceph 12.2.2.

Actions #21

Updated by Sage Weil about 6 years ago

  • Priority changed from High to Low
Actions #22

Updated by Alex Cucu over 5 years ago

Got a similar issue on Jewel 10.2.11, as it uses rocksdb by default now. I know Jewel is near EOL and the plan was to update to the latest Jewel and then upgrade to Luminous, but this puts everything on hold.

I've disabled jemalloc for the OSDs that kept crashing and everything seems fine for now.

OS: CentOS 7
Ceph packages: http://download.ceph.com/rpm-jewel/el7/x86_64/
jemalloc: 3.6.0-1 from EPEL

 ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
 1: (()+0x9f1c2a) [0x7f9e2c4ebc2a]
 2: (()+0xf5e0) [0x7f9e29fd65e0]
 3: (()+0x34484) [0x7f9e2b6b4484]
 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0xef) [0x7f9e2c3eeb0f]
 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x427) [0x7f9e2c3efc67]
 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x136) [0x7f9e2c3efee6]
 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x98) [0x7f9e2c3f8848]
 8: (()+0x92268e) [0x7f9e2c41c68e]
 9: (()+0x922cad) [0x7f9e2c41ccad]
 10: (()+0x922cbf) [0x7f9e2c41ccbf]
 11: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0xde) [0x7f9e2c403d1e]
 12: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x189) [0x7f9e2c4827f9]
 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0x45) [0x7f9e2c36a1a5]
 14: (DBObjectMap::DBObjectMapIteratorImpl::lower_bound(std::string const&)+0x47) [0x7f9e2c3182c7]
 15: (FileStore::_omap_rmkeyrange(coll_t const&, ghobject_t const&, std::string const&, std::string const&, SequencerPosition const&)+0x217) [0x7f9e2c2467f7]
 16: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x10f7) [0x7f9e2c267a27]
 17: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7f9e2c26d56b]
 18: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x2cd) [0x7f9e2c26d86d]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa7e) [0x7f9e2c5dbbae]
 20: (ThreadPool::WorkThread::entry()+0x10) [0x7f9e2c5dca90]
 21: (()+0x7e25) [0x7f9e29fcee25]
 22: (clone()+0x6d) [0x7f9e2865934d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
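To turn frames like "1: (()+0x9f1c2a)" into symbols (a sketch; it assumes the offset in parentheses is relative to the ceph-osd binary itself and that the matching debuginfo package is installed; frames that live in a shared library must be resolved against that library instead):

  # Resolve a single frame offset against the main binary
  addr2line -Cfe /usr/bin/ceph-osd 0x9f1c2a

  # Or produce a full annotated disassembly, as the note above suggests
  objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump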
Actions #23

Updated by Neha Ojha about 5 years ago

  • Status changed from 12 to Closed