Bug #36567

Segmentation fault in BlueStore::Blob::discard_unallocated

Added by Stefan Priebe 10 months ago. Updated 9 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version:
Start date: 10/23/2018
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Hello,

I'm observing regular crashes (segmentation faults) of BlueStore OSDs in Ceph 12.2.8.

Trace as follows:
0> 2018-10-23 13:29:16.438969 7fdc6b7ff700 -1 *** Caught signal (Segmentation fault) **
in thread 7fdc6b7ff700 thread_name:tp_osd_disk
ceph version 12.2.8-10-ga49f886acf (a49f886acf37d7254fc404807e4d26ffe16d3096) luminous (stable)
1: (()+0xa3b144) [0x564a52f0c144]
2: (()+0x110c0) [0x7fdc997510c0]
3: (pthread_mutex_lock()+0) [0x7fdc99749b20]
4: (BlueStore::Blob::discard_unallocated(BlueStore::Collection*)+0x349) [0x564a52da78b9]
5: (BlueStore::_wctx_finish(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x5d7) [0x564a52dd7ff7]
6: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x2e2) [0x564a52df2802]
7: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x564a52df3055]
8: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x564a52df4b7b]
9: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x564a52e02475]
10: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x564a52e03186]
11: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x564a52994def]
12: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0xbe0) [0x564a5291ec20]
13: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1cc) [0x564a5291f46c]
14: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x122) [0x564a5298b562]
15: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x564a52f5b1f8]
16: (ThreadPool::WorkThread::entry()+0x10) [0x564a52f5c390]
17: (()+0x7494) [0x7fdc99747494]
18: (clone()+0x3f) [0x7fdc987ceacf]


Related issues

Related to bluestore - Bug #36526: segv in BlueStore::OldExtent::create (Resolved, 10/18/2018)

History

#1 Updated by Stefan Priebe 10 months ago

I'm also seeing these:

    0> 2018-10-22 15:40:34.772349 7f730dbfe700 -1 *** Caught signal (Segmentation fault) **
in thread 7f730dbfe700 thread_name:tp_osd_disk
ceph version 12.2.8-10-ga49f886acf (a49f886acf37d7254fc404807e4d26ffe16d3096) luminous (stable)
1: (()+0xa3b144) [0x560c1153e144]
2: (()+0x110c0) [0x7f737cd510c0]
3: (BlueStore::OldExtent::create(boost::intrusive_ptr<BlueStore::Collection>, unsigned int, unsigned int, unsigned int, boost::intrusive_ptr<BlueStore::Blob>&)+0x9a) [0x560c113e374a]
4: (BlueStore::ExtentMap::punch_hole(boost::intrusive_ptr<BlueStore::Collection>&, unsigned long, unsigned long, boost::intrusive::list<BlueStore::OldExtent, boost::intrusive::member_hook<BlueStore::OldExtent, boost::intrusive::list_member_hook<void, void, void>, &BlueStore::OldExtent::old_extent_item>, void, void, void>*)+0x211) [0x560c1141d681]
5: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x2a0) [0x560c114247c0]
6: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x560c11425055]
7: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x560c11426b7b]
8: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x560c11434475]
9: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x560c11435186]
10: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x560c10fc6def]
11: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0x567) [0x560c10f505a7]
12: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1cc) [0x560c10f5146c]
13: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x122) [0x560c10fbd562]
14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x560c1158d1f8]
15: (ThreadPool::WorkThread::entry()+0x10) [0x560c1158e390]
16: (()+0x7494) [0x7f737cd47494]
17: (clone()+0x3f) [0x7f737bdceacf]

in thread 7fe09aff8700 thread_name:bstore_kv_final
ceph version 12.2.8-10-ga49f886acf (a49f886acf37d7254fc404807e4d26ffe16d3096) luminous (stable)
1: (()+0xa3b144) [0x560ebb7c1144]
2: (()+0x110c0) [0x7fe0ad5b20c0]
3: (()+0xe159) [0x7fe0af04b159]
4: (()+0x298c6) [0x7fe0af0668c6]
5: (free()+0x359) [0x7fe0af0414e9]
6: (std::__detail::_Hashtable_alloc<mempool::pool_allocator<(mempool::pool_index_t)4, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>*)+0x6b) [0x560ebb6ce5db]
7: (std::_Hashtable<ghobject_t, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> > >, std::__detail::_Select1st, std::equal_to<ghobject_t>, std::hash<ghobject_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear()+0x23) [0x560ebb6ce663]
8: (BlueStore::OnodeSpace::clear()+0x1b2) [0x560ebb667542]
9: (BlueStore::_reap_collections()+0x383) [0x560ebb667973]
10: (BlueStore::_kv_finalize_thread()+0xbc9) [0x560ebb6774a9]
11: (BlueStore::KVFinalizeThread::entry()+0xd) [0x560ebb6d18ad]
12: (()+0x7494) [0x7fe0ad5a8494]
13: (clone()+0x3f) [0x7fe0ac62facf]

All from different servers and different OSDs.
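
To compare backtraces like the ones above, it can help to reduce each frame to its function name and list the functions that two traces share. The following is a minimal sketch and not part of Ceph: the helper names (`strip_args`, `frame_symbol`, `shared_frames`) are hypothetical, and the regular expression only assumes the frame layout seen in these logs (`N: (symbol+offset) [address]`).

```python
import re

# Assumed frame layout, taken from the logs in this report:
#   "4: (BlueStore::Blob::discard_unallocated(BlueStore::Collection*)+0x349) [0x564a52da78b9]"
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+(?:0x)?[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$")


def strip_args(symbol):
    """Cut the argument list off a demangled symbol.

    Parentheses inside template brackets are skipped so that a cast such as
    "(mempool::pool_index_t)4" is not mistaken for the argument list.
    """
    depth = 0
    for i, ch in enumerate(symbol):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "(" and depth == 0:
            return symbol[:i]
    return symbol


def frame_symbol(line):
    """Return the function name of one frame, or None if it cannot be read."""
    m = FRAME_RE.match(line)
    if not m:
        return None
    name = strip_args(m.group(2))
    return name or None  # unresolved frames such as "(()+0xa3b144)" give ""


def symbols(trace):
    """Function names of all resolved frames in a trace, in stack order."""
    found = (frame_symbol(line) for line in trace.splitlines())
    return [name for name in found if name]


def shared_frames(trace_a, trace_b):
    """Functions of trace_a that also appear somewhere in trace_b."""
    in_b = set(symbols(trace_b))
    return [name for name in symbols(trace_a) if name in in_b]


# Frames copied from the first two traces in this report (shortened to a few lines).
TRACE_A = """\
1: (()+0xa3b144) [0x564a52f0c144]
3: (pthread_mutex_lock()+0) [0x7fdc99749b20]
4: (BlueStore::Blob::discard_unallocated(BlueStore::Collection*)+0x349) [0x564a52da78b9]
8: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x564a52df4b7b]
"""

TRACE_B = """\
1: (()+0xa3b144) [0x560c1153e144]
3: (BlueStore::OldExtent::create(boost::intrusive_ptr<BlueStore::Collection>, unsigned int, unsigned int, unsigned int, boost::intrusive_ptr<BlueStore::Blob>&)+0x9a) [0x560c113e374a]
7: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x560c11426b7b]
"""

if __name__ == "__main__":
    for name in shared_frames(TRACE_A, TRACE_B):
        print(name)
```

Run against the full traces, this would only show which call path the crashes have in common (here the object removal path); it says nothing about the root cause.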

#2 Updated by Igor Fedotov 10 months ago

The second log is similar to
http://tracker.ceph.com/issues/36526

#3 Updated by Stefan Priebe 10 months ago

Yes, so my question is whether all of these may just be a result of the race mentioned here: https://github.com/ceph/ceph/pull/24701

#4 Updated by Sage Weil 10 months ago

Stefan Priebe wrote:

Yes, so my question is whether all of these may just be a result of the race mentioned here: https://github.com/ceph/ceph/pull/24701

It could be. We'll get those 2 recent fixes backported shortly. How reproducible is this? Can we do a test branch/build on top of 12.2.9 for you to test?

#5 Updated by Sage Weil 10 months ago

  • Related to Bug #36526: segv in BlueStore::OldExtent::create added

#6 Updated by Stefan Priebe 10 months ago

Not that easily ;-) It always happens when we trigger a heavy backfill or recovery, but I don't want to pull that many disks ;-) We have only seen this while replacing old SSDs with new ones.

#7 Updated by Sage Weil 9 months ago

  • Status changed from New to Duplicate
