Bug #36567
Segmentation fault in BlueStore::Blob::discard_unallocated
Description
Hello,
I'm observing regular crashes (segmentation faults) of BlueStore OSDs in Ceph 12.2.8.
The trace is as follows:
0> 2018-10-23 13:29:16.438969 7fdc6b7ff700 -1 *** Caught signal (Segmentation fault) **
in thread 7fdc6b7ff700 thread_name:tp_osd_disk
ceph version 12.2.8-10-ga49f886acf (a49f886acf37d7254fc404807e4d26ffe16d3096) luminous (stable)
1: (()+0xa3b144) [0x564a52f0c144]
2: (()+0x110c0) [0x7fdc997510c0]
3: (pthread_mutex_lock()+0) [0x7fdc99749b20]
4: (BlueStore::Blob::discard_unallocated(BlueStore::Collection*)+0x349) [0x564a52da78b9]
5: (BlueStore::_wctx_finish(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x5d7) [0x564a52dd7ff7]
6: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x2e2) [0x564a52df2802]
7: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x564a52df3055]
8: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x564a52df4b7b]
9: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x564a52e02475]
10: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x564a52e03186]
11: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x564a52994def]
12: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0xbe0) [0x564a5291ec20]
13: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1cc) [0x564a5291f46c]
14: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x122) [0x564a5298b562]
15: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x564a52f5b1f8]
16: (ThreadPool::WorkThread::entry()+0x10) [0x564a52f5c390]
17: (()+0x7494) [0x7fdc99747494]
18: (clone()+0x3f) [0x7fdc987ceacf]
Related issues
History
#1 Updated by Stefan Priebe about 5 years ago
But I'm also seeing these:
0> 2018-10-22 15:40:34.772349 7f730dbfe700 -1 *** Caught signal (Segmentation fault) **
in thread 7f730dbfe700 thread_name:tp_osd_disk
ceph version 12.2.8-10-ga49f886acf (a49f886acf37d7254fc404807e4d26ffe16d3096) luminous (stable)
1: (()+0xa3b144) [0x560c1153e144]
2: (()+0x110c0) [0x7f737cd510c0]
3: (BlueStore::OldExtent::create(boost::intrusive_ptr<BlueStore::Collection>, unsigned int, unsigned int, unsigned int, boost::intrusive_ptr<BlueStore::Blob>&)+0x9a) [0x560c113e374a]
4: (BlueStore::ExtentMap::punch_hole(boost::intrusive_ptr<BlueStore::Collection>&, unsigned long, unsigned long, boost::intrusive::list<BlueStore::OldExtent, boost::intrusive::member_hook<BlueStore::OldExtent, boost::intrusive::list_member_hook<void, void, void>, &BlueStore::OldExtent::old_extent_item>, void, void, void>*)+0x211) [0x560c1141d681]
5: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x2a0) [0x560c114247c0]
6: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x560c11425055]
7: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x560c11426b7b]
8: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x560c11434475]
9: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x560c11435186]
10: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x560c10fc6def]
11: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::shared_ptr<DeletingState>, bool*, ThreadPool::TPHandle&)+0x567) [0x560c10f505a7]
12: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1cc) [0x560c10f5146c]
13: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x122) [0x560c10fbd562]
14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x560c1158d1f8]
15: (ThreadPool::WorkThread::entry()+0x10) [0x560c1158e390]
16: (()+0x7494) [0x7f737cd47494]
17: (clone()+0x3f) [0x7f737bdceacf]
in thread 7fe09aff8700 thread_name:bstore_kv_final
ceph version 12.2.8-10-ga49f886acf (a49f886acf37d7254fc404807e4d26ffe16d3096) luminous (stable)
1: (()+0xa3b144) [0x560ebb7c1144]
2: (()+0x110c0) [0x7fe0ad5b20c0]
3: (()+0xe159) [0x7fe0af04b159]
4: (()+0x298c6) [0x7fe0af0668c6]
5: (free()+0x359) [0x7fe0af0414e9]
6: (std::__detail::_Hashtable_alloc<mempool::pool_allocator<(mempool::pool_index_t)4, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>*)+0x6b) [0x560ebb6ce5db]
7: (std::_Hashtable<ghobject_t, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> > >, std::__detail::_Select1st, std::equal_to<ghobject_t>, std::hash<ghobject_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear()+0x23) [0x560ebb6ce663]
8: (BlueStore::OnodeSpace::clear()+0x1b2) [0x560ebb667542]
9: (BlueStore::_reap_collections()+0x383) [0x560ebb667973]
10: (BlueStore::_kv_finalize_thread()+0xbc9) [0x560ebb6774a9]
11: (BlueStore::KVFinalizeThread::entry()+0xd) [0x560ebb6d18ad]
12: (()+0x7494) [0x7fe0ad5a8494]
13: (clone()+0x3f) [0x7fe0ac62facf]
These all come from different servers and different OSDs.
#2 Updated by Igor Fedotov about 5 years ago
The second log is similar to
http://tracker.ceph.com/issues/36526
#3 Updated by Stefan Priebe about 5 years ago
Yes, so my question is whether all of those may just be a result of the race mentioned here: https://github.com/ceph/ceph/pull/24701
#4 Updated by Sage Weil about 5 years ago
Stefan Priebe wrote:
Yes, so my question is whether all of those may just be a result of the race mentioned here: https://github.com/ceph/ceph/pull/24701
It could be. We'll get those 2 recent fixes backported shortly. How reproducible is this? Can we do a test branch/build on top of 12.2.9 for you to test?
#5 Updated by Sage Weil about 5 years ago
- Related to Bug #36526: segv in BlueStore::OldExtent::create added
#6 Updated by Stefan Priebe about 5 years ago
Not that good ;-) It always happens when we trigger a heavy backfill or recovery, but I don't want to pull that many disks ;-) We only saw this while replacing old SSDs with new ones.
#7 Updated by Sage Weil almost 5 years ago
- Status changed from New to Duplicate