Bug #57895
OSD crash in Onode::put()
Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This issue occurs when an Onode is trimmed immediately after it is unpinned. This is possible when the LRU list is extremely short.
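The interleaving described above can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not BlueStore code: `ModelOnode`, `ModelCacheShard`, and `put_touches_freed_onode` are hypothetical names, the two threads are replayed sequentially in a single thread, and a `freed` flag stands in for `delete` so that the model itself has no undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for an onode; not the real BlueStore::Onode.
struct ModelOnode {
  int nref = 0;         // reference count
  bool pinned = false;  // pinned onodes are kept off the LRU list
  bool freed = false;   // set instead of calling delete
};

// Hypothetical stand-in for an LRU cache shard.
struct ModelCacheShard {
  std::list<ModelOnode*> lru;  // front = most recently unpinned
  std::size_t max_onodes = 0;

  void unpin(ModelOnode* o) {
    o->pinned = false;
    lru.push_front(o);
  }

  // Evict from the cold end until the list fits; each eviction drops
  // the reference that the cache held on the victim.
  void trim() {
    while (lru.size() > max_onodes) {
      ModelOnode* victim = lru.back();
      lru.pop_back();
      assert(victim->nref > 0);
      if (--victim->nref == 0) {
        victim->freed = true;  // the real code would destroy the object here
      }
    }
  }
};

// Replays the interleaving with `other_entries` onodes already on an LRU
// list that is exactly at capacity. Returns true if the onode being put
// is already freed when the putting thread resumes.
bool put_touches_freed_onode(std::size_t other_entries) {
  std::vector<ModelOnode> others(other_entries);
  ModelCacheShard shard;
  shard.max_onodes = other_entries;
  for (auto& e : others) {
    e.nref = 1;  // held only by the cache
    shard.lru.push_front(&e);
  }

  ModelOnode o;
  o.nref = 2;  // one reference from the cache, one from a transaction
  o.pinned = true;

  // Thread A, first half of put(): drop the transaction's reference and
  // unpin, since only the cache reference remains.
  --o.nref;
  shard.unpin(&o);

  // Thread B runs a trim before thread A has finished put().
  shard.trim();

  // Thread A resumes and would now touch `o` again.
  return o.freed;
}
```

In this model, `put_touches_freed_onode(0)` returns `true`, because the freshly unpinned onode is the only LRU entry and therefore the trim victim. With one or more other entries on the list, an older onode is evicted instead and the function returns `false`.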
Below are the crash stacks from the two threads involved (the first on the trim path, the second on the unpin path):
1: (()+0x12890) [0x7f74d588a890]
2: (ceph::buffer::v15_2_0::ptr::release()+0x8) [0x555c649a9e18]
3: (BlueStore::Onode::put()+0x1c1) [0x555c6462c621]
4: (std::__detail::_Hashtable_alloc<mempool::pool_allocator<(mempool::pool_index_t)4, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>)+0x35) [0x555c646dc3c5]
5: (std::_Hashtable<ghobject_t, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> > >, std::__detail::_Select1st, std::equal_to<ghobject_t>, std::hash<ghobject_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>)+0x53) [0x555c646dc803]
6: (BlueStore::OnodeSpace::_remove(ghobject_t const&)+0x12c) [0x555c6462c2cc]
7: (LruOnodeCacheShard::_trim_to(unsigned long)+0xce) [0x555c646dd33e]
8: (BlueStore::OnodeSpace::add(ghobject_t const&, boost::intrusive_ptr<BlueStore::Onode>&)+0x152) [0x555c6462ce22]
9: (BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0x384) [0x555c6468d5a4]
10: (BlueStore::_txc_add_transaction(BlueStore::TransContext, ceph::os::Transaction*)+0x1c29) [0x555c64696999]
11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ae) [0x555c646afb4e]
12: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x555c6433af54]
13: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xb08) [0x555c644e5f18]
14: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x187) [0x555c644f6397]
15: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x87) [0x555c64384517]
16: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x684) [0x555c6432acd4]
17: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x159) [0x555c641b7229]
18: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x67) [0x555c6440a227]
19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x623) [0x555c641d35f3]
20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x555c64807f0c]
21: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x555c6480b160]
22: (()+0x76db) [0x7f74d587f6db]
23: (clone()+0x3f) [0x7f74d55a888f]
and
ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
1: (()+0x12890) [0x7ff0ee5fd890]
2: (ceph::buffer::v15_2_0::ptr::release()+0x8) [0x55f9c9954e18]
3: (BlueStore::Onode::put()+0x1c1) [0x55f9c95d7621]
4: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >)+0x2d) [0x55f9c9687d0d]
5: (BlueStore::TransContext::~TransContext()+0x114) [0x55f9c9687e44]
6: (BlueStore::_txc_finish(BlueStore::TransContext)+0x448) [0x55f9c9617788]
7: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x24c) [0x55f9c961907c]
8: (BlueStore::_kv_finalize_thread()+0x48c) [0x55f9c965c31c]
9: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55f9c968c01d]
10: (()+0x76db) [0x7ff0ee5f26db]
11: (clone()+0x3f) [0x7ff0ee31b88f]
I believe this issue is still present on the master branch.