Bug #42166: crash when LRU trimming

Added by Jeff Layton over 4 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While running xfstests on kcephfs against a vstart cluster, the OSD crashed with this:

 ceph version v15.0.0-5742-ge565e31184c0 (e565e31184c0ffd18e269c1ee0b7ee88dc696f56) octopus (dev)
 1: (()+0x12c60) [0x7f5cb69a3c60]
 2: (gsignal()+0x145) [0x7f5cb6442e35]
 3: (abort()+0x127) [0x7f5cb642d895]
 4: (()+0x18aa2) [0x7f5cb6cc6aa2]
 5: (()+0x1a449) [0x7f5cb6cc8449]
 6: (std::_Hashtable<ghobject_t, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> > >, std::__detail::_Select1st, std::equal_to<ghobject_t>, std::hash<ghobject_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>*)+0x88) [0x56273cda08c8]
 7: (LruOnodeCacheShard::_trim_to(unsigned long)+0x242) [0x56273cda4292]
 8: (BlueStore::OnodeSpace::add(ghobject_t const&, boost::intrusive_ptr<BlueStore::Onode>)+0x19d) [0x56273ccefdad]
 9: (BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0x62b) [0x56273cd2ee9b]
 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1d58) [0x56273cd660a8]
 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x275) [0x56273cd67185]
 12: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x83) [0x56273c8698a3]
 13: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0x1f2) [0x56273c81ef72]
 14: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x208) [0x56273c82b698]
 15: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x52) [0x56273caa6c02]
 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xe6c) [0x56273c82cc6c]
 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x56273cec5923]
 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56273cec8680]
 19: (()+0x84c0) [0x7f5cb69994c0]
 20: (clone()+0x43) [0x7f5cb6507553]

This ceph build is based on 6bafc61e8d7a75733974db87d2af3203f0a3ceb1, plus a pile of experimental MDS patches (nothing that should affect OSD operation). The OSD log is attached. Unfortunately, I don't have a core.
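
For context, the crash path in frames 6-8 (BlueStore::OnodeSpace::add -> LruOnodeCacheShard::_trim_to -> _Hashtable::_M_erase) follows the usual LRU-cache pattern: adding an onode can push the cache shard over its size target, and the trim then erases the least-recently-used onodes from the hash map. The sketch below is a minimal, hypothetical illustration of that pattern, not BlueStore's actual implementation; all names (Onode, LruShard, add, trim_to) are illustrative only. An abort inside the hash-table erase itself, as in frame 6, would typically suggest the map and the LRU list have gotten out of sync (a dangling iterator or a double erase) rather than a problem with the erase call per se.

 // Hypothetical, simplified sketch of the LRU-trim pattern seen in the
 // backtrace. None of these names are Ceph's real API.
 #include <cstddef>
 #include <iostream>
 #include <list>
 #include <memory>
 #include <string>
 #include <unordered_map>
 #include <utility>

 struct Onode {
     std::string oid;  // stand-in for ghobject_t
     explicit Onode(std::string o) : oid(std::move(o)) {}
 };

 class LruShard {
     // Most-recently-used entries at the front, least-recently-used at the back.
     std::list<std::shared_ptr<Onode>> lru_;
     std::unordered_map<std::string,
                        std::list<std::shared_ptr<Onode>>::iterator> index_;
     std::size_t max_ = 4;

 public:
     void add(const std::string& oid) {
         // Insert (or refresh) the onode at the MRU end of the list.
         auto it = index_.find(oid);
         if (it != index_.end())
             lru_.erase(it->second);
         lru_.push_front(std::make_shared<Onode>(oid));
         index_[oid] = lru_.begin();
         trim_to(max_);  // analogous to LruOnodeCacheShard::_trim_to()
     }

     void trim_to(std::size_t target) {
         // Drop LRU entries until at or below the target. If the map and the
         // list ever disagree (stale iterator, double erase), this erase is
         // where it blows up, which matches frame 6 (_Hashtable::_M_erase).
         while (lru_.size() > target) {
             auto victim = lru_.back();
             index_.erase(victim->oid);
             lru_.pop_back();
         }
     }

     std::size_t size() const { return lru_.size(); }
 };

 int main() {
     LruShard shard;
     for (int i = 0; i < 10; ++i)
         shard.add("object-" + std::to_string(i));
     std::cout << "cached onodes after trimming: " << shard.size() << "\n";
 }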


Files

osd.0.log.gz (249 KB) - Jeff Layton, 10/02/2019 07:02 PM
ceph.conf (4.45 KB) - generated by vstart.sh - Jeff Layton, 10/02/2019 07:11 PM