Bug #43147

closed

segv in LruOnodeCacheShard::_pin

Added by Sage Weil over 4 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-12-04T23:38:51.153 INFO:tasks.ceph.osd.1.smithi012.stderr:*** Caught signal (Aborted) **
2019-12-04T23:38:51.153 INFO:tasks.ceph.osd.1.smithi012.stderr: in thread 7f71fb6e6700 thread_name:tp_osd_tp
2019-12-04T23:38:51.153 INFO:tasks.ceph.osd.1.smithi012.stderr:*** Caught signal (Segmentation fault) **
2019-12-04T23:38:51.153 INFO:tasks.ceph.osd.1.smithi012.stderr: in thread 7f71f4ed9700 thread_name:tp_osd_tp
2019-12-04T23:38:51.166 INFO:teuthology.orchestra.run.smithi012.stdout:ERROR: (22) Invalid argument
2019-12-04T23:38:51.169 INFO:teuthology.orchestra.run.smithi012.stderr:nodeep-scrub is unset
2019-12-04T23:38:51.176 INFO:tasks.ceph.osd.1.smithi012.stderr: ceph version 15.0.0-7988-g78cce6a (78cce6a95dd180e4f7be8d8930f478d1af138b12) octopus (dev)
2019-12-04T23:38:51.177 INFO:tasks.ceph.osd.1.smithi012.stderr: 1: (()+0x12890) [0x7f721e63a890]
2019-12-04T23:38:51.177 INFO:tasks.ceph.osd.1.smithi012.stderr: 2: (LruOnodeCacheShard::_pin(BlueStore::Onode&)+0x9a) [0x55c1b0e7704a]
2019-12-04T23:38:51.177 INFO:tasks.ceph.osd.1.smithi012.stderr: 3: (BlueStore::Onode::get()+0x56) [0x55c1b0e61a36]
2019-12-04T23:38:51.177 INFO:tasks.ceph.osd.1.smithi012.stderr: 4: (BlueStore::OnodeSpace::lookup(ghobject_t const&)+0x1d0) [0x55c1b0dcd2a0]
2019-12-04T23:38:51.177 INFO:tasks.ceph.osd.1.smithi012.stderr: 5: (BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0xa1) [0x55c1b0ddc171]
2019-12-04T23:38:51.177 INFO:tasks.ceph.osd.1.smithi012.stderr: 6: (BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > >*)+0xb0) [0x55c1b0e004e0]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 7: (MapCacher::MapCacher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list>::get_keys(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > >*)+0x31e) [0x55c1b0b3f1de]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 8: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0xe7) [0x55c1b0b36187]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 9: (SnapMapper::update_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const*, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list>*)+0xbc) [0x55c1b0b3960c]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 10: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ceph::os::Transaction&)+0x8e0) [0x55c1b09e6410]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 11: (non-virtual thunk to PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, std::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ceph::os::Transaction&, bool)+0x1ea) [0x55c1b0ac88da]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 12: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0x6d4) [0x55c1b0ca8424]
2019-12-04T23:38:51.178 INFO:tasks.ceph.osd.1.smithi012.stderr: 13: (ECBackend::try_reads_to_commit()+0x789) [0x55c1b0cb7409]
2019-12-04T23:38:51.179 INFO:tasks.ceph.osd.1.smithi012.stderr: 14: (ECBackend::check_ops()+0x1c) [0x55c1b0cba35c]
2019-12-04T23:38:51.179 INFO:tasks.ceph.osd.1.smithi012.stderr: 15: (ECBackend::start_rmw(ECBackend::Op*, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&)+0x87b) [0x55c1b0cbb26b]
2019-12-04T23:38:51.179 INFO:tasks.ceph.osd.1.smithi012.stderr: 16: (ECBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x30d) [0x55c1b0cbccbd]
2019-12-04T23:38:51.179 INFO:tasks.ceph.osd.1.smithi012.stderr: 17: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xd21) [0x55c1b0a576d1]
2019-12-04T23:38:51.179 INFO:tasks.ceph.osd.1.smithi012.stderr: 18: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x84) [0x55c1b0a598c4]
2019-12-04T23:38:51.179 INFO:tasks.ceph.osd.1.smithi012.stderr: 19: (PrimaryLogPG::AwaitAsyncWork::react(PrimaryLogPG::DoSnapWork const&)+0x440) [0x55c1b0a910e0]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 20: (boost::statechart::simple_state<PrimaryLogPG::AwaitAsyncWork, PrimaryLogPG::Trimming, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x111) [0x55c1b0b0d481]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 21: (boost::statechart::state_machine<PrimaryLogPG::SnapTrimmer, PrimaryLogPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x55c1b0ae317b]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 22: (PrimaryLogPG::snap_trimmer(unsigned int)+0xec) [0x55c1b0a4ce3c]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 23: (ceph::osd::scheduler::PGSnapTrim::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1b) [0x55c1b0b9f61b]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90c) [0x55c1b0962b6c]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x55c1b0f9dd4c]
2019-12-04T23:38:51.180 INFO:tasks.ceph.osd.1.smithi012.stderr: 26: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c1b0fa0fa0]
2019-12-04T23:38:51.181 INFO:tasks.ceph.osd.1.smithi012.stderr: 27: (()+0x76db) [0x7f721e62f6db]
2019-12-04T23:38:51.181 INFO:tasks.ceph.osd.1.smithi012.stderr: 28: (clone()+0x3f) [0x7f721d3cf88f]
2019-12-04T23:38:51.181 INFO:tasks.ceph.osd.1.smithi012.stderr:2019-12-04T23:38:51.169+0000 7f71f4ed9700 -1 *** Caught signal (Segmentation fault) **

/a/sage-2019-12-04_19:29:26-rados-wip-sage-testing-2019-12-04-0930-distro-basic-smithi/4566691

Related issues 3 (0 open, 3 closed)

Related to bluestore - Bug #43217: segv in BlueStore::OnodeSpace::map_any (Duplicate)
Related to bluestore - Bug #43131: segfault in BlueStore::Collection::split_cache() (Resolved)
Copied to bluestore - Backport #46643: octopus: segv in LruOnodeCacheShard::_pin (Rejected, assignee: Neha Ojha)
Actions #1

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
Actions #2

Updated by Sage Weil over 4 years ago

  • Related to Bug #43217: segv in BlueStore::OnodeSpace::map_any added
Actions #3

Updated by Sage Weil over 4 years ago

/a/sage-2019-12-09_20:35:48-rados:thrash-erasure-code-wip-sage3-testing-2019-12-09-1226-distro-basic-smithi/4585860

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x0000556cc40f7ba0 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at ./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  boost::intrusive::list_node_traits<void*>::set_next (n=<synthetic pointer>: <optimized out>, next=<synthetic pointer>: <optimized out>)
    at ./obj-x86_64-linux-gnu/boost/include/boost/intrusive/detail/list_node.hpp:66
#5  boost::intrusive::circular_list_algorithms<boost::intrusive::list_node_traits<void*> >::link_before (this_node=<synthetic pointer>: <optimized out>, nxt_node=<optimized out>)
    at ./obj-x86_64-linux-gnu/boost/include/boost/intrusive/circular_list_algorithms.hpp:182
#6  boost::intrusive::list_impl<boost::intrusive::mhtraits<BlueStore::Onode, boost::intrusive::list_member_hook<void, void, void>, &BlueStore::Onode::pin_item>, unsigned long, true, void>::push_front (value=..., this=0x556ccdf4c670) at ./obj-x86_64-linux-gnu/boost/include/boost/intrusive/list.hpp:290
#7  LruOnodeCacheShard::_pin (this=0x556ccdf4a000, o=...) at ./src/os/bluestore/BlueStore.cc:898
#8  0x0000556cc400ce56 in BlueStore::OnodeCacheShard::pin (o=..., this=0x556ccdf4a000) at ./src/os/bluestore/BlueStore.h:1214
#9  BlueStore::Onode::get (this=0x556cd2e4a3c0) at ./src/os/bluestore/BlueStore.h:1125
#10 0x0000556cc3f78db0 in intrusive_ptr_add_ref (o=0x556cd2e4a3c0) at ./src/os/bluestore/BlueStore.h:3316
#11 boost::intrusive_ptr<BlueStore::Onode>::intrusive_ptr (rhs=..., this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp:93
#12 boost::intrusive_ptr<BlueStore::Onode>::operator= (rhs=..., this=0x7f93f94fd348) at ./obj-x86_64-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp:154
#13 BlueStore::OnodeSpace::lookup (this=this@entry=0x556cd4cba140, oid=...) at ./src/os/bluestore/BlueStore.cc:1673
#14 0x0000556cc3f87b51 in BlueStore::Collection::get_onode (this=this@entry=0x556cd4cba000, oid=..., create=create@entry=false, is_createop=is_createop@entry=false)
    at ./src/os/bluestore/BlueStore.cc:3673
#15 0x0000556cc3fa5426 in BlueStore::getattr (this=this@entry=0x556cceba2000, c_=..., oid=..., name=name@entry=0x7f93f94fd770 "snapset", value=...)
    at ./src/os/bluestore/BlueStore.cc:10192
#16 0x0000556cc3cb94e7 in PGBackend::objects_get_attr (this=this@entry=0x556cd58cc600, hoid=..., attr="snapset", out=out@entry=0x7f93f94fd710) at ./src/osd/PGBackend.cc:421
#17 0x0000556cc3c08c08 in PrimaryLogPG::get_snapset_context (this=this@entry=0x556cd2fde000, oid=..., can_create=can_create@entry=true, attrs=<optimized out>, 
    oid_existed=oid_existed@entry=true) at ./src/osd/PrimaryLogPG.cc:11393
#18 0x0000556cc3c092fb in PrimaryLogPG::get_object_context (this=this@entry=0x556cd2fde000, soid=..., can_create=can_create@entry=false, attrs=attrs@entry=0x0)
    at ./src/osd/PrimaryLogPG.cc:11039
#19 0x0000556cc3c1a23c in PrimaryLogPG::prep_object_replica_pushes (this=this@entry=0x556cd2fde000, soid=..., v=..., h=h@entry=0x556cd9605f60, 
    work_started=work_started@entry=0x7f93f94fdf86) at ./src/osd/PrimaryLogPG.cc:12739
#20 0x0000556cc3c65c50 in PrimaryLogPG::recover_replicas (this=this@entry=0x556cd2fde000, max=max@entry=1, handle=..., work_started=work_started@entry=0x7f93f94fdf86)
    at ./src/osd/PrimaryLogPG.cc:12876
#21 0x0000556cc3c6f1f0 in PrimaryLogPG::start_recovery_ops (this=0x556cd2fde000, max=1, handle=..., ops_started=0x7f93f94fe1d8) at ./src/osd/PrimaryLogPG.cc:12375
#22 0x0000556cc3aef5a9 in OSD::do_recovery (this=0x556ccec7e000, pg=0x556cd2fde000, queued=224, reserved_pushes=1, handle=...) at ./src/osd/OSD.cc:9486
#23 0x0000556cc3d48a39 in ceph::osd::scheduler::PGRecovery::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...)
    at ./src/osd/scheduler/OpSchedulerItem.cc:65
#24 0x0000556cc3b0bfec in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f93f94fe510)
    at ./src/osd/scheduler/OpSchedulerItem.h:148
#25 OSD::ShardedOpWQ::_process (this=0x556ccec7eec8, thread_index=<optimized out>, hb=<optimized out>) at ./src/osd/OSD.cc:10679
#26 0x0000556cc414a93c in ShardedThreadPool::shardedthreadpool_worker (this=0x556ccec7ea28, thread_index=8) at ./src/common/WorkQueue.cc:311
#27 0x0000556cc414db90 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at ./src/common/WorkQueue.h:706
#28 0x00007f941c4496db in start_thread (arg=0x7f93f9501700) at pthread_create.c:463

other interesting threads:
Thread 62 (Thread 0x7f93f8d00700 (LWP 12065)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f941c44c098 in __GI___pthread_mutex_lock (mutex=0x556ccdf4a018) at ../nptl/pthread_mutex_lock.c:113
#2  0x0000556cc3f78c3e in __gthread_mutex_lock (__mutex=0x556ccdf4a018) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748
#3  __gthread_recursive_mutex_lock (__mutex=0x556ccdf4a018) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:810
#4  std::recursive_mutex::lock (this=0x556ccdf4a018) at /usr/include/c++/7/mutex:107
#5  std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/7/bits/std_mutex.h:162
#6  BlueStore::OnodeSpace::lookup (this=this@entry=0x556ccde85b60, oid=...) at ./src/os/bluestore/BlueStore.cc:1664
#7  0x0000556cc3f87b51 in BlueStore::Collection::get_onode (this=this@entry=0x556ccde85a20, oid=..., create=create@entry=false, is_createop=is_createop@entry=false) at ./src/os/bluestore/BlueStore.cc:3673
#8  0x0000556cc3fabbb0 in BlueStore::omap_get_values (this=0x556cceba2000, c_=..., oid=..., keys=std::set with 1 element = {...}, out=0x7f93f8cfbfb0) at ./src/os/bluestore/BlueStore.cc:10596
#9  0x0000556cc3ce855e in MapCacher::MapCacher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list>::get_keys (this=this@entry=0x556cd124a998, keys_to_get=std::set with 1 element = {...}, got=got@entry=0x7f93f8cfc100) at ./src/common/map_cacher.hpp:178
#10 0x0000556cc3cdf507 in SnapMapper::get_snaps (this=this@entry=0x556cd124a990, oid=..., out=out@entry=0x7f93f8cfc300) at ./src/osd/SnapMapper.cc:173
#11 0x0000556cc3ce234e in SnapMapper::_remove_oid (this=this@entry=0x556cd124a990, oid=..., t=t@entry=0x7f93f8cfc4e0) at ./src/osd/SnapMapper.cc:369
#12 0x0000556cc3ce2794 in SnapMapper::remove_oid (this=this@entry=0x556cd124a990, oid=..., t=t@entry=0x7f93f8cfc4e0) at ./src/osd/SnapMapper.cc:360

Thread 61 (Thread 0x7f93f9d02700 (LWP 12063)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f941c44c098 in __GI___pthread_mutex_lock (mutex=0x556ccdf4a018) at ../nptl/pthread_mutex_lock.c:113
#2  0x0000556cc3f78c3e in __gthread_mutex_lock (__mutex=0x556ccdf4a018) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748
#3  __gthread_recursive_mutex_lock (__mutex=0x556ccdf4a018) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:810
#4  std::recursive_mutex::lock (this=0x556ccdf4a018) at /usr/include/c++/7/mutex:107
#5  std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/7/bits/std_mutex.h:162
#6  BlueStore::OnodeSpace::lookup (this=this@entry=0x556ccde85b60, oid=...) at ./src/os/bluestore/BlueStore.cc:1664
#7  0x0000556cc3f87b51 in BlueStore::Collection::get_onode (this=this@entry=0x556ccde85a20, oid=..., create=create@entry=false, is_createop=is_createop@entry=false) at ./src/os/bluestore/BlueStore.cc:3673
#8  0x0000556cc3fabbb0 in BlueStore::omap_get_values (this=0x556cceba2000, c_=..., oid=..., keys=std::set with 1 element = {...}, out=0x7f93f9cfdfb0) at ./src/os/bluestore/BlueStore.cc:10596
#9  0x0000556cc3ce855e in MapCacher::MapCacher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list>::get_keys (this=this@entry=0x556cd321c998, keys_to_get=std::set with 1 element = {...}, got=got@entry=0x7f93f9cfe100) at ./src/common/map_cacher.hpp:178
#10 0x0000556cc3cdf507 in SnapMapper::get_snaps (this=this@entry=0x556cd321c990, oid=..., out=out@entry=0x7f93f9cfe300) at ./src/osd/SnapMapper.cc:173
#11 0x0000556cc3ce234e in SnapMapper::_remove_oid (this=this@entry=0x556cd321c990, oid=..., t=t@entry=0x7f93f9cfe4e0) at ./src/osd/SnapMapper.cc:369
#12 0x0000556cc3ce2794 in SnapMapper::remove_oid (this=this@entry=0x556cd321c990, oid=..., t=t@entry=0x7f93f9cfe4e0) at ./src/osd/SnapMapper.cc:360
#13 0x0000556cc3b88033 in PG::clear_object_snap_mapping (this=this@entry=0x556cd321c800, t=t@entry=0x7f93f9cfef50, soid=...) at ./src/osd/PG.cc:315
#14 0x0000556cc3be550c in PrimaryLogPG::on_local_recover (this=0x556cd321c800, hoid=..., _recovery_info=..., obc=std::shared_ptr<ObjectContext> (use count 3, weak count 1) = {...}, is_delete=<optimized out>, t=0x7f93f9cfef50) at ./src/osd/PrimaryLogPG.cc:363
Actions #4

Updated by Sage Weil over 4 years ago

(gdb) p /x o
$3 = (BlueStore::Onode &) @0x556cd2e4a3c0: {
  s = 0x556ccdf4a000, 
  pinned = 0x0, 
  nref = {
    <std::__atomic_base<int>> = {
      static _S_alignment = 0x4, 
      _M_i = 0x2
    }, 
    members of std::atomic<int>: 
    static is_always_lock_free = 0x1
  }, 
  c = 0x556cd4cba000, 

last reference to this onode was here:
2019-12-09T21:07:12.368+0000 7f93fd509700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.0s0_head 0x556cd127c3a0) split_cache moving 0x556cd2e4a3c0 0#2:07485588:::smithi04713584-885:head#
Actions #5

Updated by Neha Ojha over 4 years ago

  • Assignee set to Mark Nelson
Actions #6

Updated by Sage Weil over 4 years ago

  • Related to Bug #43131: segfault in BlueStore::Collection::split_cache() added
Actions #7

Updated by Sage Weil over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32665
Actions #8

Updated by Sage Weil about 4 years ago

  • Status changed from Fix Under Review to Resolved
Actions #9

Updated by Brad Hubbard almost 4 years ago

  • Status changed from Resolved to New
  • Priority changed from Urgent to High
  • Affected Versions v16.0.0 added

Reopening this since I have seen it in /a/yuriw-2020-05-24_19:30:40-rados-wip-yuri-master_5.24.20-distro-basic-smithi/5087961

Since the job timed out after 12 hours there are no osd logs or coredumps so I'm seeing if I can reproduce.

2020-05-24T21:15:07.828 INFO:teuthology.orchestra.run.smithi112:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd unset nodeep-scrub
2020-05-24T21:15:07.832 INFO:tasks.ceph.osd.1.smithi112.stderr:*** Caught signal (Segmentation fault) **
2020-05-24T21:15:07.832 INFO:tasks.ceph.osd.1.smithi112.stderr: in thread 7f2a613ff700 thread_name:tp_osd_tp
2020-05-24T21:15:07.833 INFO:tasks.ceph.osd.5.smithi171.stderr:2020-05-24T21:15:07.830+0000 7f7c10eb2700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 5  (PID: 28170) UID: 0
2020-05-24T21:15:07.837 INFO:tasks.ceph.osd.1.smithi112.stderr: ceph version 16.0.0-1850-g9dce54c (9dce54cc5473ddca3d64786e557f4b0c097deed7) pacific (dev)
2020-05-24T21:15:07.837 INFO:tasks.ceph.osd.1.smithi112.stderr: 1: (()+0x12dc0) [0x7f2a8d06bdc0]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 2: (LruOnodeCacheShard::_pin(BlueStore::Onode&)+0xad) [0x556bb9bf2cdd]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 3: (BlueStore::Onode::get()+0x56) [0x556bb9bdb4b6]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 4: (BlueStore::OnodeSpace::lookup(ghobject_t const&)+0x1c3) [0x556bb9b4a553]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 5: (BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0x79) [0x556bb9b57339]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x119b) [0x556bb9bc94db]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 7: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x407) [0x556bb9bcb0d7]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 8: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x85) [0x556bb9712265]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 9: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0xf3) [0x556bb96a9ed3]
2020-05-24T21:15:07.838 INFO:tasks.ceph.osd.1.smithi112.stderr: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x2d8) [0x556bb96da0a8]
2020-05-24T21:15:07.839 INFO:tasks.ceph.osd.1.smithi112.stderr: 11: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x556bb99127f6]
2020-05-24T21:15:07.839 INFO:tasks.ceph.osd.1.smithi112.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x556bb96cd2bf]
2020-05-24T21:15:07.839 INFO:tasks.ceph.osd.1.smithi112.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x556bb9d19614]
2020-05-24T21:15:07.839 INFO:tasks.ceph.osd.1.smithi112.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x556bb9d1c274]
2020-05-24T21:15:07.839 INFO:tasks.ceph.osd.1.smithi112.stderr: 15: (()+0x82de) [0x7f2a8d0612de]
2020-05-24T21:15:07.839 INFO:tasks.ceph.osd.1.smithi112.stderr: 16: (clone()+0x43) [0x7f2a8be0b133]

I'm also reducing the priority since we've only seen this once in recent times (feel free to adjust it). Also happy to open a new tracker if we would prefer that.

Actions #10

Updated by Neha Ojha almost 4 years ago

  • Backport set to octopus

/a/yuriw-2020-07-13_23:06:23-rados-wip-yuri5-testing-2020-07-13-1944-octopus-distro-basic-smithi/5224399

Actions #11

Updated by Nathan Cutler almost 4 years ago

  • Status changed from New to Pending Backport

Neha - is it OK to backport this to Octopus now?

Actions #12

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #46643: octopus: segv in LruOnodeCacheShard::_pin added
Actions #13

Updated by Neha Ojha over 3 years ago

Nathan Cutler wrote:

Neha - is it OK to backport this to Octopus now?

Igor, please feel free to correct me but I'm not sure this is needed since we merged https://tracker.ceph.com/issues/46575, which will eventually be backported to octopus. We should keep this tracker open to track this bug and make sure we don't see it after the octopus backport of https://github.com/ceph/ceph/pull/32852 merges.

Actions #14

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #15

Updated by Igor Fedotov about 1 year ago

  • Status changed from Pending Backport to Resolved