Bug #43903

closed

osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)

Added by Sage Weil over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

#14 handle_fatal_signal (signum=11) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/global/signal_handler.cc:167
#15 <signal handler called>
#16 0x0000561ccd8acc03 in ceph::buffer::v14_2_0::ptr::release (this=this@entry=0x561ce07b4008) at /usr/include/c++/8/bits/atomic_base.h:303
#17 0x0000561ccd999c62 in ceph::buffer::v14_2_0::ptr::~ptr (this=0x561ce07b4008, __in_chrg=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/buffer.h:398
#18 ceph::buffer::v14_2_0::ptr_node::~ptr_node (this=0x561ce07b4000, __in_chrg=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/buffer.h:398
#19 ceph::buffer::v14_2_0::ptr_node::disposer::operator() (this=<optimized out>, delete_this=0x561ce07b4000) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/buffer.h:393
#20 ceph::buffer::v14_2_0::list::buffers_t::clear_and_dispose (this=0x561cdc15cc50) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/buffer.h:638
#21 ceph::buffer::v14_2_0::list::clear (this=0x561cdc15cc50) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/buffer.h:1057
#22 PGTempMap::decode (this=0x561cdc15cc50, p=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSDMap.h:128
#23 0x0000561ccd971f7f in decode (p=..., c=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSDMap.h:346
#24 OSDMap::decode (this=0x561cd9b49400, bl=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSDMap.cc:3219
#25 0x0000561ccd974e65 in OSDMap::decode (this=this@entry=0x561cd9b49400, bl=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSDMap.cc:3044
#26 0x0000561ccd0a9013 in OSDService::try_get_map (this=0x561cd79b5350, epoch=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:1615
#27 0x0000561ccd0fa310 in OSD::advance_pg (this=0x561cd79b4000, osd_epoch=<optimized out>, pg=0x561ce65c6000, handle=..., rctx=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:8445
#28 0x0000561ccd0fc5c4 in OSD::dequeue_peering_evt (this=0x561cd79b4000, sdata=0x561cd7803d40, pg=0x561ce65c6000, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSDMap.h:670
#29 0x0000561ccd32ddb6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
#30 0x0000561ccd0ef62f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f2e3b31f3f0) at /usr/include/c++/8/bits/unique_ptr.h:342
#31 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
#32 0x0000561ccd71d094 in ShardedThreadPool::shardedthreadpool_worker (this=0x561cd79b4a28, thread_index=2) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
#33 0x0000561ccd71fcf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
#34 0x00007f2e5bb282de in start_thread () from /lib64/libpthread.so.0

Several other threads are in interesting places:

Thread 33 (Thread 0x7f2e3bb23700 (LWP 55977)):
#0  0x00007f2e5a9308f7 in __memcmp_avx2_movbe () from /lib64/libc.so.6
#1  0x0000561ccd77a588 in std::char_traits<char>::compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>) at /usr/include/c++/8/bits/char_traits.h:312
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (__str="benchmark_data_smithi200_80267_object41981", this=0x561ceb56ec50) at /usr/include/c++/8/bits/basic_string.h:2849
#3  std::operator< <char, std::char_traits<char>, std::allocator<char> > (__rhs="benchmark_data_smithi200_80267_object41981", __lhs="benchmark_data_smithi200_80267_object41981") at /usr/include/c++/8/bits/basic_string.h:6136
#4  operator< (r=..., l=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/object.h:72
#5  cmp (r=..., l=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/hobject.cc:347
#6  cmp (l=..., r=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/hobject.cc:321
#7  0x0000561ccd191790 in operator< (r=..., l=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/hobject.h:308
#8  std::less<hobject_t>::operator() (this=<optimized out>, __y=..., __x=...) at /usr/include/c++/8/bits/stl_function.h:386
#9  std::_Rb_tree<hobject_t, std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > >, std::_Select1st<std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > > >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > > > >::_M_lower_bound (this=<optimized out>, __k=..., __y=0x561ceb56ef70, __x=0x561ceb56ec30) at /usr/include/c++/8/bits/stl_tree.h:1907
#10 std::_Rb_tree<hobject_t, std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > >, std::_Select1st<std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > > >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > > > >::find (this=this@entry=0x561cd9b929a8, __k=...) at /usr/include/c++/8/bits/stl_tree.h:2555
#11 0x0000561ccd166030 in std::map<hobject_t, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> >, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > > > >::find (__x=..., this=0x561cd9b929a8)
    at /usr/include/c++/8/bits/stl_map.h:1193
#12 MissingLoc::num_unfound (this=0x561cd9b92978) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/MissingLoc.h:169
#13 PeeringState::get_num_unfound (this=0x561cd9b91440) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PeeringState.h:2238
#14 operator<< (out=..., pg=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PG.cc:3411
#15 0x0000561ccd166590 in PG::gen_prefix (this=0x561cd9b90000, out=...) at /usr/include/c++/8/ostream:556
#16 0x0000561ccd362884 in PeeringState::Active::react (this=0x561cead80000, advmap=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PeeringState.cc:5553
#17 0x0000561ccd395b65 in boost::statechart::custom_reaction<PeeringState::AdvMap>::react<PeeringState::Active, boost::statechart::event_base, void const*> (eventType=<synthetic pointer>: <optimized out>, evt=..., stt=...)
    at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/build/boost/include/boost/statechart/result.hpp:110
#18 boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list18<boost::statechart::custom_reaction<PeeringState::AdvMap>, boost::statechart::custom_reaction<MInfoRec>, boost::statechart::custom_reaction<MNotifyRec>, boost::statechart::custom_reaction<MLogRec>, boost::statechart::custom_reaction<MTrim>, boost::statechart::custom_reaction<PeeringState::Backfilled>, boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<PeeringState::AllReplicasActivated>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<RemoteReservationRevokedTooFull>, boost::statechart::custom_reaction<RemoteReservationRevoked>, boost::statechart::custom_reaction<PeeringState::DoRecovery>, boost::statechart::custom_reaction<RenewLease>, boost::statechart::custom_reaction<MLeaseAck>, boost::statechart::custom_reaction<PeeringState::CheckReadable> >, boost::statechart::simple_state<PeeringState::Active, PeeringState::Primary, PeeringState::Activating, (boost::statechart::history_mode)0> > (eventType=0x561cce363e18 <boost::statechart::detail::id_holder<PeeringState::AdvMap>::idProvider_>, evt=..., stt=...)
    at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/build/boost/include/boost/statechart/simple_state.hpp:814
...

and
Thread 7 (Thread 0x7f2e36b19700 (LWP 55987)):
#0  0x00007f2e5a930a08 in __memcmp_avx2_movbe () from /lib64/libc.so.6
#1  0x0000561ccd121f1b in std::char_traits<char>::compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>) at /usr/include/c++/8/bits/char_traits.h:312
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (__str="0000000368.", '0' <repeats 16 times>, "3073", this=0x561cdcae2660) at /usr/include/c++/8/bits/basic_string.h:2849
#3  std::operator< <char, std::char_traits<char>, std::allocator<char> > (__rhs="0000000368.", '0' <repeats 16 times>, "3073", __lhs="0000000369.", '0' <repeats 16 times>, "3200") at /usr/include/c++/8/bits/basic_string.h:6136
#4  std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::operator() (this=<optimized out>, __y="0000000368.", '0' <repeats 16 times>, "3073", __x="0000000369.", '0' <repeats 16 times>, "3200") at /usr/include/c++/8/bits/stl_function.h:386
#5  std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::_Identity<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_lower_bound (this=<optimized out>, __k="0000000368.", '0' <repeats 16 times>, "3073", __y=0x561cdcb12640, __x=0x561cdcae2640)
    at /usr/include/c++/8/bits/stl_tree.h:1907
#6  std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::_Identity<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::find (this=this@entry=0x561ce07b9a20, __k="0000000368.", '0' <repeats 16 times>, "3073") at /usr/include/c++/8/bits/stl_tree.h:2555
#7  0x0000561ccd1b33ad in std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::count
    (__x="0000000368.", '0' <repeats 16 times>, "3073", this=0x561ce07b9a20) at /usr/include/c++/8/bits/stl_tree.h:991
#8  PGLog::check (this=0x561ce07b9718) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PGLog.cc:612
#9  0x0000561ccd1b4019 in PGLog::undirty (this=0x561ce07b9718) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PGLog.h:676
#10 PGLog::write_log_and_missing (this=this@entry=0x561ce07b9718, t=..., km=km@entry=0x7f2e36b15310, coll=..., log_oid=..., require_rollback=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PGLog.cc:649
#11 0x0000561ccd1681f6 in PG::prepare_write (this=0x561ce07b7400, info=..., last_written_info=..., past_intervals=..., pglog=..., dirty_info=<optimized out>, dirty_big_info=<optimized out>, need_write_epoch=<optimized out>, t=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/osd_types.h:1589
#12 0x0000561ccd33c690 in PeeringState::write_if_dirty (this=this@entry=0x561ce07b8840, t=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSDMap.h:670
#13 0x0000561ccd35eb17 in PeeringState::recover_got (this=this@entry=0x561ce07b8840, oid=..., v=..., is_delete=is_delete@entry=false, t=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PeeringState.cc:3896
#14 0x0000561ccd1c8902 in PrimaryLogPG::on_local_recover (this=<optimized out>, hoid=..., _recovery_info=..., obc=std::shared_ptr<ObjectContext> (empty) = {...}, is_delete=<optimized out>, t=0x7f2e36b15d10) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/PrimaryLogPG.cc:405
...

and maybe
Thread 1 (Thread 0x7f2e47b3b700 (LWP 55940)):
#0  boost::intrusive::bstree_algorithms<boost::intrusive::rbtree_node_traits<void*, false> >::insert_unique_check<unsigned int, boost::intrusive::detail::key_nodeptr_comp<MapKey<WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::SubQueue, unsigned int>, boost::intrusive::bhtraits<WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::SubQueue, boost::intrusive::rbtree_node_traits<void*, false>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3u>, boost::move_detail::identity<WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::SubQueue> > > (pdepth=0x0, commit_data=<synthetic pointer>..., comp=..., key=<synthetic pointer>: 255, header=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/build/boost/include/boost/intrusive/detail/tree_value_compare.hpp:178
#1  boost::intrusive::bstbase2<boost::intrusive::bhtraits<WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::SubQueue, boost::intrusive::rbtree_node_traits<void*, false>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3u>, void, void, (boost::intrusive::algo_types)5, void>::insert_unique_check<unsigned int, MapKey<WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::SubQueue, unsigned int> > (commit_data=<synthetic pointer>..., commit_data=<synthetic pointer>..., key=<synthetic pointer>: 255, this=0x561cd78d3520, comp=...)
    at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/build/boost/include/boost/intrusive/bstree.hpp:500
#2  WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::Queue::insert (front=false, item=..., cost=0, cl=0, p=255, this=0x561cd78d3518) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WeightedPriorityQueue.h:217
#3  WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long>::enqueue_strict (item=..., p=255, cl=0, this=0x561cd78d3510) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WeightedPriorityQueue.h:318
#4  ceph::osd::scheduler::ClassedOpQueueScheduler<WeightedPriorityQueue<ceph::osd::scheduler::OpSchedulerItem, unsigned long> >::enqueue (this=0x561cd78d3500, item=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/scheduler/OpScheduler.h:96
#5  0x0000561ccd0f2264 in OSD::ShardedOpWQ::_enqueue (this=0x561cd79b4ec8, item=...) at /usr/include/c++/8/bits/unique_ptr.h:342
#6  0x0000561ccd0f2c68 in ShardedThreadPool::ShardedWQ<ceph::osd::scheduler::OpSchedulerItem>::queue (item=..., this=0x561cd79b4ec8) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:684
#7  OSD::enqueue_peering_evt (this=0x561cd79b4000, pgid=..., evt=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9607
#8  0x0000561ccd0fd553 in OSD::consume_map (this=0x561cd79b4000) at /usr/include/c++/8/ext/new_allocator.h:86
#9  0x0000561ccd102a3c in OSD::_committed_osd_maps (this=0x561cd79b4000, first=<optimized out>, last=<optimized out>, m=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:8273
#10 0x0000561ccd1562cb in C_OnMapCommit::finish (this=0x561cdb289e60, r=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:7678
#11 0x0000561ccd10b06d in Context::complete (this=0x561cdb289e60, r=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/include/Context.h:77
#12 0x0000561ccd6e8f15 in Finisher::finisher_thread_entry (this=0x561cd84f0448) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/Finisher.cc:66
#13 0x00007f2e5bb282de in start_thread () from /lib64/libpthread.so.0

/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718264


Related issues 2 (0 open, 2 closed)

Related to RADOS - Bug #46443: ceph_osd crash in _committed_osd_maps when failed to encode first inc map (Resolved, Dan van der Ster)

Copied to RADOS - Backport #44206: nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode) (Resolved)