Bug #18042
closedSimpleMessenger debug delay injection deadlock
0%
Description
Thread 86 (Thread 0x7fbad7b56700 (LWP 21174)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=0x7fbb10fa44b0, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb061f8139 in Pipe::fault (this=this@entry=0x7fbb12d74800, onread=onread@entry=true) at msg/simple/Pipe.cc:1391 #5 0x00007fbb061fecc0 in Pipe::reader (this=0x7fbb12d74800) at msg/simple/Pipe.cc:1697 #6 0x00007fbb062084bd in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52 #7 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #8 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #9 0x0000000000000000 in ?? () #7 0x00007fbb05a4dc06 in OSD::op_is_discardable (op=op@entry=0x7fbb12215b80) at osd/OSD.cc:8616 #8 0x00007fbb05b22cec in PG::can_discard_op (this=0x7fbb1142d000, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00) at osd/PG.cc:5353 #9 0x00007fbb05b23405 in PG::can_discard_request (this=this@entry=0x7fbb1142d000, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00) at osd/PG.cc:5439 #10 0x00007fbb05bc21e9 in ReplicatedPG::do_request (this=0x7fbb1142d000, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00, handle=...) at osd/ReplicatedPG.cc:1448 #11 0x00007fbb05a7607d in OSD::dequeue_op (this=0x7fbb1114a000, pg=..., op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00, handle=...) at osd/OSD.cc:8815 #12 0x00007fbb05a762cd in PGQueueable::RunVis::operator() (this=this@entry=0x7fbae7c8a7b0, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00) at osd/OSD.cc:163 #13 0x00007fbb05a7adf9 in internal_visit<std::shared_ptr<OpRequest> > (operand=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00, this=<synthetic pointer>) at /usr/include/boost/variant/variant.hpp:1017 #14 visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, std::shared_ptr<OpRequest> > (storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>) at /usr/include/boost/variant/detail/visitation_impl.hpp:130 #15 visitation_impl_invoke<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, std::shared_ptr<OpRequest>, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (internal_which=<optimized out>, t=0x0, storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>) at /usr/include/boost/variant/detail/visitation_impl.hpp:173 #16 visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, std::shared_ptr<OpRequest>, boost::mpl::l_item<mpl_::long_<2l>, PGSnapTrim, boost::mpl::l_item<mpl_::long_<1l>, PGScrub, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (no_backup_flag=..., storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/detail/visitation_impl.hpp:256 #17 internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*> (storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/variant.hpp:2326 #18 internal_apply_visitor<boost::detail::variant::invoke_visitor<PGQueueable::RunVis> > (visitor=<synthetic pointer>, this=0x7fbae7c8a8c8) at /usr/include/boost/variant/variant.hpp:2337 #19 apply_visitor<PGQueueable::RunVis> (visitor=..., this=0x7fbae7c8a8c8) at /usr/include/boost/variant/variant.hpp:2360 #20 apply_visitor<PGQueueable::RunVis, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub> > (visitable=..., visitor=...) at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:60 #21 run (handle=..., pg=..., osd=<optimized out>, this=0x7fbae7c8a8c8) at osd/OSD.h:392 #22 OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8696 ---Type <return> to continue, or q <return> to quit--- #23 0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340 #24 0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684 #25 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #26 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #27 0x0000000000000000 in ?? () #0 0x00007fbb03e20a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fbb05a7a6e5 in WaitUntil (when=..., mutex=..., this=0x7fbb10edc7d8) at common/Cond.h:72 #2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7fbb10edc7d8) at common/Cond.h:81 #3 OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=0x7fbb11193660) at osd/OSD.cc:8644 #4 0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340 #5 0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684 #6 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #7 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #8 0x0000000000000000 in ?? () Thread 61 (Thread 0x7fbadd96e700 (LWP 21309)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb13d0d170, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb060d85d6 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115 #5 is_connected (this=0x7fbb13d0c000) at msg/simple/Pipe.h:143 #6 SimpleMessenger::is_connected (this=<optimized out>, con=<optimized out>) at msg/simple/SimpleMessenger.cc:282 #7 0x00007fbb05a4dc06 in OSD::op_is_discardable (op=op@entry=0x7fbb12214dc0) at osd/OSD.cc:8616 #8 0x00007fbb05a82e9a in OSD::handle_op (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>' warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>' std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:8466 #9 0x00007fbb05a840ce in OSD::dispatch_op_fast (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>' warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>' std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:6135 #10 0x00007fbb05a843e8 in OSD::dispatch_session_waiting (this=this@entry=0x7fbb1114a000, session=session@entry=0x7fbb1345d260, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>' warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>' std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:5771 #11 0x00007fbb05a84794 in OSD::ms_fast_dispatch (this=0x7fbb1114a000, m=<optimized out>) at osd/OSD.cc:5883 #12 0x00007fbb061c3cf6 in ms_fast_dispatch (m=0x7fbb12214dc0, this=0x7fbb10fa4000) at msg/Messenger.h:555 #13 DispatchQueue::fast_dispatch (this=0x7fbb10fa41d0, m=m@entry=0x7fbb12214dc0) at msg/simple/DispatchQueue.cc:71 #14 0x00007fbb061f6372 in Pipe::DelayedDelivery::entry (this=0x7fbb11733560) at msg/simple/Pipe.cc:226 #15 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #16 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #17 0x0000000000000000 in ?? () Thread 45 (Thread 0x7fbadab63700 (LWP 21310)): #0 0x00007fbb03e206d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fbb061ed78f in Wait (mutex=..., this=0x7fbb11733638) at common/Cond.h:56 #2 Pipe::DelayedDelivery::stop_fast_dispatching (this=0x7fbb11733560) at msg/simple/Pipe.cc:249 #3 0x00007fbb061ee99f in Pipe::stop_and_wait (this=this@entry=0x7fbb13d0c000) at msg/simple/Pipe.cc:1502 #4 0x00007fbb061fcd3f in Pipe::accept (this=this@entry=0x7fbb117be800) at msg/simple/Pipe.cc:696 #5 0x00007fbb062007cf in Pipe::reader (this=0x7fbb117be800) at msg/simple/Pipe.cc:1517 #6 0x00007fbb062084bd in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52 #7 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #8 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #9 0x0000000000000000 in ?? () Thread 43 (Thread 0x7fbaef49b700 (LWP 17003)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7ffd68175960, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05a63e58 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115 #5 sub_want_increment (flags=1, start=35, what="\230vR\023\273\177\000\000\000\341\335\177\206\261\006\026ЍI\357\272\177\000\000\000\240\024\021\273\177\000\000\300\215I\357\272\177\000\000\030\263\024\021\273\177\000\000#\000\000\000\000\000\000\000ЍI\357\272\177\000\000\200%\367\022\273\177\000\000\254@\246\005\273\177\000\000#\000\000\000\000\000\000\000۪\004\006\273\177\000\000؍I\357\272\177\000\000\000\341\335\177\206\261\006\026\000\240\024\021\273\177\000\000@\225I\357\272\177\000\000\"\000\000\000\000\000\000\000\277A\246\005\273\177\000\000\200\212\031\021\273\177\000\000@\301\355\020\273\177\000\000@\301\355\020\273\177\000\000\312V\321\002\"\000\000\000\000(H\021\273\177\000\000\340u\330\022\273\177\000\000\370\216\367\002\273\177\000\000\060o\367\002"...<Address 0x7fbaf0ca0792 out of bounds>, this=0x7ffd681757d0) at mon/MonClient.h:280 #6 OSD::osdmap_subscribe (this=this@entry=0x7fbb1114a000, epoch=35, force_request=force_request@entry=false) at osd/OSD.cc:6535 #7 0x00007fbb05a640ac in OSD::wait_for_new_map (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800) at osd/OSD.cc:6468 #8 0x00007fbb05a641bf in OSD::require_same_or_newer_map (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800, epoch=35, is_fast_dispatch=is_fast_dispatch@entry=false) at osd/OSD.cc:7370 ---Type <return> to continue, or q <return> to quit--- #9 0x00007fbb05a9a19c in OSD::handle_pg_notify (this=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800) at osd/OSD.cc:7766 #10 0x00007fbb05a9b730 in OSD::dispatch_op (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800) at osd/OSD.cc:6078 #11 0x00007fbb05a9c35e in OSD::_dispatch (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb12f72580) at osd/OSD.cc:6252 #12 0x00007fbb05a9ca1f in OSD::ms_dispatch (this=0x7fbb1114a000, m=0x7fbb12f72580) at osd/OSD.cc:5755 #13 0x00007fbb061c5b9a in ms_deliver_dispatch (m=0x7fbb12f72580, this=0x7fbb10fa4700) at msg/Messenger.h:584 #14 DispatchQueue::entry (this=0x7fbb10fa48d0) at msg/simple/DispatchQueue.cc:185 #15 0x00007fbb060def3d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103 #16 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #17 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #18 0x0000000000000000 in ?? () Thread 42 (Thread 0x7fbadef76700 (LWP 17269)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7ffd68175960, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05a63e58 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115 #5 sub_want_increment (flags=1, start=35, what="\230~\330\022\273\177\000\000\000\341\335\177\206\261\006\026\000\240\024\021\273\177\000\000@\301\355\020\273\177\000\000@Z\376\021\273\177\000\000\360C\367\336\272\177\000\000\000\240\024\021\273\177\000\000\060\256\024\021\273\177\000\000\240Fk\021\273\177\000\000\352s\247\005\273\177\000\000\200\332o\022\273\177\000\000\000\240\355\022\273\177\000\000\220D\367\336\272\177\000\000\060\000\000\000\273\177\000\000\314C\367\336\272\177\000\000\200|/\022\273\177\000\000P\200#\021\273\177\000\000\334Ӵ\021\273\177\000\000X\200#\021\273\177\000\000H\200#\021\273\177\000\000@\200#\021\273\177\000\000\312V\321\002\002", '\000' <repeats 15 times>, "\002\000\000\000\000\204.\021\273\177\000\000\242U\321\002"...<Address 0x7fbadf97a000 out of bounds>, this=0x7ffd681757d0) at mon/MonClient.h:280 #6 OSD::osdmap_subscribe (this=this@entry=0x7fbb1114a000, epoch=35, force_request=force_request@entry=false) at osd/OSD.cc:6535 #7 0x00007fbb05a773ea in OSD::handle_osd_ping (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb12eda000) at osd/OSD.cc:4001 #8 0x00007fbb05a7866b in OSD::heartbeat_dispatch (this=0x7fbb1114a000, m=0x7fbb12eda000) at osd/OSD.cc:5713 #9 0x00007fbb06213d9b in ms_fast_dispatch (m=<optimized out>, this=0x7fbb10fac000) at msg/Messenger.h:555 #10 AsyncConnection::DelayedDelivery::do_request (this=<optimized out>, id=<optimized out>) at msg/async/AsyncConnection.cc:2453 #11 0x00007fbb061c8626 in EventCenter::process_time_events (this=this@entry=0x7fbb10faa848) at msg/async/Event.cc:307 #12 0x00007fbb061c9d78 in EventCenter::process_events (this=this@entry=0x7fbb10faa848, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at msg/async/Event.cc:392 #13 0x00007fbb061a90e0 in Worker::entry (this=0x7fbb10faa800) at msg/async/AsyncMessenger.cc:294 #14 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #15 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #16 0x0000000000000000 in ?? () Thread 38 (Thread 0x7fbae6489700 (LWP 17021)): #0 0x00007fbb03e20a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fbb05a7a6e5 in WaitUntil (when=..., mutex=..., this=0x7fbb10edca58) at common/Cond.h:72 #2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7fbb10edca58) at common/Cond.h:81 #3 OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=0x7fbb111936c0) at osd/OSD.cc:8644 #4 0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340 #5 0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684 #6 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #7 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #8 0x0000000000000000 in ?? () Thread 28 (Thread 0x7fbaea491700 (LWP 17013)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1142d528, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05b0df83 in PG::lock (this=this@entry=0x7fbb1142d000, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:265 #5 0x00007fbb05b0e11e in PG::lock_suspend_timeout (this=0x7fbb1142d000, handle=...) at osd/PG.cc:259 #6 0x00007fbb05a7a869 in OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=0x7fbb11193510) at osd/OSD.cc:8658 #7 0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340 #8 0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684 #9 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #10 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? () Thread 16 (Thread 0x7fbafa647700 (LWP 16951)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb10fa44b0, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb060d8b04 in SimpleMessenger::mark_down (this=0x7fbb10fa4000, con=0x7fbb10f4a880) at msg/simple/SimpleMessenger.cc:655 #5 0x00007fbb060a4e4f in MonClient::_reopen_session (this=this@entry=0x7ffd681757d0, rank=rank@entry=-1, name="") at mon/MonClient.cc:617 #6 0x00007fbb05a6f9bb in _reopen_session (this=0x7ffd681757d0) at mon/MonClient.h:191 #7 reopen_session (cb=0x0, this=0x7ffd681757d0) at mon/MonClient.h:346 #8 OSD::tick_without_osd_lock (this=0x7fbb1114a000) at osd/OSD.cc:4298 #9 0x00007fbb05aa3e29 in Context::complete (this=0x7fbb10eac3b0, r=<optimized out>) at include/Context.h:64 #10 0x00007fbb060ec304 in SafeTimer::timer_thread (this=0x7fbb1114a190) at common/Timer.cc:105 #11 0x00007fbb060edd3d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38 #12 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #13 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () Thread 13 (Thread 0x7fbaf0c9e700 (LWP 17000)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1345d340, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05a55665 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115 #5 session_handle_reset (session=0x7fbb1345d260, this=0x7fbb1114a000) at osd/OSD.h:1438 #6 OSD::ms_handle_reset (this=0x7fbb1114a000, con=0x7fbb13187680) at osd/OSD.cc:4687 #7 0x00007fbb061c560f in ms_deliver_handle_reset (con=0x7fbb13187680, this=0x7fbb10fa4000) at msg/Messenger.h:657 #8 DispatchQueue::entry (this=0x7fbb10fa41d0) at msg/simple/DispatchQueue.cc:173 #9 0x00007fbb060def3d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103 #10 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #12 0x0000000000000000 in ?? () Thread 8 (Thread 0x7fbafe64f700 (LWP 16943)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7ffd68175960, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05cb7afa in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115 #5 sub_want (flags=1, start=35, what="", this=0x7ffd681757d0) at mon/MonClient.h:264 #6 Objecter::_maybe_request_map (this=this@entry=0x7fbb11158000) at osdc/Objecter.cc:1860 #7 0x00007fbb05cbf386 in Objecter::tick (this=0x7fbb11158000) at osdc/Objecter.cc:2056 #8 0x00007fbb05ae26d1 in operator() (this=0x7fbb10ee6850) at /usr/include/c++/4.8.2/functional:2471 #9 ceph::timer_detail::timer<ceph::time_detail::mono_clock>::timer_thread (this=0x7fbb111581f8) at common/ceph_timer.h:137 #10 0x00007fbb02d40220 in ?? () from /lib64/libstdc++.so.6 #11 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #12 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #13 0x0000000000000000 in ?? () Thread 6 (Thread 0x7fbafce4c700 (LWP 16946)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1114ae30, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05a76f6e in OSD::handle_osd_ping (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb111ffc00) at osd/OSD.cc:3878 #5 0x00007fbb05a7866b in OSD::heartbeat_dispatch (this=0x7fbb1114a000, m=0x7fbb111ffc00) at osd/OSD.cc:5713 #6 0x00007fbb06213d9b in ms_fast_dispatch (m=<optimized out>, this=0x7fbb10fac000) at msg/Messenger.h:555 #7 AsyncConnection::DelayedDelivery::do_request (this=<optimized out>, id=<optimized out>) at msg/async/AsyncConnection.cc:2453 #8 0x00007fbb061c8626 in EventCenter::process_time_events (this=this@entry=0x7fbb10fa87c8) at msg/async/Event.cc:307 #9 0x00007fbb061c9d78 in EventCenter::process_events (this=this@entry=0x7fbb10fa87c8, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at msg/async/Event.cc:392 #10 0x00007fbb061a90e0 in Worker::entry (this=0x7fbb10fa8780) at msg/async/AsyncMessenger.cc:294 #11 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #12 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #13 0x0000000000000000 in ?? () Thread 2 (Thread 0x7fbafd64d700 (LWP 16945)): #0 0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1114ae30, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110 #4 0x00007fbb05a76f6e in OSD::handle_osd_ping (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb113b7600) at osd/OSD.cc:3878 ---Type <return> to continue, or q <return> to quit--- #5 0x00007fbb05a7866b in OSD::heartbeat_dispatch (this=0x7fbb1114a000, m=0x7fbb113b7600) at osd/OSD.cc:5713 #6 0x00007fbb06213d9b in ms_fast_dispatch (m=<optimized out>, this=0x7fbb10fac000) at msg/Messenger.h:555 #7 AsyncConnection::DelayedDelivery::do_request (this=<optimized out>, id=<optimized out>) at msg/async/AsyncConnection.cc:2453 #8 0x00007fbb061c8626 in EventCenter::process_time_events (this=this@entry=0x7fbb10fa8548) at msg/async/Event.cc:307 #9 0x00007fbb061c9d78 in EventCenter::process_events (this=this@entry=0x7fbb10fa8548, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at msg/async/Event.cc:392 #10 0x00007fbb061a90e0 in Worker::entry (this=0x7fbb10fa8500) at msg/async/AsyncMessenger.cc:294 #11 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0 #12 0x00007fbb024a828d in ?? () from /lib64/libc.so.6 #13 0x0000000000000000 in ?? ()
/a/sage-2016-11-25_00:58:22-rados:thrash-wip-prune-past-intervals-jewel---basic-smithi/576078
Files
Updated by Samuel Just over 7 years ago
Thread 45 (Thread 0x7fbadab63700 (LWP 21310)):
#0 0x00007fbb03e206d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fbb061ed78f in Wait (mutex=..., this=0x7fbb11733638) at common/Cond.h:56
#2 Pipe::DelayedDelivery::stop_fast_dispatching (this=0x7fbb11733560) at msg/simple/Pipe.cc:249
#3 0x00007fbb061ee99f in Pipe::stop_and_wait (this=this@entry=0x7fbb13d0c000) at msg/simple/Pipe.cc:1502
#4 0x00007fbb061fcd3f in Pipe::accept (this=this@entry=0x7fbb117be800) at msg/simple/Pipe.cc:696
#5 0x00007fbb062007cf in Pipe::reader (this=0x7fbb117be800) at msg/simple/Pipe.cc:1517
#6 0x00007fbb062084bd in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52
#7 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
Thread 45 is accepting a replacement for Pipe 0x7fbb13d0c000. Pipe::accept when it calls into stop_and_wait is holding both the new and existing pipe locks. stop_and_wait calls into DelayedDelivery::stop_fast_dispatching while holding the pipe lock. stop_fast_dispatching blocks until the delay thread stops while holding the pipe_lock. The corresponding DelayedDelivery object is 0x7fbb11733560.
Thread 61 (Thread 0x7fbadd96e700 (LWP 21309)):
#0 0x00007fbb03e22f4d in _lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb13d0d170, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4 0x00007fbb060d85d6 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5 is_connected (this=0x7fbb13d0c000) at msg/simple/Pipe.h:143
#6 SimpleMessenger::is_connected (this=<optimized out>, con=<optimized out>) at msg/simple/SimpleMessenger.cc:282
#7 0x00007fbb05a4dc06 in OSD::op_is_discardable (op=op@entry=0x7fbb12214dc0) at osd/OSD.cc:8616
#8 0x00007fbb05a82e9a in OSD::handle_op (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:8466
#9 0x00007fbb05a840ce in OSD::dispatch_op_fast (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:6135
#10 0x00007fbb05a843e8 in OSD::dispatch_session_waiting (this=this@entry=0x7fbb1114a000, session=session@entry=0x7fbb1345d260, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:5771
#11 0x00007fbb05a84794 in OSD::ms_fast_dispatch (this=0x7fbb1114a000, m=<optimized out>) at osd/OSD.cc:5883
#12 0x00007fbb061c3cf6 in ms_fast_dispatch (m=0x7fbb12214dc0, this=0x7fbb10fa4000) at msg/Messenger.h:555
#13 DispatchQueue::fast_dispatch (this=0x7fbb10fa41d0, m=m@entry=0x7fbb12214dc0) at msg/simple/DispatchQueue.cc:71
#14 0x00007fbb061f6372 in Pipe::DelayedDelivery::entry (this=0x7fbb11733560) at msg/simple/Pipe.cc:226
#15 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
Thread 61 has the DelayedDelivery object 0x7fbb11733560 calling through the fast_dispatch path until it hits all the way down to op_is_discardable which calls is_connected on our friend Pipe 0x7fbb13d0c000 which thread 45 has locked completing the cycle.
Updated by Samuel Just over 7 years ago
- Subject changed from jewel: osd: lock cycle / deadlock to jewel: SimpleMessenger debug delay injection deadlock
- Priority changed from Immediate to High
I don't think this is a recent regression, demoting to high for now.
Updated by Samuel Just over 7 years ago
- Subject changed from jewel: SimpleMessenger debug delay injection deadlock to SimpleMessenger debug delay injection deadlock
The same conditions appear to be present in master.
Updated by Samuel Just over 7 years ago
- Status changed from New to 7
- Assignee set to Samuel Just
I think I can just unlock the pipe_lock there since we unlock it waiting on a cond just below there.