Project

General

Profile

Bug #18042

SimpleMessenger debug delay injection deadlock

Added by Sage Weil about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 86 (Thread 0x7fbad7b56700 (LWP 21174)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=0x7fbb10fa44b0, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb061f8139 in Pipe::fault (this=this@entry=0x7fbb12d74800, onread=onread@entry=true) at msg/simple/Pipe.cc:1391
#5  0x00007fbb061fecc0 in Pipe::reader (this=0x7fbb12d74800) at msg/simple/Pipe.cc:1697
#6  0x00007fbb062084bd in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52
#7  0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
#7  0x00007fbb05a4dc06 in OSD::op_is_discardable (op=op@entry=0x7fbb12215b80) at osd/OSD.cc:8616
#8  0x00007fbb05b22cec in PG::can_discard_op (this=0x7fbb1142d000, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00) at osd/PG.cc:5353
#9  0x00007fbb05b23405 in PG::can_discard_request (this=this@entry=0x7fbb1142d000, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00) at osd/PG.cc:5439
#10 0x00007fbb05bc21e9 in ReplicatedPG::do_request (this=0x7fbb1142d000, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00, handle=...) at osd/ReplicatedPG.cc:1448
#11 0x00007fbb05a7607d in OSD::dequeue_op (this=0x7fbb1114a000, pg=..., op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00, handle=...) at osd/OSD.cc:8815
#12 0x00007fbb05a762cd in PGQueueable::RunVis::operator() (this=this@entry=0x7fbae7c8a7b0, op=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00) at osd/OSD.cc:163
#13 0x00007fbb05a7adf9 in internal_visit<std::shared_ptr<OpRequest> > (operand=std::shared_ptr (count 3, weak 0) 0x7fbb1144fa00, this=<synthetic pointer>) at /usr/include/boost/variant/variant.hpp:1017
#14 visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, std::shared_ptr<OpRequest> > (storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>) at /usr/include/boost/variant/detail/visitation_impl.hpp:130
#15 visitation_impl_invoke<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, std::shared_ptr<OpRequest>, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (internal_which=<optimized out>, t=0x0, storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>)
    at /usr/include/boost/variant/detail/visitation_impl.hpp:173
#16 visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, std::shared_ptr<OpRequest>, boost::mpl::l_item<mpl_::long_<2l>, PGSnapTrim, boost::mpl::l_item<mpl_::long_<1l>, PGScrub, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (no_backup_flag=..., storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>)
    at /usr/include/boost/variant/detail/visitation_impl.hpp:256
#17 internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*> (storage=0x7fbae7c8a8d0, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/variant.hpp:2326
#18 internal_apply_visitor<boost::detail::variant::invoke_visitor<PGQueueable::RunVis> > (visitor=<synthetic pointer>, this=0x7fbae7c8a8c8) at /usr/include/boost/variant/variant.hpp:2337
#19 apply_visitor<PGQueueable::RunVis> (visitor=..., this=0x7fbae7c8a8c8) at /usr/include/boost/variant/variant.hpp:2360
#20 apply_visitor<PGQueueable::RunVis, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub> > (visitable=..., visitor=...) at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:60
#21 run (handle=..., pg=..., osd=<optimized out>, this=0x7fbae7c8a8c8) at osd/OSD.h:392
#22 OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8696
---Type <return> to continue, or q <return> to quit---
#23 0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340
#24 0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684
#25 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#26 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#27 0x0000000000000000 in ?? ()

#0  0x00007fbb03e20a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fbb05a7a6e5 in WaitUntil (when=..., mutex=..., this=0x7fbb10edc7d8) at common/Cond.h:72
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7fbb10edc7d8) at common/Cond.h:81
#3  OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=0x7fbb11193660) at osd/OSD.cc:8644
#4  0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340
#5  0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684
#6  0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()

Thread 61 (Thread 0x7fbadd96e700 (LWP 21309)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb13d0d170, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb060d85d6 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  is_connected (this=0x7fbb13d0c000) at msg/simple/Pipe.h:143
#6  SimpleMessenger::is_connected (this=<optimized out>, con=<optimized out>) at msg/simple/SimpleMessenger.cc:282
#7  0x00007fbb05a4dc06 in OSD::op_is_discardable (op=op@entry=0x7fbb12214dc0) at osd/OSD.cc:8616
#8  0x00007fbb05a82e9a in OSD::handle_op (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:8466
#9  0x00007fbb05a840ce in OSD::dispatch_op_fast (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:6135
#10 0x00007fbb05a843e8 in OSD::dispatch_session_waiting (this=this@entry=0x7fbb1114a000, session=session@entry=0x7fbb1345d260, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:5771
#11 0x00007fbb05a84794 in OSD::ms_fast_dispatch (this=0x7fbb1114a000, m=<optimized out>) at osd/OSD.cc:5883
#12 0x00007fbb061c3cf6 in ms_fast_dispatch (m=0x7fbb12214dc0, this=0x7fbb10fa4000) at msg/Messenger.h:555
#13 DispatchQueue::fast_dispatch (this=0x7fbb10fa41d0, m=m@entry=0x7fbb12214dc0) at msg/simple/DispatchQueue.cc:71
#14 0x00007fbb061f6372 in Pipe::DelayedDelivery::entry (this=0x7fbb11733560) at msg/simple/Pipe.cc:226
#15 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 45 (Thread 0x7fbadab63700 (LWP 21310)):
#0  0x00007fbb03e206d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fbb061ed78f in Wait (mutex=..., this=0x7fbb11733638) at common/Cond.h:56
#2  Pipe::DelayedDelivery::stop_fast_dispatching (this=0x7fbb11733560) at msg/simple/Pipe.cc:249
#3  0x00007fbb061ee99f in Pipe::stop_and_wait (this=this@entry=0x7fbb13d0c000) at msg/simple/Pipe.cc:1502
#4  0x00007fbb061fcd3f in Pipe::accept (this=this@entry=0x7fbb117be800) at msg/simple/Pipe.cc:696
#5  0x00007fbb062007cf in Pipe::reader (this=0x7fbb117be800) at msg/simple/Pipe.cc:1517
#6  0x00007fbb062084bd in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52
#7  0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()

Thread 43 (Thread 0x7fbaef49b700 (LWP 17003)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7ffd68175960, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05a63e58 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  sub_want_increment (flags=1, start=35, 
    what="\230vR\023\273\177\000\000\000\341\335\177\206\261\006\026ЍI\357\272\177\000\000\000\240\024\021\273\177\000\000\300\215I\357\272\177\000\000\030\263\024\021\273\177\000\000#\000\000\000\000\000\000\000ЍI\357\272\177\000\000\200%\367\022\273\177\000\000\254@\246\005\273\177\000\000#\000\000\000\000\000\000\000۪\004\006\273\177\000\000؍I\357\272\177\000\000\000\341\335\177\206\261\006\026\000\240\024\021\273\177\000\000@\225I\357\272\177\000\000\"\000\000\000\000\000\000\000\277A\246\005\273\177\000\000\200\212\031\021\273\177\000\000@\301\355\020\273\177\000\000@\301\355\020\273\177\000\000\312V\321\002\"\000\000\000\000(H\021\273\177\000\000\340u\330\022\273\177\000\000\370\216\367\002\273\177\000\000\060o\367\002"...<Address 0x7fbaf0ca0792 out of bounds>, this=0x7ffd681757d0) at mon/MonClient.h:280
#6  OSD::osdmap_subscribe (this=this@entry=0x7fbb1114a000, epoch=35, force_request=force_request@entry=false) at osd/OSD.cc:6535
#7  0x00007fbb05a640ac in OSD::wait_for_new_map (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800) at osd/OSD.cc:6468
#8  0x00007fbb05a641bf in OSD::require_same_or_newer_map (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800, epoch=35, is_fast_dispatch=is_fast_dispatch@entry=false) at osd/OSD.cc:7370
---Type <return> to continue, or q <return> to quit---
#9  0x00007fbb05a9a19c in OSD::handle_pg_notify (this=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800) at osd/OSD.cc:7766
#10 0x00007fbb05a9b730 in OSD::dispatch_op (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 4, weak 0) 0x7fbb11482800) at osd/OSD.cc:6078
#11 0x00007fbb05a9c35e in OSD::_dispatch (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb12f72580) at osd/OSD.cc:6252
#12 0x00007fbb05a9ca1f in OSD::ms_dispatch (this=0x7fbb1114a000, m=0x7fbb12f72580) at osd/OSD.cc:5755
#13 0x00007fbb061c5b9a in ms_deliver_dispatch (m=0x7fbb12f72580, this=0x7fbb10fa4700) at msg/Messenger.h:584
#14 DispatchQueue::entry (this=0x7fbb10fa48d0) at msg/simple/DispatchQueue.cc:185
#15 0x00007fbb060def3d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103
#16 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()

Thread 42 (Thread 0x7fbadef76700 (LWP 17269)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7ffd68175960, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05a63e58 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  sub_want_increment (flags=1, start=35, 
    what="\230~\330\022\273\177\000\000\000\341\335\177\206\261\006\026\000\240\024\021\273\177\000\000@\301\355\020\273\177\000\000@Z\376\021\273\177\000\000\360C\367\336\272\177\000\000\000\240\024\021\273\177\000\000\060\256\024\021\273\177\000\000\240Fk\021\273\177\000\000\352s\247\005\273\177\000\000\200\332o\022\273\177\000\000\000\240\355\022\273\177\000\000\220D\367\336\272\177\000\000\060\000\000\000\273\177\000\000\314C\367\336\272\177\000\000\200|/\022\273\177\000\000P\200#\021\273\177\000\000\334Ӵ\021\273\177\000\000X\200#\021\273\177\000\000H\200#\021\273\177\000\000@\200#\021\273\177\000\000\312V\321\002\002", '\000' <repeats 15 times>, "\002\000\000\000\000\204.\021\273\177\000\000\242U\321\002"...<Address 0x7fbadf97a000 out of bounds>, this=0x7ffd681757d0) at mon/MonClient.h:280
#6  OSD::osdmap_subscribe (this=this@entry=0x7fbb1114a000, epoch=35, force_request=force_request@entry=false) at osd/OSD.cc:6535
#7  0x00007fbb05a773ea in OSD::handle_osd_ping (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb12eda000) at osd/OSD.cc:4001
#8  0x00007fbb05a7866b in OSD::heartbeat_dispatch (this=0x7fbb1114a000, m=0x7fbb12eda000) at osd/OSD.cc:5713
#9  0x00007fbb06213d9b in ms_fast_dispatch (m=<optimized out>, this=0x7fbb10fac000) at msg/Messenger.h:555
#10 AsyncConnection::DelayedDelivery::do_request (this=<optimized out>, id=<optimized out>) at msg/async/AsyncConnection.cc:2453
#11 0x00007fbb061c8626 in EventCenter::process_time_events (this=this@entry=0x7fbb10faa848) at msg/async/Event.cc:307
#12 0x00007fbb061c9d78 in EventCenter::process_events (this=this@entry=0x7fbb10faa848, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at msg/async/Event.cc:392
#13 0x00007fbb061a90e0 in Worker::entry (this=0x7fbb10faa800) at msg/async/AsyncMessenger.cc:294
#14 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 38 (Thread 0x7fbae6489700 (LWP 17021)):
#0  0x00007fbb03e20a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fbb05a7a6e5 in WaitUntil (when=..., mutex=..., this=0x7fbb10edca58) at common/Cond.h:72
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7fbb10edca58) at common/Cond.h:81
#3  OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=0x7fbb111936c0) at osd/OSD.cc:8644
#4  0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340
#5  0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684
#6  0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()

Thread 28 (Thread 0x7fbaea491700 (LWP 17013)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1142d528, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05b0df83 in PG::lock (this=this@entry=0x7fbb1142d000, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:265
#5  0x00007fbb05b0e11e in PG::lock_suspend_timeout (this=0x7fbb1142d000, handle=...) at osd/PG.cc:259
#6  0x00007fbb05a7a869 in OSD::ShardedOpWQ::_process (this=0x7fbb1114b1e8, thread_index=<optimized out>, hb=0x7fbb11193510) at osd/OSD.cc:8658
#7  0x00007fbb060f36e7 in ShardedThreadPool::shardedthreadpool_worker (this=0x7fbb1114a768, thread_index=<optimized out>) at common/WorkQueue.cc:340
#8  0x00007fbb060f5650 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:684
#9  0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7fbafa647700 (LWP 16951)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb10fa44b0, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb060d8b04 in SimpleMessenger::mark_down (this=0x7fbb10fa4000, con=0x7fbb10f4a880) at msg/simple/SimpleMessenger.cc:655
#5  0x00007fbb060a4e4f in MonClient::_reopen_session (this=this@entry=0x7ffd681757d0, rank=rank@entry=-1, name="") at mon/MonClient.cc:617
#6  0x00007fbb05a6f9bb in _reopen_session (this=0x7ffd681757d0) at mon/MonClient.h:191
#7  reopen_session (cb=0x0, this=0x7ffd681757d0) at mon/MonClient.h:346
#8  OSD::tick_without_osd_lock (this=0x7fbb1114a000) at osd/OSD.cc:4298
#9  0x00007fbb05aa3e29 in Context::complete (this=0x7fbb10eac3b0, r=<optimized out>) at include/Context.h:64
#10 0x00007fbb060ec304 in SafeTimer::timer_thread (this=0x7fbb1114a190) at common/Timer.cc:105
#11 0x00007fbb060edd3d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#12 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7fbaf0c9e700 (LWP 17000)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1345d340, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05a55665 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  session_handle_reset (session=0x7fbb1345d260, this=0x7fbb1114a000) at osd/OSD.h:1438
#6  OSD::ms_handle_reset (this=0x7fbb1114a000, con=0x7fbb13187680) at osd/OSD.cc:4687
#7  0x00007fbb061c560f in ms_deliver_handle_reset (con=0x7fbb13187680, this=0x7fbb10fa4000) at msg/Messenger.h:657
#8  DispatchQueue::entry (this=0x7fbb10fa41d0) at msg/simple/DispatchQueue.cc:173
#9  0x00007fbb060def3d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103
#10 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7fbafe64f700 (LWP 16943)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7ffd68175960, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05cb7afa in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  sub_want (flags=1, start=35, what="", this=0x7ffd681757d0) at mon/MonClient.h:264
#6  Objecter::_maybe_request_map (this=this@entry=0x7fbb11158000) at osdc/Objecter.cc:1860
#7  0x00007fbb05cbf386 in Objecter::tick (this=0x7fbb11158000) at osdc/Objecter.cc:2056
#8  0x00007fbb05ae26d1 in operator() (this=0x7fbb10ee6850) at /usr/include/c++/4.8.2/functional:2471
#9  ceph::timer_detail::timer<ceph::time_detail::mono_clock>::timer_thread (this=0x7fbb111581f8) at common/ceph_timer.h:137
#10 0x00007fbb02d40220 in ?? () from /lib64/libstdc++.so.6
#11 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7fbafce4c700 (LWP 16946)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1114ae30, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05a76f6e in OSD::handle_osd_ping (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb111ffc00) at osd/OSD.cc:3878
#5  0x00007fbb05a7866b in OSD::heartbeat_dispatch (this=0x7fbb1114a000, m=0x7fbb111ffc00) at osd/OSD.cc:5713
#6  0x00007fbb06213d9b in ms_fast_dispatch (m=<optimized out>, this=0x7fbb10fac000) at msg/Messenger.h:555
#7  AsyncConnection::DelayedDelivery::do_request (this=<optimized out>, id=<optimized out>) at msg/async/AsyncConnection.cc:2453
#8  0x00007fbb061c8626 in EventCenter::process_time_events (this=this@entry=0x7fbb10fa87c8) at msg/async/Event.cc:307
#9  0x00007fbb061c9d78 in EventCenter::process_events (this=this@entry=0x7fbb10fa87c8, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at msg/async/Event.cc:392
#10 0x00007fbb061a90e0 in Worker::entry (this=0x7fbb10fa8780) at msg/async/AsyncMessenger.cc:294
#11 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fbafd64d700 (LWP 16945)):
#0  0x00007fbb03e22f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb1114ae30, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007fbb05a76f6e in OSD::handle_osd_ping (this=this@entry=0x7fbb1114a000, m=m@entry=0x7fbb113b7600) at osd/OSD.cc:3878
---Type <return> to continue, or q <return> to quit---
#5  0x00007fbb05a7866b in OSD::heartbeat_dispatch (this=0x7fbb1114a000, m=0x7fbb113b7600) at osd/OSD.cc:5713
#6  0x00007fbb06213d9b in ms_fast_dispatch (m=<optimized out>, this=0x7fbb10fac000) at msg/Messenger.h:555
#7  AsyncConnection::DelayedDelivery::do_request (this=<optimized out>, id=<optimized out>) at msg/async/AsyncConnection.cc:2453
#8  0x00007fbb061c8626 in EventCenter::process_time_events (this=this@entry=0x7fbb10fa8548) at msg/async/Event.cc:307
#9  0x00007fbb061c9d78 in EventCenter::process_events (this=this@entry=0x7fbb10fa8548, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000) at msg/async/Event.cc:392
#10 0x00007fbb061a90e0 in Worker::entry (this=0x7fbb10fa8500) at msg/async/AsyncMessenger.cc:294
#11 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()


/a/sage-2016-11-25_00:58:22-rados:thrash-wip-prune-past-intervals-jewel---basic-smithi/576078

bt.txt View (65.6 KB) Sage Weil, 11/25/2016 02:36 PM

History

#1 Updated by Sage Weil about 6 years ago

#2 Updated by Samuel Just about 6 years ago

Thread 45 (Thread 0x7fbadab63700 (LWP 21310)):
#0 0x00007fbb03e206d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fbb061ed78f in Wait (mutex=..., this=0x7fbb11733638) at common/Cond.h:56
#2 Pipe::DelayedDelivery::stop_fast_dispatching (this=0x7fbb11733560) at msg/simple/Pipe.cc:249
#3 0x00007fbb061ee99f in Pipe::stop_and_wait (this=this@entry=0x7fbb13d0c000) at msg/simple/Pipe.cc:1502
#4 0x00007fbb061fcd3f in Pipe::accept (this=this@entry=0x7fbb117be800) at msg/simple/Pipe.cc:696
#5 0x00007fbb062007cf in Pipe::reader (this=0x7fbb117be800) at msg/simple/Pipe.cc:1517
#6 0x00007fbb062084bd in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52
#7 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()

Thread 45 is accepting a replacement for Pipe 0x7fbb13d0c000. Pipe::accept when it calls into stop_and_wait is holding both the new and existing pipe locks. stop_and_wait calls into DelayedDelivery::stop_fast_dispatching while holding the pipe lock. stop_fast_dispatching blocks until the delay thread stops while holding the pipe_lock. The corresponding DelayedDelivery object is 0x7fbb11733560.

Thread 61 (Thread 0x7fbadd96e700 (LWP 21309)):
#0 0x00007fbb03e22f4d in _lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fbb03e1ed02 in _L_lock_791 () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#2 0x00007fbb03e1ec08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fbb060b47d8 in Mutex::Lock (this=this@entry=0x7fbb13d0d170, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4 0x00007fbb060d85d6 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5 is_connected (this=0x7fbb13d0c000) at msg/simple/Pipe.h:143
#6 SimpleMessenger::is_connected (this=<optimized out>, con=<optimized out>) at msg/simple/SimpleMessenger.cc:282
#7 0x00007fbb05a4dc06 in OSD::op_is_discardable (op=op@entry=0x7fbb12214dc0) at osd/OSD.cc:8616
#8 0x00007fbb05a82e9a in OSD::handle_op (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (
_gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:8466
#9 0x00007fbb05a840ce in OSD::dispatch_op_fast (this=this@entry=0x7fbb1114a000, op=std::shared_ptr (count 2, weak 0) 0x7fbb1144f800, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (
_gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (_gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:6135
#10 0x00007fbb05a843e8 in OSD::dispatch_session_waiting (this=this@entry=0x7fbb1114a000, session=session@entry=0x7fbb1345d260, osdmap=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (
_gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const, std::less<unsigned int>, std::hash<unsigned int> >::Cleanup, std::allocator<int>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 31, weak 1) 0x7fbb126fda80) at osd/OSD.cc:5771
#11 0x00007fbb05a84794 in OSD::ms_fast_dispatch (this=0x7fbb1114a000, m=<optimized out>) at osd/OSD.cc:5883
#12 0x00007fbb061c3cf6 in ms_fast_dispatch (m=0x7fbb12214dc0, this=0x7fbb10fa4000) at msg/Messenger.h:555
#13 DispatchQueue::fast_dispatch (this=0x7fbb10fa41d0, m=m@entry=0x7fbb12214dc0) at msg/simple/DispatchQueue.cc:71
#14 0x00007fbb061f6372 in Pipe::DelayedDelivery::entry (this=0x7fbb11733560) at msg/simple/Pipe.cc:226
#15 0x00007fbb03e1cdc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fbb024a828d in ?? () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 61 has the DelayedDelivery object 0x7fbb11733560 calling through the fast_dispatch path until it hits all the way down to op_is_discardable which calls is_connected on our friend Pipe 0x7fbb13d0c000 which thread 45 has locked completing the cycle.

#3 Updated by Samuel Just about 6 years ago

  • Subject changed from jewel: osd: lock cycle / deadlock to jewel: SimpleMessenger debug delay injection deadlock
  • Priority changed from Immediate to High

I don't think this is a recent regression, demoting to high for now.

#4 Updated by Samuel Just about 6 years ago

  • Subject changed from jewel: SimpleMessenger debug delay injection deadlock to SimpleMessenger debug delay injection deadlock

The same conditions appear to be present in master.

#5 Updated by Samuel Just about 6 years ago

  • Status changed from New to 7
  • Assignee set to Samuel Just

I think I can just unlock the pipe_lock there since we unlock it waiting on a cond just below there.

#6 Updated by Samuel Just about 6 years ago

  • Backport set to jewel

#7 Updated by Samuel Just almost 6 years ago

  • Status changed from 7 to Resolved

Also available in: Atom PDF