Bug #38537
mgr deadlock
% Done: 0%
Source:
Tags:
Backport: luminous, mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Thread 45 (Thread 0x7fbfa869b700 (LWP 1914003)):
#0 0x00007fbfd1667827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x149cf80) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x149cf80, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007fbfd16678d4 in __new_sem_wait_slow (sem=0x149cf80, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007fbfd166797a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4 0x00007fbfd1bbafe8 in PyThread_acquire_lock () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5 0x00007fbfd1b8f926 in PyEval_RestoreThread () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6 0x0000000000507408 in ActivePyModules::<lambda(const OSDMap&, const PGMap&)>::operator() (__closure=<optimized out>, __closure=<optimized out>, pg_map=..., osd_map=...) at /build/ceph-14.1.0-101-gdddb858/src/mgr/ActivePyModules.cc:333
#7 Objecter::with_osdmap<ActivePyModules::get_python(const string&)::<lambda(const OSDMap&, const PGMap&)>, const PGMap&> (cb=<optimized out>, this=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/osdc/Objecter.h:2056
#8 ClusterState::with_osdmap_and_pgmap<ActivePyModules::get_python(const string&)::<lambda(const OSDMap&, const PGMap&)> > (cb=<optimized out>, this=0x4268368) at /build/ceph-14.1.0-101-gdddb858/src/mgr/ClusterState.h:138
#9 ActivePyModules::get_python (this=this@entry=0x1307de0, what=...) at /build/ceph-14.1.0-101-gdddb858/src/mgr/ActivePyModules.cc:329
#10 0x00000000005156e7 in ceph_state_get (self=<optimized out>, args=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/mgr/BaseMgrModule.cc:344
#11 0x00007fbfd1b98971 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007fbfd1b97044 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007fbfd1b97044 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00007fbfd1b97044 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
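i.e., it holds the ClusterState lock (taken in ClusterState::with_osdmap_and_pgmap) and the Objecter rwlock for reading (taken in Objecter::with_osdmap), and is blocked re-acquiring the GIL in PyEval_RestoreThread.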
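Frames #6 to #9 of Thread 45 show a nesting. ActivePyModules::get_python calls ClusterState::with_osdmap_and_pgmap, which calls Objecter::with_osdmap, which runs a lambda that blocks in PyEval_RestoreThread. The Python sketch below models only that acquisition order. It assumes, as the frames suggest, that with_osdmap_and_pgmap takes the ClusterState lock and with_osdmap takes the Objecter rwlock. Each lock is modeled as a plain `threading.Lock`, so neither the rwlock's shared mode nor the real GIL is represented. Every name here is hypothetical, not a Ceph or CPython API.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-ins for the locks in Thread 45's backtrace (not Ceph APIs).
cluster_state_lock = threading.Lock()  # ClusterState mutex
objecter_rwlock = threading.Lock()     # Objecter rwlock; shared mode is not modeled
gil = threading.Lock()                 # models the Python GIL
order = []                             # acquisition order, outermost first


@contextmanager
def holding(lock, name):
    """Acquire `lock`, record its name, and release it on exit."""
    with lock:
        order.append(name)
        yield


def with_osdmap(cb):
    # Models Objecter::with_osdmap: the callback runs under the Objecter lock.
    with holding(objecter_rwlock, "Objecter rwlock"):
        return cb()


def with_osdmap_and_pgmap(cb):
    # Models ClusterState::with_osdmap_and_pgmap: ClusterState lock first.
    with holding(cluster_state_lock, "ClusterState lock"):
        return with_osdmap(cb)


def get_python_model():
    # Models the lambda in ActivePyModules::get_python: it re-takes the GIL
    # (PyEval_RestoreThread) while both locks above are still held.
    def cb():
        with holding(gil, "GIL"):
            return "built python object"
    return with_osdmap_and_pgmap(cb)
```

After one call to `get_python_model()`, `order` is `["ClusterState lock", "Objecter rwlock", "GIL"]`. The GIL is requested last, while the other two locks are still held, so any thread that holds the GIL and wants either of those locks closes a cycle.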
Thread 26 (Thread 0x7fbfba1f9700 (LWP 1913869)):
#0 0x00007fbfd1664730 in futex_wait (private=<optimized out>, expected=2, futex_word=0x7ffebddb3acc) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=2, futex_word=0x7ffebddb3acc) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7ffebddb3ac0) at pthread_rwlock_wrlock.c:67
#3 0x00007fbfd1664918 in __GI___pthread_rwlock_wrlock (rwlock=<optimized out>) at pthread_rwlock_wrlock.c:124
#4 0x00000000005d41b6 in std::__shared_mutex_pthread::lock (this=<optimized out>) at /usr/include/c++/7/shared_mutex:103
#5 std::shared_mutex::lock (this=<optimized out>) at /usr/include/c++/7/shared_mutex:329
#6 ceph::shunique_lock<std::shared_mutex>::lock (this=0x7fbfba1f5880) at /build/ceph-14.1.0-101-gdddb858/src/common/shunique_lock.h:157
#7 ceph::shunique_lock<std::shared_mutex>::shunique_lock (m=..., this=0x7fbfba1f5880) at /build/ceph-14.1.0-101-gdddb858/src/common/shunique_lock.h:65
#8 Objecter::submit_command (this=this@entry=0x7ffebddb39d8, c=c@entry=0xc234580, ptid=ptid@entry=0x7fbfba1f59b0) at /build/ceph-14.1.0-101-gdddb858/src/osdc/Objecter.cc:4751
#9 0x0000000000516d12 in Objecter::osd_command (onfinish=0x88d87e0, prs=<optimized out>, poutbl=0x88d8848, ptid=0x7fbfba1f59b0, inbl=..., cmd=..., osd=64, this=0x7ffebddb39d8) at /build/ceph-14.1.0-101-gdddb858/src/osdc/Objecter.h:2224
#10 ceph_send_command (self=<optimized out>, args=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/mgr/BaseMgrModule.cc:178
#11 0x00007fbfd1b97772 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007fbfd1cce05c in PyEval_EvalCodeEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007fbfd1b96f1d in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00007fbfd1cce05c in PyEval_EvalCodeEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#15 0x00007fbfd1b96f1d in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#16 0x00007fbfd1b97044 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#17 0x00007fbfd1cce05c in PyEval_EvalCodeEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#18 0x00007fbfd1c24370 in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#19 0x00007fbfd1bf7273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#20 0x00007fbfd1c6b3ac in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#21 0x00007fbfd1bf7273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#22 0x00007fbfd1bf8444 in PyObject_CallMethod () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#23 0x00000000005ad99b in PyModuleRunner::serve (this=0x4472180) at /build/ceph-14.1.0-101-gdddb858/src/mgr/PyModuleRunner.cc:47
#24 0x00000000005adff5 in PyModuleRunner::PyModuleRunnerThread::entry (this=0x44721c8) at /build/ceph-14.1.0-101-gdddb858/src/mgr/PyModuleRunner.cc:106
#25 0x00007fbfd165f6ba in start_thread (arg=0x7fbfba1f9700) at pthread_create.c:333
#26 0x00007fbfd0e8841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
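i.e., it holds the GIL and is blocked taking the Objecter rwlock for writing (in Objecter::submit_command). The rwlock has one active reader and one queued writer (__nr_readers = 1, __nr_writers_queued = 1):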
(gdb) p rwlock
$1 = {_M_impl = {_M_rwlock = {__data = {__lock = 0, __nr_readers = 1, __readers_wakeup = 3, __writer_wakeup = 2, __nr_readers_queued = 0, __nr_writers_queued = 1, __writer = 0, __shared = 0, __rwelision = 1 '\001', __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, __size = "\000\000\000\000\001\000\000\000\003\000\000\000\002\000\000\000\000\000\000\000\001", '\000' <repeats 11 times>, "\001", '\000' <repeats 22 times>, __align = 4294967296}}}
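Threads 45 and 26 take the GIL and the Objecter rwlock in opposite orders. Reading Thread 45 as the rwlock's single reader, Thread 45 holds the rwlock and waits for the GIL, while Thread 26 holds the GIL and waits for the rwlock. A common way to break this kind of cycle is to give up the GIL before blocking on a C++ lock, which is what `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` do in CPython extension code.

The sketch below models that pattern with plain `threading.Lock` objects. It illustrates the technique under those assumptions and is not the patch that resolved this issue. `ReleasedGil`, `get_python_model`, `send_command_model` and `run` are hypothetical names.

```python
import threading

# Hypothetical stand-ins (plain locks, not the real Ceph/CPython objects).
gil = threading.Lock()                 # models the Python GIL
cluster_state_lock = threading.Lock()  # models the ClusterState mutex
objecter_lock = threading.Lock()       # exclusive-only model of the Objecter rwlock
counts = {"get": 0, "send": 0}


class ReleasedGil:
    """Drop the modeled GIL for the duration of a block, then retake it
    (the same idea as Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS)."""

    def __enter__(self):
        gil.release()

    def __exit__(self, *exc):
        gil.acquire()
        return False


def get_python_model():
    # Thread 45's order: ClusterState lock -> Objecter lock -> GIL.
    with cluster_state_lock, objecter_lock, gil:
        counts["get"] += 1


def send_command_model():
    # Thread 26 starts out holding the GIL because it is running Python code.
    with gil:
        # Taking objecter_lock here with the GIL held is the cycle seen above.
        # Instead, release the GIL first; objecter_lock is dropped before the
        # GIL is retaken, so the GIL is never held while waiting on it.
        with ReleasedGil():
            with objecter_lock:
                counts["send"] += 1


def run(n=2000, timeout=10.0):
    """Run both models concurrently; True if neither thread got stuck."""
    counts.update(get=0, send=0)

    def loop(fn):
        for _ in range(n):
            fn()

    threads = [threading.Thread(target=loop, args=(fn,), daemon=True)
               for fn in (get_python_model, send_command_model)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    return not any(t.is_alive() for t in threads)
```

`run()` starts one thread per model, joins both with a timeout, and returns True only if both finished. If `send_command_model` instead took `objecter_lock` directly while holding `gil`, the two models would take the locks in opposite orders again, and `run()` could return False.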
...
Thread 24 (Thread 0x7fbfbb9fc700 (LWP 1913866)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fbfd1661dbd in __GI___pthread_mutex_lock (mutex=0x4268568) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fbfd23cf7b9 in Mutex::lock(bool) () from target:/usr/lib/ceph/libceph-common.so.0
#3 0x000000000055f741 in std::lock_guard<Mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/7/bits/std_mutex.h:162
#4 ClusterState::with_mutable_pgmap<DaemonServer::send_report()::<lambda(PGMap&)> > (cb=<optimized out>, this=0x4268368) at /build/ceph-14.1.0-101-gdddb858/src/mgr/ClusterState.h:110
#5 DaemonServer::send_report (this=this@entry=0x4269238) at /build/ceph-14.1.0-101-gdddb858/src/mgr/DaemonServer.cc:2289
#6 0x0000000000560ebf in DaemonServer::tick (this=0x4269238) at /build/ceph-14.1.0-101-gdddb858/src/mgr/DaemonServer.cc:323
#7 0x00000000005116c9 in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:768
#8 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/include/Context.h:487
#9 0x000000000050e4e9 in Context::complete (this=0x16245240, r=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/include/Context.h:77
#10 0x00007fbfd23e6420 in SafeTimer::timer_thread() () from target:/usr/lib/ceph/libceph-common.so.0
#11 0x00007fbfd23e7ced in SafeTimerThread::entry() () from target:/usr/lib/ceph/libceph-common.so.0
#12 0x00007fbfd165f6ba in start_thread (arg=0x7fbfbb9fc700) at pthread_create.c:333
holds ???, blocked acquiring the ClusterState lock (mutex 0x4268568)
Thread 22 (Thread 0x7fbfbc9fe700 (LWP 1913864)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fbfd1661dbd in __GI___pthread_mutex_lock (mutex=0x4268568) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fbfd23cf7b9 in Mutex::lock(bool) () from target:/usr/lib/ceph/libceph-common.so.0
#3 0x0000000000532b87 in std::lock_guard<Mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/7/bits/std_mutex.h:162
#4 ClusterState::ingest_pgstats (this=0x4268368, stats=0x12518340) at /build/ceph-14.1.0-101-gdddb858/src/mgr/ClusterState.cc:69
#5 0x0000000000560e1c in DaemonServer::ms_dispatch (this=0x4269238, m=0x12518340) at /build/ceph-14.1.0-101-gdddb858/src/mgr/DaemonServer.cc:266
#6 0x0000000000574c06 in Dispatcher::ms_dispatch2 (this=0x4269238, m=...) at /build/ceph-14.1.0-101-gdddb858/src/msg/Dispatcher.h:126
#7 0x00007fbfd2570809 in DispatchQueue::entry() () from target:/usr/lib/ceph/libceph-common.so.0
#8 0x00007fbfd261f84d in DispatchQueue::DispatchThread::entry() () from target:/usr/lib/ceph/libceph-common.so.0
#9 0x00007fbfd165f6ba in start_thread (arg=0x7fbfbc9fe700) at pthread_create.c:333
#10 0x00007fbfd0e8841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
holds ??, blocked acquiring the ClusterState lock (mutex 0x4268568)
Thread 12 (Thread 0x7fbfc7b58700 (LWP 1913663)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007fbfd292965c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from target:/usr/lib/ceph/libceph-common.so.0
#2 0x00007fbfd239c9c5 in Finisher::finisher_thread_entry() () from target:/usr/lib/ceph/libceph-common.so.0
#3 0x00007fbfd165f6ba in start_thread (arg=0x7fbfc7b58700) at pthread_create.c:333
#4 0x00007fbfd0e8841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
...
#2 0x00007fbfd23cf7b9 in Mutex::lock(bool) () from target:/usr/lib/ceph/libceph-common.so.0
#3 0x000000000057c2ec in std::lock_guard<Mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/7/bits/std_mutex.h:162
#4 Mgr::get_services[abi:cxx11]() const (this=0x4268000) at /build/ceph-14.1.0-101-gdddb858/src/mgr/Mgr.cc:686
#5 0x000000000058ae14 in MgrStandby::send_beacon (this=this@entry=0x7ffebddb3270) at /build/ceph-14.1.0-101-gdddb858/src/mgr/MgrStandby.cc:244
#6 0x000000000058b362 in MgrStandby::tick (this=0x7ffebddb3270) at /build/ceph-14.1.0-101-gdddb858/src/mgr/MgrStandby.cc:253
#7 0x00000000005116c9 in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:768
#8 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/include/Context.h:487
#9 0x000000000050e4e9 in Context::complete (this=0x1bbbfd10, r=<optimized out>) at /build/ceph-14.1.0-101-gdddb858/src/include/Context.h:77
#10 0x00007fbfd23e6420 in SafeTimer::timer_thread() () from target:/usr/lib/ceph/libceph-common.so.0
#11 0x00007fbfd23e7ced in SafeTimerThread::entry() () from target:/usr/lib/ceph/libceph-common.so.0
i.e., blocked trying to take the Mgr lock
Thread 13 (Thread 0x7fbfc7357700 (LWP 1913664)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fbfd1661dbd in __GI___pthread_mutex_lock (mutex=0x4268568) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fbfd23cf7b9 in Mutex::lock(bool) () from target:/usr/lib/ceph/libceph-common.so.0
#3 0x0000000000532a7b in std::lock_guard<Mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/7/bits/std_mutex.h:162
#4 ClusterState::set_service_map (this=0x4268368, new_service_map=...) at /build/ceph-14.1.0-101-gdddb858/src/mgr/ClusterState.cc:57
#5 0x0000000000582443 in Mgr::handle_service_map (this=this@entry=0x4268000, m=m@entry=0x4215400) at /build/ceph-14.1.0-101-gdddb858/src/mgr/Mgr.cc:509
#6 0x00000000005844fb in Mgr::ms_dispatch (this=this@entry=0x4268000, m=m@entry=0x4215400) at /build/ceph-14.1.0-101-gdddb858/src/mgr/Mgr.cc:556
#7 0x000000000058ccbe in MgrStandby::ms_dispatch (this=0x7ffebddb3270, m=0x4215400) at /build/ceph-14.1.0-101-gdddb858/src/mgr/MgrStandby.cc:436
#8 0x0000000000574c06 in Dispatcher::ms_dispatch2 (this=0x7ffebddb3270, m=...) at /build/ceph-14.1.0-101-gdddb858/src/msg/Dispatcher.h:126
#9 0x00007fbfd2570809 in DispatchQueue::entry() () from target:/usr/lib/ceph/libceph-common.so.0
#10 0x00007fbfd261f84d in DispatchQueue::DispatchThread::entry() () from target:/usr/lib/ceph/libceph-common.so.0
#11 0x00007fbfd165f6ba in start_thread (arg=0x7fbfc7357700) at pthread_create.c:333
i.e., also blocked acquiring the ClusterState lock (mutex 0x4268568)
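The annotations above describe a wait-for graph, in which each blocked thread points at the thread holding the lock it wants. The sketch below encodes that graph and finds a cycle with a depth-first search.

It assumes Thread 45 holds both the ClusterState lock and the read side of the Objecter rwlock. The with_osdmap frame and the __nr_readers = 1 count suggest this, but the report does not name the reader. The sketch also leaves out the timer thread blocked on the Mgr lock in MgrStandby::send_beacon, because the holder of that lock is not identified. The dictionaries and function names are illustrative only.

```python
# Which lock each blocked thread is waiting for (from the annotations above).
waits_for_lock = {
    45: "GIL",
    26: "Objecter rwlock",
    24: "ClusterState lock",
    22: "ClusterState lock",
    13: "ClusterState lock",
}

# Who holds each lock. The Objecter rwlock and ClusterState lock entries are
# an interpretation of Thread 45's frames, not stated outright in the report.
lock_holders = {
    "GIL": [26],
    "Objecter rwlock": [45],    # the single reader (__nr_readers = 1)
    "ClusterState lock": [45],
}


def wait_for_graph(waits, holders):
    """Map each waiting thread to the threads it is waiting on."""
    return {t: list(holders.get(lock, [])) for t, lock in waits.items()}


def find_cycle(graph):
    """Return one cycle as a list of thread ids, or None.

    Standard white/grey/black depth-first search: reaching a grey node
    (one still on the current DFS path) means a cycle has been found.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    path = []

    def dfs(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return path[path.index(nxt):]
            if state == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for n in graph:
        if color[n] == WHITE:
            found = dfs(n)
            if found:
                return found
    return None
```

`find_cycle(wait_for_graph(waits_for_lock, lock_holders))` returns `[45, 26]`, the two-thread cycle. Threads 24, 22 and 13 are not part of the cycle, but they wait on Thread 45 and are stuck behind it.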
History
#1 Updated by Sage Weil about 5 years ago
- Status changed from In Progress to Fix Under Review
#2 Updated by Kefu Chai about 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to luminous, mimic
#3 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38561: mimic: mgr deadlock added
#4 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38562: luminous: mgr deadlock added
#5 Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved