Bug #35985

closed

deadlock in standby ceph-mgr daemons

Added by John Spray over 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: High
Category: ceph-mgr
Target version: -
% Done: 0%
Backport: mimic, luminous
Regression: No
Severity: 3 - minor

Description

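Reported on the ceph-users mailing list in the thread "Standby mgr stopped sending beacons after upgrade to 12.2.8". Backtraces of the two relevant threads in the hung standby ceph-mgr: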

Thread 11 (Thread 0x7fc30888d700 (LWP 224053)):
#0  0x00007fc30f2e0afb in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x00007fc30f2e0b8f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007fc30f2e0c2b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00007fc311275735 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#4  0x00007fc311241296 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#5  0x00007fc31127942e in lock_PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#6  0x00007fc311248cf0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7  0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8  0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9  0x00007fc3112486bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fc3112486bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#13 0x00007fc3112486bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#14 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#15 0x00007fc3111d4978 in function_call () from /lib64/libpython2.7.so.1.0
#16 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#17 0x00007fc3111bea55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#18 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#19 0x00007fc311206a87 in slot_tp_init () from /lib64/libpython2.7.so.1.0
#20 0x00007fc31120579f in type_call () from /lib64/libpython2.7.so.1.0
#21 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#22 0x00007fc3112418f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#23 0x000056446a400480 in StandbyPyModule::load (this=0x564475c03420) at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.cc:124
#24 0x000056446a40160f in StandbyPyModules::start_one (this=0x564476345340, module_name="prometheus", pClass=<optimized out>, 
    pMyThreadState=...) at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.cc:96
#25 0x000056446a405265 in PyModuleRegistry::standby_start (this=this@entry=0x7ffd3bf1e680, monc=monc@entry=0x7ffd3bf1c8c8)
    at /usr/src/debug/ceph-12.2.8/src/mgr/PyModuleRegistry.cc:321
#26 0x000056446a41a246 in MgrStandby::handle_mgr_map (this=this@entry=0x7ffd3bf1c8b0, mmap=mmap@entry=0x5644755942c0)
    at /usr/src/debug/ceph-12.2.8/src/mgr/MgrStandby.cc:361
#27 0x000056446a41ab04 in MgrStandby::ms_dispatch (this=0x7ffd3bf1c8b0, m=0x5644755942c0)
    at /usr/src/debug/ceph-12.2.8/src/mgr/MgrStandby.cc:376
#28 0x000056446a815cb2 in ms_deliver_dispatch (m=0x5644755942c0, this=0x564475332700) at /usr/src/debug/ceph-12.2.8/src/msg/Messenger.h:668
#29 DispatchQueue::entry (this=0x564475332858) at /usr/src/debug/ceph-12.2.8/src/msg/DispatchQueue.cc:197
#30 0x000056446a5ffbed in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-12.2.8/src/msg/DispatchQueue.h:101
#31 0x00007fc30f2dae25 in start_thread () from /lib64/libpthread.so.0
#32 0x00007fc30e3babad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fc2eb7dc700 (LWP 224066)):
#0  0x00007fc30f2de995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000056446a402e1e in Cond::Wait (this=0x564476345558, mutex=...) at /usr/src/debug/ceph-12.2.8/src/common/Cond.h:48
#2  0x000056446a400ab5 in with_config<StandbyPyModule::get_config(const string&, std::string*) const::__lambda8> (
    cb=<unknown type in /usr/lib/debug/usr/bin/ceph-mgr.debug, CU 0xc5a7bd, DIE 0xe5b056>, this=0x5644763453d0)
    at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.h:76
#3  StandbyPyModule::get_config (this=0x564475c036c0, key="ceph06/server_addr", value=value@entry=0x7fc2eb7d9ef0)
    at /usr/src/debug/ceph-12.2.8/src/mgr/StandbyPyModules.cc:186
#4  0x000056446a413a91 in ceph_config_get (self=0x7fc2ec096bd8, args=<optimized out>)
    at /usr/src/debug/ceph-12.2.8/src/mgr/BaseMgrStandbyModule.cc:73
#5  0x00007fc311248cf0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6  0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#7  0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8  0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#9  0x00007fc31124853c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fc31124b03d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fc3111d4978 in function_call () from /lib64/libpython2.7.so.1.0
#12 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007fc3111bea55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#14 0x00007fc3111afa63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x00007fc3111afb45 in call_function_tail () from /lib64/libpython2.7.so.1.0
#16 0x00007fc3111afe7b in PyObject_CallMethod () from /lib64/libpython2.7.so.1.0
#17 0x000056446a407ffc in PyModuleRunner::serve (this=0x564475c036c0) at /usr/src/debug/ceph-12.2.8/src/mgr/PyModuleRunner.cc:51
#18 0x000056446a40867f in PyModuleRunner::PyModuleRunnerThread::entry (this=0x564475c036f8)
    at /usr/src/debug/ceph-12.2.8/src/mgr/PyModuleRunner.cc:112
#19 0x00007fc30f2dae25 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fc30e3babad in clone () from /lib64/libc.so.6
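One plausible reading of these two traces, offered as an interpretation rather than something the report states: Thread 2 called from Python into ceph_config_get and blocked in Cond::Wait inside with_config while still holding the Python GIL, because no GIL release appears between those frames. Thread 11 is blocked in PyEval_RestoreThread trying to reacquire that same GIL. If the event that would signal Thread 2's condition can only happen once a thread that needs the GIL makes progress, neither side can move. The sketch below models that pattern with standard-library primitives. `StandbyModel`, `gil`, `publish_config`, `wait_for_config` and `run` are hypothetical names, not ceph-mgr code. Dropping the GIL before blocking is shown as one general remedy, not necessarily the change that was merged for this bug.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model of the suspected lock inversion; not ceph-mgr code.
//   gil            stands in for the Python GIL
//   state_lock     stands in for the standby state's Mutex
//   config_loaded  stands in for the Cond that with_config waits on
struct StandbyModel {
  std::timed_mutex gil;
  std::mutex state_lock;
  std::condition_variable config_loaded;
  bool is_config_loaded = false;

  // Role of the thread that would deliver the config: it needs the GIL
  // first. Gives up after `budget` so the sketch cannot hang forever.
  bool publish_config(std::chrono::milliseconds budget) {
    std::unique_lock<std::timed_mutex> g(gil, std::defer_lock);
    if (!g.try_lock_for(budget)) {
      return false;  // never obtained the GIL
    }
    {
      std::lock_guard<std::mutex> l(state_lock);
      is_config_loaded = true;
    }
    config_loaded.notify_all();
    return true;  // GIL is released when `g` goes out of scope
  }

  // Role of Thread 2: entered from Python, so the GIL is held on entry.
  // With release_gil == false it blocks while still holding the GIL.
  bool wait_for_config(std::unique_lock<std::timed_mutex>& held_gil,
                       bool release_gil, std::chrono::milliseconds budget) {
    assert(held_gil.owns_lock());
    if (release_gil) {
      held_gil.unlock();  // analogous to Py_BEGIN_ALLOW_THREADS
    }
    bool ok;
    {
      std::unique_lock<std::mutex> l(state_lock);
      // The predicate covers both spurious wakeups and a notify that
      // happened before this thread started waiting.
      ok = config_loaded.wait_for(l, budget, [this] { return is_config_loaded; });
    }
    if (release_gil) {
      held_gil.lock();  // analogous to Py_END_ALLOW_THREADS
    }
    return ok;
  }
};

// Returns {waiter saw the config, publisher obtained the GIL}.
std::pair<bool, bool> run(bool release_gil) {
  using std::chrono::milliseconds;
  StandbyModel m;
  std::unique_lock<std::timed_mutex> gil_held(m.gil);  // waiter holds the GIL
  bool published = false;
  std::thread publisher([&] { published = m.publish_config(milliseconds(200)); });
  bool seen = m.wait_for_config(gil_held, release_gil, milliseconds(600));
  publisher.join();
  return {seen, published};
}
```

Calling run(false) returns {false, false}: each side times out waiting on something the other holds, which is the hang in miniature. run(true) returns {true, true} because the waiter drops the GIL before blocking and takes it back only after the config has been published. The timeouts exist only so the sketch terminates; the threads in the traces above block with no timeout. In real CPython extension code the release and reacquire pair is normally written with Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.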

Related issues 5 (1 open, 4 closed)

Related to mgr - Cleanup #38467: Audit other functions in src/mgr/ActivePyModules.cc for thread safety in light of deadlock seen in #35985 (New, 02/25/2019)

Related to mgr - Bug #39335: deadlock on command completion (Resolved, 04/16/2019)

Has duplicate mgr - Bug #42086: luminous : standby mgr down after disable/enabling modules repeatly (Duplicate, Kefu Chai, 09/28/2019)

Copied to mgr - Backport #38459: mimic: deadlock in standby ceph-mgr daemons (Resolved, Brad Hubbard)
Copied to mgr - Backport #38460: luminous: deadlock in standby ceph-mgr daemons (Resolved, Brad Hubbard)
