Bug #39040

mgr: deadlock

Added by xie xingguo 5 months ago. Updated 23 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Community (dev)
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Thread 4 (Thread 0x7fa7b1350700 (LWP 2832129)):
#0 0x00007fa7d501a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00005583867503a0 in Wait (mutex=..., this=0x7fa7b134ddc8) at /usr/src/debug/ceph-12.2.9/src/common/Cond.h:48
#2 C_SaferCond::wait (this=0x7fa7b134dd68) at /usr/src/debug/ceph-12.2.9/src/common/Cond.h:194
#3 0x000055838674e39b in wait (this=0x7fa7b134dd60) at /usr/src/debug/ceph-12.2.9/src/mgr/MgrContext.h:39
#4 ActivePyModules::set_config (this=0x558390ba4400, module_name=..., key="active", val=...) at /usr/src/debug/ceph-12.2.9/src/mgr/ActivePyModules.cc:494
#5 0x000055838676f5bb in ceph_config_set (self=0x558391d800f0, args=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/mgr/BaseMgrModule.cc:396
#6 0x00007fa7d6e03bb0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007fa7d6e0357d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007fa7d6e05efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#9 0x00007fa7d6d8f858 in function_call () from /lib64/libpython2.7.so.1.0
#10 0x00007fa7d6d6a9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#11 0x00007fa7d6d79995 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#12 0x00007fa7d6d6a9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007fa7d6d6aa85 in call_function_tail () from /lib64/libpython2.7.so.1.0
#14 0x00007fa7d6d6adbb in PyObject_CallMethod () from /lib64/libpython2.7.so.1.0
#15 0x000055838676683c in PyModuleRunner::serve (this=0x558390b81000) at /usr/src/debug/ceph-12.2.9/src/mgr/PyModuleRunner.cc:51
#16 0x00005583867670ce in PyModuleRunner::PyModuleRunnerThread::entry (this=0x558390b81038) at /usr/src/debug/ceph-12.2.9/src/mgr/PyModuleRunner.cc:112
#17 0x00007fa7d5016e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007fa7d430f36d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fa7b3354700 (LWP 2832125)):
#0 0x00007fa7d501ca0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007fa7d501ca9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007fa7d501cb3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x00007fa7d6e305f5 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#4 0x00007fa7d6dfc156 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#5 0x0000558386792a0d in Gil::Gil (this=0x7fa7b3352ad0, ts=..., new_thread=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/mgr/Gil.cc:37
#6 0x00005583867763b3 in ActivePyModule::notify_clog (this=0x558390b81000, log_entry=...) at /usr/src/debug/ceph-12.2.9/src/mgr/ActivePyModule.cc:108
#7 0x000055838673760a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-12.2.9/build/boost/include/boost/function/function_template.hpp:760
#8 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/include/Context.h:493
#9 0x0000558386732399 in Context::complete (this=0x558391bbec10, r=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/include/Context.h:70
#10 0x00005583868d35d8 in Finisher::finisher_thread_entry (this=0x558391fe2140) at /usr/src/debug/ceph-12.2.9/src/common/Finisher.cc:72
#11 0x00007fa7d5016e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fa7d430f36d in clone () from /lib64/libc.so.6


Related issues

Copied to mgr - Backport #39424: luminous: mgr: deadlock Resolved
Copied to mgr - Backport #39425: nautilus: mgr: deadlock Resolved
Copied to mgr - Backport #39426: mimic: mgr: deadlock Resolved

History

#1 Updated by xie xingguo 5 months ago

Thread 4 is holding the GIL and waiting for mon_command to finish, while thread 8 (mon_client's finisher thread) is blocked trying to acquire the GIL. Neither thread can make progress, so the mgr deadlocks.
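
For illustration, the lock ordering described above can be reproduced in a small standalone model. This is only a sketch, not Ceph code: a plain `threading.Lock` stands in for the GIL, a queue plus one worker thread stands in for the finisher, and the names `simulate`, `notify_clog`, and `release_gil` are hypothetical. A timeout is used so the "deadlocked" case returns instead of hanging. Releasing the lock before blocking is shown as one common way out of this pattern; it is not claimed to be what the eventual fix does.

```python
import queue
import threading


def simulate(release_gil, timeout=0.5):
    """Model the two threads from the backtraces.

    Returns True if the command completion fired within `timeout`,
    False if the waiter timed out (the deadlock case).
    """
    gil = threading.Lock()        # stand-in for the Python GIL (assumption)
    finisher_q = queue.Queue()    # work for the single finisher thread
    done = threading.Event()      # stand-in for the C_SaferCond completion

    def notify_clog():
        # Like thread 8: the callback must take the GIL before it can run.
        with gil:
            pass

    def finisher():
        # Runs queued callbacks strictly in order, one at a time.
        while True:
            fn = finisher_q.get()
            if fn is None:
                return
            fn()

    worker = threading.Thread(target=finisher, daemon=True)
    worker.start()

    # Like thread 4: module code runs with the GIL held.
    gil.acquire()
    try:
        finisher_q.put(notify_clog)   # queued ahead of our completion
        finisher_q.put(done.set)      # completion of the blocking command
        if release_gil:
            gil.release()             # drop the lock before blocking
            try:
                ok = done.wait(timeout)
            finally:
                gil.acquire()         # take it back before returning
        else:
            ok = done.wait(timeout)   # blocks while still holding the lock
    finally:
        gil.release()

    finisher_q.put(None)              # let the worker drain and exit
    worker.join(timeout=2)
    return ok
```

With `release_gil=False` the finisher is stuck in `notify_clog` waiting for the lock, so the completion queued behind it never runs and the wait times out. With `release_gil=True` the finisher can take the lock, drain its queue, and signal the completion.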

#2 Updated by Patrick Donnelly 5 months ago

  • Project changed from Ceph to mgr
  • Subject changed from luminous: mgr deadlock to mgr: deadlock
  • Start date deleted (03/30/2019)

This has probably been fixed in one of the recent merges. Don't know if those fixes are slated for backport though!

#3 Updated by Patrick Donnelly 5 months ago

  • Status changed from New to Need Review
  • Target version set to v15.0.0
  • Pull request ID set to 27280

#4 Updated by Kefu Chai 5 months ago

  • Status changed from Need Review to Pending Backport

#8 Updated by Sage Weil 4 months ago

  • Status changed from Pending Backport to In Progress

The original fix was buggy! Do not backport yet. See #39335

#9 Updated by Sage Weil 4 months ago

  • Related to Bug #39335: deadlock on command completion added

#10 Updated by Sage Weil 4 months ago

  • Related to deleted (Bug #39335: deadlock on command completion)

#11 Updated by Sage Weil 4 months ago

  • Status changed from In Progress to Pending Backport

Scratch that, this fix is good! I was just testing a version before this was merged and got confuuused

#12 Updated by Nathan Cutler 4 months ago

#13 Updated by Nathan Cutler 4 months ago

#14 Updated by Nathan Cutler 4 months ago

#15 Updated by Nathan Cutler 23 days ago

  • Status changed from Pending Backport to Resolved
