Bug #39040

closed

mgr: deadlock

Added by xie xingguo about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 4 (Thread 0x7fa7b1350700 (LWP 2832129)):
#0 0x00007fa7d501a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00005583867503a0 in Wait (mutex=..., this=0x7fa7b134ddc8) at /usr/src/debug/ceph-12.2.9/src/common/Cond.h:48
#2 C_SaferCond::wait (this=0x7fa7b134dd68) at /usr/src/debug/ceph-12.2.9/src/common/Cond.h:194
#3 0x000055838674e39b in wait (this=0x7fa7b134dd60) at /usr/src/debug/ceph-12.2.9/src/mgr/MgrContext.h:39
#4 ActivePyModules::set_config (this=0x558390ba4400, module_name=..., key="active", val=...) at /usr/src/debug/ceph-12.2.9/src/mgr/ActivePyModules.cc:494
#5 0x000055838676f5bb in ceph_config_set (self=0x558391d800f0, args=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/mgr/BaseMgrModule.cc:396
#6 0x00007fa7d6e03bb0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007fa7d6e0357d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007fa7d6e05efd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#9 0x00007fa7d6d8f858 in function_call () from /lib64/libpython2.7.so.1.0
#10 0x00007fa7d6d6a9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#11 0x00007fa7d6d79995 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#12 0x00007fa7d6d6a9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007fa7d6d6aa85 in call_function_tail () from /lib64/libpython2.7.so.1.0
#14 0x00007fa7d6d6adbb in PyObject_CallMethod () from /lib64/libpython2.7.so.1.0
#15 0x000055838676683c in PyModuleRunner::serve (this=0x558390b81000) at /usr/src/debug/ceph-12.2.9/src/mgr/PyModuleRunner.cc:51
#16 0x00005583867670ce in PyModuleRunner::PyModuleRunnerThread::entry (this=0x558390b81038) at /usr/src/debug/ceph-12.2.9/src/mgr/PyModuleRunner.cc:112
#17 0x00007fa7d5016e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007fa7d430f36d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fa7b3354700 (LWP 2832125)):
#0 0x00007fa7d501ca0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007fa7d501ca9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007fa7d501cb3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x00007fa7d6e305f5 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#4 0x00007fa7d6dfc156 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#5 0x0000558386792a0d in Gil::Gil (this=0x7fa7b3352ad0, ts=..., new_thread=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/mgr/Gil.cc:37
#6 0x00005583867763b3 in ActivePyModule::notify_clog (this=0x558390b81000, log_entry=...) at /usr/src/debug/ceph-12.2.9/src/mgr/ActivePyModule.cc:108
#7 0x000055838673760a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-12.2.9/build/boost/include/boost/function/function_template.hpp:760
#8 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/include/Context.h:493
#9 0x0000558386732399 in Context::complete (this=0x558391bbec10, r=<optimized out>) at /usr/src/debug/ceph-12.2.9/src/include/Context.h:70
#10 0x00005583868d35d8 in Finisher::finisher_thread_entry (this=0x558391fe2140) at /usr/src/debug/ceph-12.2.9/src/common/Finisher.cc:72
#11 0x00007fa7d5016e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fa7d430f36d in clone () from /lib64/libc.so.6


Related issues 4 (0 open, 4 closed)

Has duplicate mgr - Bug #40156: deadlock on MonCommandCompletion (Duplicate)
Copied to mgr - Backport #39424: luminous: mgr: deadlock (Resolved, xie xingguo)
Copied to mgr - Backport #39425: nautilus: mgr: deadlock (Resolved, Prashant D)
Copied to mgr - Backport #39426: mimic: mgr: deadlock (Resolved, Prashant D)

Actions #1

Updated by xie xingguo about 5 years ago

Thread 4 is holding the GIL while it waits for mon_command to finish, and thread 8 (the mon_client's finisher thread) is blocked trying to acquire the GIL. Neither thread can make progress, so the mgr deadlocks.
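
For readers less familiar with this class of bug, here is a minimal Python sketch of the pattern. It is not Ceph code and it is not the fix from the pull request. The lock `gil`, the function `run_command`, and the `timeout` parameter are all hypothetical names; the lock only stands in for the interpreter lock, and the timeout exists only so the demonstration terminates instead of hanging. It also simplifies the real traces by having the completion callback itself need the lock.

```python
import threading

# Hypothetical stand-in for the interpreter lock (not the real GIL).
gil = threading.Lock()


def run_command(release_lock_while_waiting, timeout=0.5):
    """Return True if the command completes, False if the wait times out."""
    done = threading.Event()

    def finisher():
        # Like thread 8: the callback must take the lock before it can
        # signal the waiter.
        with gil:
            done.set()

    gil.acquire()  # like thread 4: module code runs with the lock held
    worker = threading.Thread(target=finisher)
    worker.start()
    try:
        if release_lock_while_waiting:
            # Drop the lock for the duration of the blocking wait, then
            # take it back before returning to the caller.
            gil.release()
            try:
                completed = done.wait(timeout)
            finally:
                gil.acquire()
        else:
            # Waiting with the lock held: the finisher cannot run, so the
            # wait can only end by timing out.
            completed = done.wait(timeout)
    finally:
        gil.release()
    worker.join()
    return completed
```

Calling `run_command(release_lock_while_waiting=False)` reproduces the stuck wait (it only returns because of the timeout), while `run_command(release_lock_while_waiting=True)` completes. Dropping the lock around a blocking wait is the usual remedy for this pattern in general; this ticket does not say whether the merged fix takes that approach.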

Actions #2

Updated by Patrick Donnelly about 5 years ago

  • Project changed from Ceph to mgr
  • Subject changed from luminous: mgr deadlock to mgr: deadlock
  • Start date deleted (03/30/2019)

This has probably been fixed in one of the recent merges. Don't know if those fixes are slated for backport though!

Actions #3

Updated by Patrick Donnelly about 5 years ago

  • Status changed from New to Fix Under Review
  • Target version set to v15.0.0
  • Pull request ID set to 27280
Actions #4

Updated by Kefu Chai about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Sage Weil about 5 years ago

  • Status changed from Pending Backport to In Progress

The original fix was buggy! Do not backport yet. See #39335

Actions #9

Updated by Sage Weil about 5 years ago

  • Related to Bug #39335: deadlock on command completion added
Actions #10

Updated by Sage Weil about 5 years ago

  • Related to deleted (Bug #39335: deadlock on command completion)
Actions #11

Updated by Sage Weil about 5 years ago

  • Status changed from In Progress to Pending Backport

Scratch that, this fix is good! I was just testing a version before this was merged and got confuuused

Actions #12

Updated by Nathan Cutler about 5 years ago

Actions #13

Updated by Nathan Cutler about 5 years ago

Actions #14

Updated by Nathan Cutler about 5 years ago

Actions #15

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved
Actions #16

Updated by Nathan Cutler over 4 years ago

  • Has duplicate Bug #40156: deadlock on MonCommandCompletion added