Bug #40156

deadlock on MonCommandCompletion

Added by Sage Weil almost 5 years ago. Updated over 4 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 16 (Thread 0x7f70a0049700 (LWP 982531)):
#0  0x00007f70abb5c827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x274f900) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x274f900, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f70abb5c8d4 in __new_sem_wait_slow (sem=0x274f900, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f70abb5c97a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007f70ac0affe8 in PyThread_acquire_lock () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5  0x00007f70ac085586 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6  0x00007f70ac1c305c in PyEval_EvalCodeEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7  0x00007f70ac119370 in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#8  0x00007f70ac0ec273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#9  0x00007f70ac1603ac in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#10 0x00007f70ac0ec273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#11 0x00007f70ac0ec6df in PyObject_CallFunctionObjArgs () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007f70ac088f5d in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007f70ac08c044 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00007f70ac1c305c in PyEval_EvalCodeEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#15 0x00007f70ac119370 in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#16 0x00007f70ac0ec273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#17 0x00007f70ac1603ac in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#18 0x00007f70ac0ec273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#19 0x00007f70ac1c2487 in PyEval_CallObjectWithKeywords () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#20 0x0000000000530559 in MonCommandCompletion::finish (this=0x82aaa50, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/BaseMgrModule.cc:100
#21 0x000000000051878c in Context::complete (r=<optimized out>, this=0x82aaa50) at /build/ceph-14.2.1-198-g869a6a3/src/include/Context.h:77
#22 <lambda(int)>::<lambda(int)>::operator() (__closure=<optimized out>, __closure=<optimized out>, wait_r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/BaseMgrModule.cc:157
#23 boost::detail::function::void_function_obj_invoker1<ceph_send_command(BaseMgrModule*, PyObject*)::<lambda(int)>::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=<optimized out>)
    at /build/ceph-14.2.1-198-g869a6a3/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:159
#24 0x0000000000514639 in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:768
#25 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/include/Context.h:487
#26 0x0000000000511419 in Context::complete (this=0x1ae61380, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/include/Context.h:77
#27 0x00000000005c72ce in Objecter::get_latest_version (this=<optimized out>, oldest=<optimized out>, newest=1104054, fin=0x1ae61380) at /build/ceph-14.2.1-198-g869a6a3/src/osdc/Objecter.cc:1953
#28 0x00000000005f9a1d in C_Objecter_GetVersion::finish (this=<optimized out>, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/osdc/Objecter.cc:1929
#29 0x0000000000511419 in Context::complete (this=0x1694d510, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/include/Context.h:77
#30 0x00007f70ac89375e in Finisher::finisher_thread_entry() () from target:/usr/lib/ceph/libceph-common.so.0
#31 0x00007f70abb546ba in start_thread (arg=0x7f70a0049700) at pthread_create.c:333
#32 0x00007f70ab37d41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Possibly other threads of note:

Thread 21 (Thread 0x7f7097327700 (LWP 982544)):
#0  0x00007f70abb5c827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x274f900) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x274f900, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f70abb5c8d4 in __new_sem_wait_slow (sem=0x274f900, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f70abb5c97a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007f70ac0affe8 in PyThread_acquire_lock () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5  0x00007f70ac084926 in PyEval_RestoreThread () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6  0x000000000057fdd5 in Gil::Gil (this=0x7f7097324700, ts=..., new_thread=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/Gil.cc:37
#7  0x0000000000503698 in ActivePyModule::notify_clog (this=0x4b8c780, log_entry=...) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/ActivePyModule.cc:79
#8  0x0000000000514639 in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:768
#9  FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/include/Context.h:487
#10 0x0000000000511419 in Context::complete (this=0x1d0a0cd0, r=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/include/Context.h:77
#11 0x00007f70ac89375e in Finisher::finisher_thread_entry() () from target:/usr/lib/ceph/libceph-common.so.0
#12 0x00007f70abb546ba in start_thread (arg=0x7f7097327700) at pthread_create.c:333
#13 0x00007f70ab37d41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 27 (Thread 0x7f7093b20700 (LWP 982551)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000050d798 in Cond::Wait (mutex=..., this=0x7f7093b1c890) at /build/ceph-14.2.1-198-g869a6a3/src/common/Cond.h:49
#2  C_SaferCond::wait (this=0x7f7093b1c828) at /build/ceph-14.2.1-198-g869a6a3/src/common/Cond.h:196
#3  Command::wait (this=0x7f7093b1c820) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/MgrContext.h:39
#4  ActivePyModules::set_store (this=this@entry=0x4c38280, module_name=..., key=..., val=...) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/ActivePyModules.cc:626
#5  0x000000000051963a in ceph_store_set (self=<optimized out>, args=<optimized out>) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/BaseMgrModule.cc:484
#6  0x00007f70ac08d971 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7  0x00007f70ac08c044 in PyEval_EvalFrameEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#8  0x00007f70ac1c305c in PyEval_EvalCodeEx () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#9  0x00007f70ac119370 in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#10 0x00007f70ac0ec273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#11 0x00007f70ac1603ac in ?? () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007f70ac0ec273 in PyObject_Call () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007f70ac0ed444 in PyObject_CallMethod () from target:/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00000000005b1fab in PyModuleRunner::serve (this=0x4b8cf00) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/PyModuleRunner.cc:47
#15 0x00000000005b2605 in PyModuleRunner::PyModuleRunnerThread::entry (this=0x4b8cf48) at /build/ceph-14.2.1-198-g869a6a3/src/mgr/PyModuleRunner.cc:106
#16 0x00007f70abb546ba in start_thread (arg=0x7f7093b20700) at pthread_create.c:333
#17 0x00007f70ab37d41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

This was on the lab cluster.

Some mgr commands were fine (e.g., 'ceph pg ls'), but the module commands (e.g., 'ceph crash ls') would hang.
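
One possible reading of the traces above, sketched in plain Python: threads 16 and 21 are both blocked in PyThread_acquire_lock on the same futex_word=0x274f900, while thread 27 sits in Command::wait under ceph_store_set. The sketch assumes (this is not confirmed by the traces) that thread 27 still holds that Python lock while it waits, and that its completion can only be delivered by a finisher thread that needs the same lock. The names gil, done, finisher, serve, and release_while_waiting are hypothetical stand-ins, and timeouts replace the real indefinite waits so the example terminates.

```python
import threading


def run(release_while_waiting):
    """Model a waiter that holds a lock while waiting on a completion.

    All names here are illustrative stand-ins, not Ceph or CPython APIs.
    """
    gil = threading.Lock()    # stand-in for the lock threads 16 and 21 wait on
    done = threading.Event()  # stand-in for the condition waited on in Command::wait
    results = {}

    def finisher():
        # Like a finisher thread: it must take the lock before it can
        # run the completion callback that wakes the waiter.
        got = gil.acquire(timeout=0.2)
        results["finisher_got_lock"] = got
        if got:
            try:
                done.set()
            finally:
                gil.release()

    def serve():
        # Like the module's serve thread calling into a blocking store call.
        gil.acquire()
        t = threading.Thread(target=finisher)
        t.start()
        if release_while_waiting:
            # Drop the lock for the duration of the blocking wait.
            gil.release()
            completed = done.wait(timeout=1.0)
            gil.acquire()
        else:
            # Wait while still holding the lock: the finisher cannot progress.
            completed = done.wait(timeout=0.5)
        results["completed"] = completed
        gil.release()
        t.join()

    serve()
    return results


if __name__ == "__main__":
    print(run(release_while_waiting=False))
    print(run(release_while_waiting=True))
```

With release_while_waiting=False neither side makes progress until the timeouts fire, which mirrors the hang seen with module commands. Dropping the lock around the wait is shown only as one way to break the cycle in this model; it is not necessarily what the fix tracked in #39040 does.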


Related issues

Duplicates mgr - Bug #39040: mgr: deadlock Resolved

History

#2 Updated by Kefu Chai over 4 years ago

  • Assignee set to Kefu Chai

#3 Updated by Kefu Chai over 4 years ago

  • Status changed from 12 to Fix Under Review
  • Pull request ID set to 30468

#5 Updated by Kefu Chai over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee deleted (Kefu Chai)
  • Backport set to luminous,nautilus,mimic
  • Pull request ID changed from 30468 to 27280

#9 Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Duplicate

Duplicate of #39040

#10 Updated by Nathan Cutler over 4 years ago

#11 Updated by Nathan Cutler over 4 years ago

  • Backport deleted (luminous,nautilus,mimic)

Already backported via #39040
