Bug #48852

closed

Mgr deadlock occurs in the process of cluster expansion and reduction

Added by jiaqi peng over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 38762
Crash signature (v1):
Crash signature (v2):

Description

Deadlock in the mgr between Thread 39 and Thread 30. Thread 39 held the lock "Objecter::rwlock" in "Objecter::handle_osd_map()" and waited on "ActivePyModules::lock" in "ActivePyModules::notify_all()". Thread 30 held "ActivePyModules::lock" in "ActivePyModules::get_osdmap()" and waited on "Objecter::rwlock" in "with_osdmap()". Because the two threads acquire the same pair of locks in opposite orders, neither can make progress.

Thread 39 (Thread 0x7fc8975cd700 (LWP 29877)):
0 0x00007fc89f2c354d in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007fc89f2bee9b in _L_lock_883 () from /lib64/libpthread.so.0
2 0x00007fc89f2bed68 in pthread_mutex_lock () from /lib64/libpthread.so.0
3 0x00007fc8a1a8a829 in Mutex::lock (this=this@entry=0x55f5157a5ce0, no_lockdep=no_lockdep@entry=false) at /usr/src/debug/ceph-14.2.10/src/common/Mutex.cc:78
4 0x000055f50ffffb7b in lock_guard (_m=..., this=<synthetic pointer>) at /usr/src/debug/ceph-14.2.10/src/mgr/ActivePyModules.cc:495
5 ActivePyModules::notify_all(std::string const&, std::string const&) () at /usr/src/debug/ceph-14.2.10/src/mgr/ActivePyModules.cc:495
6 0x000055f510028a64 in MonCommandCompletion::finish (this=0x55f53eab0780, r=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/mgr/BaseMgrModule.cc:110
7 0x000055f51000f13f in complete (r=<optimized out>, this=0x55f53eab0780) at /usr/src/debug/ceph-14.2.10/src/include/Context.h:77
8 operator() (__closure=<optimized out>, __closure=<optimized out>, wait_r=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/mgr/BaseMgrModule.cc:157
9 boost::detail::function::void_function_obj_invoker1<ceph_send_command(BaseMgrModule*, PyObject*)::<lambda(int)>::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=<optimized out>) at /usr/src/debug/ceph-14.2.10/build/boost/include/boost/function/function_template.hpp:158
10 0x000055f510009ffc in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-14.2.10/build/boost/include/boost/function/function_base.hpp:606
11 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/include/Context.h:487
12 0x000055f5100057e9 in Context::complete (this=0x55f53a9fd080, r=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/include/Context.h:77
13 0x000055f5100fa885 in Objecter::handle_osd_map(MOSDMap*) () at /usr/src/debug/ceph-14.2.10/src/osdc/Objecter.cc:1363
14 0x000055f5100fed7b in Objecter::ms_dispatch(Message*) () at /usr/src/debug/ceph-14.2.10/src/osdc/Objecter.cc:1003
15 0x000055f51007b296 in Dispatcher::ms_dispatch2 (this=0x7ffd1864caa8, m=...) at /usr/src/debug/ceph-14.2.10/src/msg/Dispatcher.h:126
16 0x00007fc8a1bf865a in ms_deliver_dispatch (m=..., this=0x55f5123b0900) at /usr/src/debug/ceph-14.2.10/src/common/RefCountedObj.h:64
17 DispatchQueue::entry() () at /usr/src/debug/ceph-14.2.10/src/msg/DispatchQueue.cc:197
18 0x00007fc8a1cae27d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/msg/DispatchQueue.h:102
19 0x00007fc89f2bcea5 in start_thread () from /lib64/libpthread.so.0
20 0x00007fc89e39a8dd in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7fc8820a3700 (LWP 1040173)):
0 0x00007fc89f2c0184 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
1 0x000055f51000a5a8 in lock_shared (this=<optimized out>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/shared_mutex:139
2 lock_shared (this=<optimized out>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/shared_mutex:335
3 boost::shared_lock<std::shared_mutex>::lock (this=this@entry=0x7fc88209fe20) at /usr/src/debug/ceph-14.2.10/build/boost/include/boost/thread/lock_types.hpp:645
4 0x000055f510004a3c in shared_lock (m_=..., this=0x7fc88209fe20) at /usr/src/debug/ceph-14.2.10/src/mgr/ActivePyModules.cc:916
5 with_osdmap<ActivePyModules::get_osdmap()::<lambda(const OSDMap&)> > (cb=<optimized out>, this=0x7ffd1864caa0) at /usr/src/debug/ceph-14.2.10/src/osdc/Objecter.h:2113
6 with_osdmap<ActivePyModules::get_osdmap()::<lambda(const OSDMap&)> > (this=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/mgr/ClusterState.h:129
7 ActivePyModules::get_osdmap() () at /usr/src/debug/ceph-14.2.10/src/mgr/ActivePyModules.cc:916
8 0x00007fc8a1490acc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
9 0x00007fc8a149070d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
10 0x00007fc8a149308d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
11 0x00007fc8a141c9c8 in function_call () from /lib64/libpython2.7.so.1.0
12 0x00007fc8a13f7ab3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
13 0x00007fc8a1406aa5 in instancemethod_call () from /lib64/libpython2.7.so.1.0
14 0x00007fc8a13f7ab3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
15 0x00007fc8a13f7b95 in call_function_tail () from /lib64/libpython2.7.so.1.0
16 0x00007fc8a13f7ecb in PyObject_CallMethod () from /lib64/libpython2.7.so.1.0
17 0x000055f50fff7c6f in ActivePyModule::notify(std::string const&, std::string const&) () at /usr/src/debug/ceph-14.2.10/src/mgr/ActivePyModule.cc:61
18 0x000055f510009ffc in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-14.2.10/build/boost/include/boost/function/function_base.hpp:606
19 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/include/Context.h:487
20 0x000055f5100057e9 in Context::complete (this=0x55f51e10a3f0, r=<optimized out>) at /usr/src/debug/ceph-14.2.10/src/include/Context.h:77
21 0x00007fc8a1a5758f in Finisher::finisher_thread_entry() () at /usr/src/debug/ceph-14.2.10/src/common/Finisher.cc:67
22 0x00007fc89f2bcea5 in start_thread () from /lib64/libpthread.so.0
23 0x00007fc89e39a8dd in clone () from /lib64/libc.so.6


Related issues: 2 (0 open, 2 closed)

Copied to Ceph - Backport #48897: nautilus: Mgr deadlock occurs in the process of cluster expansion and reduction (Resolved, Nathan Cutler)
Copied to Ceph - Backport #48898: octopus: Mgr deadlock occurs in the process of cluster expansion and reduction (Resolved, Nathan Cutler)

#1

Updated by Kefu Chai over 3 years ago

  • Status changed from New to Fix Under Review
  • Backport set to nautilus, octopus
  • Pull request ID set to 38762
#2

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#3

Updated by Backport Bot over 3 years ago

  • Copied to Backport #48897: nautilus: Mgr deadlock occurs in the process of cluster expansion and reduction added
#4

Updated by Backport Bot over 3 years ago

  • Copied to Backport #48898: octopus: Mgr deadlock occurs in the process of cluster expansion and reduction added
#5

Updated by Loïc Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
