Project

General

Profile

Bug #37997

Bug #38094: mgr: crash list

mgr: abort, dashboard, "recursive lock of PyModuleRegistry::lock"

Added by Ernesto Puerta about 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
ceph-mgr
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Build from master (a969c9d4061713a8614b64e27fde560e1cfb2d08), containerized RHEL7 default vstart cluster. It has happened several times during restart or disabling/enabling ceph-mgr modules (dashboard). Looks like a race condition issue, a deadlock between several threads accessing config options.

Potentially troublesome code seems to point to this recently merged code dealing with module options

I'll be watchful to see if I manage to get it reproducible.

Trace dump (mgr.x.log):

 ceph version Development (no_version) nautilus (dev)
 1: (Mutex::_will_lock()+0x4c) [0x7fa7d5971ef2]
 2: (Mutex::lock(bool)+0x51) [0x7fa7d5971cfb]
 3: (std::lock_guard<Mutex>::lock_guard(Mutex&)+0x2f) [0x55b9a26f552f]
 4: (PyModuleRegistry::module_exists(std::string const&) const+0x33) [0x55b9a2706d75]
 5: (ActivePyModules::get_typed_config(std::string const&, std::string const&) const+0x50) [0x55b9a26fc3b2]
 6: (()+0x68bb59) [0x55b9a271cb59]
 7: (()+0x68be3c) [0x55b9a271ce3c]
 8: (PyEval_EvalFrameEx()+0x6df0) [0x7fa7d4582cf0]
 9: (PyEval_EvalFrameEx()+0x67bd) [0x7fa7d45826bd]
 10: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 11: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 12: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 13: (()+0x70978) [0x7fa7d450e978]
 14: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 15: (PyEval_EvalFrameEx()+0x17fd) [0x7fa7d457d6fd]
 16: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 17: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 18: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 19: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 20: (PyEval_EvalFrameEx()+0x67bd) [0x7fa7d45826bd]
 21: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 22: (()+0x70978) [0x7fa7d450e978]
 23: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 24: (()+0x5aa55) [0x7fa7d44f8a55]
 25: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 26: (()+0xa2e27) [0x7fa7d4540e27]
 27: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 28: (PyEval_EvalFrameEx()+0x2336) [0x7fa7d457e236]
 29: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 30: (()+0x70978) [0x7fa7d450e978]
 31: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 32: (()+0x5aa55) [0x7fa7d44f8a55]
 33: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 34: (()+0xa2a87) [0x7fa7d4540a87]
 35: (()+0xa179f) [0x7fa7d453f79f]
 36: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 37: (PyEval_CallObjectWithKeywords()+0x47) [0x7fa7d457b8f7]
 38: (ActivePyModule::load(ActivePyModules*)+0x127) [0x55b9a26f2c13]
 39: (ActivePyModules::start_one(std::shared_ptr<PyModule>)+0x155) [0x55b9a26faaff]
 40: (PyModuleRegistry::active_start(DaemonStateIndex&, ClusterState&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, MonClient&, std::shared_ptr<LogChannel>, std::shared_ptr<LogChannel>, Objecter&, Client&, Finisher&, DaemonServer&)+0x4ee) [0x55b9a282deb0]
 41: (Mgr::init()+0xcc4) [0x55b9a27dd8c6]
 42: (()+0x74a9d6) [0x55b9a27db9d6]
 43: (()+0x751b54) [0x55b9a27e2b54]
 44: (boost::function1<void, int>::operator()(int) const+0x6c) [0x55b9a270789c]
 45: (FunctionContext::finish(int)+0x24) [0x55b9a2704624]
 46: (Context::complete(int)+0x27) [0x55b9a27044eb]
 47: (Finisher::finisher_thread_entry()+0x38b) [0x7fa7d5918cf9]
 48: (Finisher::FinisherThread::entry()+0x1c) [0x55b9a27e39a4]
 49: (Thread::entry_wrapper()+0x78) [0x7fa7d59859dc]
 50: (Thread::_entry_func(void*)+0x18) [0x7fa7d598595a]
 51: (()+0x7dd5) [0x7fa7d216ddd5]
 52: (clone()+0x6d) [0x7fa7d0e1dead]

    -1> 2019-01-21 20:01:02.051 7fa7b697c700 -1 /ceph/src/common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int, bool, bool)' thread 7fa7b697c700 time 2019-01-21 20:01:02.041972
/ceph/src/common/lockdep.cc: 305: abort()

 ceph version Development (no_version) nautilus (dev)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xfe) [0x7fa7d59e54c8]
 2: (lockdep_will_lock(char const*, int, bool, bool)+0x423) [0x7fa7d5af9d61]
 3: (Mutex::_will_lock()+0x4c) [0x7fa7d5971ef2]
 4: (Mutex::lock(bool)+0x51) [0x7fa7d5971cfb]
 5: (std::lock_guard<Mutex>::lock_guard(Mutex&)+0x2f) [0x55b9a26f552f]
 6: (PyModuleRegistry::module_exists(std::string const&) const+0x33) [0x55b9a2706d75]
 7: (ActivePyModules::get_typed_config(std::string const&, std::string const&) const+0x50) [0x55b9a26fc3b2]
 8: (()+0x68bb59) [0x55b9a271cb59]
 9: (()+0x68be3c) [0x55b9a271ce3c]
 10: (PyEval_EvalFrameEx()+0x6df0) [0x7fa7d4582cf0]
 11: (PyEval_EvalFrameEx()+0x67bd) [0x7fa7d45826bd]
 12: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 13: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 14: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 15: (()+0x70978) [0x7fa7d450e978]
 16: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 17: (PyEval_EvalFrameEx()+0x17fd) [0x7fa7d457d6fd]
 18: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 19: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 20: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 21: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 22: (PyEval_EvalFrameEx()+0x67bd) [0x7fa7d45826bd]
 23: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 24: (()+0x70978) [0x7fa7d450e978]
 25: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 26: (()+0x5aa55) [0x7fa7d44f8a55]
 27: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 28: (()+0xa2e27) [0x7fa7d4540e27]
 29: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 30: (PyEval_EvalFrameEx()+0x2336) [0x7fa7d457e236]
 31: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 32: (()+0x70978) [0x7fa7d450e978]
 33: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 34: (()+0x5aa55) [0x7fa7d44f8a55]
 35: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 36: (()+0xa2a87) [0x7fa7d4540a87]
 37: (()+0xa179f) [0x7fa7d453f79f]
 38: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 39: (PyEval_CallObjectWithKeywords()+0x47) [0x7fa7d457b8f7]
 40: (ActivePyModule::load(ActivePyModules*)+0x127) [0x55b9a26f2c13]
 41: (ActivePyModules::start_one(std::shared_ptr<PyModule>)+0x155) [0x55b9a26faaff]
 42: (PyModuleRegistry::active_start(DaemonStateIndex&, ClusterState&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, MonClient&, std::shared_ptr<LogChannel>, std::shared_ptr<LogChannel>, Objecter&, Client&, Finisher&, DaemonServer&)+0x4ee) [0x55b9a282deb0]
 43: (Mgr::init()+0xcc4) [0x55b9a27dd8c6]
 44: (()+0x74a9d6) [0x55b9a27db9d6]
 45: (()+0x751b54) [0x55b9a27e2b54]
 46: (boost::function1<void, int>::operator()(int) const+0x6c) [0x55b9a270789c]
 47: (FunctionContext::finish(int)+0x24) [0x55b9a2704624]
 48: (Context::complete(int)+0x27) [0x55b9a27044eb]
 49: (Finisher::finisher_thread_entry()+0x38b) [0x7fa7d5918cf9]
 50: (Finisher::FinisherThread::entry()+0x1c) [0x55b9a27e39a4]
 51: (Thread::entry_wrapper()+0x78) [0x7fa7d59859dc]
 52: (Thread::_entry_func(void*)+0x18) [0x7fa7d598595a]
 53: (()+0x7dd5) [0x7fa7d216ddd5]
 54: (clone()+0x6d) [0x7fa7d0e1dead]

     0> 2019-01-21 20:01:02.063 7fa7b697c700 -1 *** Caught signal (Aborted) **
 in thread 7fa7b697c700 thread_name:mgr-fin

 ceph version Development (no_version) nautilus (dev)
 1: (()+0x93c7a0) [0x55b9a29cd7a0]
 2: (()+0xf5d0) [0x7fa7d21755d0]
 3: (gsignal()+0x37) [0x7fa7d0d56207]
 4: (abort()+0x148) [0x7fa7d0d578f8]
 5: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x377) [0x7fa7d59e5741]
 6: (lockdep_will_lock(char const*, int, bool, bool)+0x423) [0x7fa7d5af9d61]
 7: (Mutex::_will_lock()+0x4c) [0x7fa7d5971ef2]
 8: (Mutex::lock(bool)+0x51) [0x7fa7d5971cfb]
 9: (std::lock_guard<Mutex>::lock_guard(Mutex&)+0x2f) [0x55b9a26f552f]
 10: (PyModuleRegistry::module_exists(std::string const&) const+0x33) [0x55b9a2706d75]
 11: (ActivePyModules::get_typed_config(std::string const&, std::string const&) const+0x50) [0x55b9a26fc3b2]
 12: (()+0x68bb59) [0x55b9a271cb59]
 13: (()+0x68be3c) [0x55b9a271ce3c]
 14: (PyEval_EvalFrameEx()+0x6df0) [0x7fa7d4582cf0]
 15: (PyEval_EvalFrameEx()+0x67bd) [0x7fa7d45826bd]
 16: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 17: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 18: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 19: (()+0x70978) [0x7fa7d450e978]
 20: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 21: (PyEval_EvalFrameEx()+0x17fd) [0x7fa7d457d6fd]
 22: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 23: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 24: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 25: (PyEval_EvalFrameEx()+0x663c) [0x7fa7d458253c]
 26: (PyEval_EvalFrameEx()+0x67bd) [0x7fa7d45826bd]
 27: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 28: (()+0x70978) [0x7fa7d450e978]
 29: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 30: (()+0x5aa55) [0x7fa7d44f8a55]
 31: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 32: (()+0xa2e27) [0x7fa7d4540e27]
 33: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 34: (PyEval_EvalFrameEx()+0x2336) [0x7fa7d457e236]
 35: (PyEval_EvalCodeEx()+0x7ed) [0x7fa7d458503d]
 36: (()+0x70978) [0x7fa7d450e978]
 37: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 38: (()+0x5aa55) [0x7fa7d44f8a55]
 39: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 40: (()+0xa2a87) [0x7fa7d4540a87]
 41: (()+0xa179f) [0x7fa7d453f79f]
 42: (PyObject_Call()+0x43) [0x7fa7d44e9a63]
 43: (PyEval_CallObjectWithKeywords()+0x47) [0x7fa7d457b8f7]
 44: (ActivePyModule::load(ActivePyModules*)+0x127) [0x55b9a26f2c13]
 45: (ActivePyModules::start_one(std::shared_ptr<PyModule>)+0x155) [0x55b9a26faaff]
 46: (PyModuleRegistry::active_start(DaemonStateIndex&, ClusterState&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, MonClient&, std::shared_ptr<LogChannel>, std::shared_ptr<LogChannel>, Objecter&, Client&, Finisher&, DaemonServer&)+0x4ee) [0x55b9a282deb0]
 47: (Mgr::init()+0xcc4) [0x55b9a27dd8c6]
 48: (()+0x74a9d6) [0x55b9a27db9d6]
 49: (()+0x751b54) [0x55b9a27e2b54]
 50: (boost::function1<void, int>::operator()(int) const+0x6c) [0x55b9a270789c]
 51: (FunctionContext::finish(int)+0x24) [0x55b9a2704624]
 52: (Context::complete(int)+0x27) [0x55b9a27044eb]
 53: (Finisher::finisher_thread_entry()+0x38b) [0x7fa7d5918cf9]
 54: (Finisher::FinisherThread::entry()+0x1c) [0x55b9a27e39a4]
 55: (Thread::entry_wrapper()+0x78) [0x7fa7d59859dc]
 56: (Thread::_entry_func(void*)+0x18) [0x7fa7d598595a]
 57: (()+0x7dd5) [0x7fa7d216ddd5]
 58: (clone()+0x6d) [0x7fa7d0e1dead]

History

#1 Updated by Ricardo Marques about 5 years ago

  • Priority changed from High to Urgent

#2 Updated by Ernesto Puerta about 5 years ago

  • Affected Versions v15.0.0 added
  • Affected Versions deleted (v14.0.0)

#3 Updated by Ernesto Puerta about 5 years ago

  • Description updated (diff)

#4 Updated by Volker Theile about 5 years ago

Found out the problem. There is indeed a dead-lock situation. PyModuleRegistry::active_start() get's the lock first and finally PyModuleRegistry::module_exists() want's to get it.

If you have a look on PyModuleRegistry::module_exists() the lock makes sense, because the list of modules shouldn't be modified during the check. Keeping this in mind it looks to me that PyModuleRegistry::active_start() is holding the lock too long or it is needless in this case because the member variable PyModuleRegistry::modules is not used there.

This is all based on the assumption that the PyModuleRegistry::lock member variable is used to protect the PyModuleRegistry::modules member. Maybe i'm wrong here. If you look into the PyModuleRegistry header file there are many functions that work with the modules variable and getting the lock beforehand, so i assume i'm not totally wrong.

#5 Updated by Patrick Seidensal about 5 years ago

-138> 2019-01-22 13:33:43.866 7f5169b2b700  4 mgr.server handle_report from 0x558c466cbb00 client,x
-137> 2019-01-22 13:33:43.866 7f5169b2b700  4 mgr.server handle_report rejecting report from non-daemon client x
-136> 2019-01-22 13:33:43.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] >> v2:192.168.178.3:33684/25726521 conn(0x558c466cbb00 msgr2=0x558c46953200 :33110 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
-135> 2019-01-22 13:33:43.866 7f517b167700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 50
-134> 2019-01-22 13:33:43.866 7f517b167700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
-133> 2019-01-22 13:33:43.866 7f517b167700  1 --2- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 0x558c463ae600 :-1 s=OPENED pgs=33 cs=1 l=1).handle_message read tag failed
-132> 2019-01-22 13:33:43.866 7f517b167700  1 --2- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 0x558c463ae600 :-1 s=OPENED pgs=33 cs=1 l=1).fault on lossy channel, failing
-131> 2019-01-22 13:33:43.870 7f5159539700  4 mgrc ms_handle_reset ms_handle_reset con 0x558c46b43f80
-130> 2019-01-22 13:33:43.870 7f5159539700  4 mgrc reconnect Terminating session with v2:192.168.178.3:6800/9882
-129> 2019-01-22 13:33:43.870 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CLOSED l=1).mark_down
-128> 2019-01-22 13:33:43.870 7f5159539700  4 mgrc reconnect waiting to retry connect until 2019-01-22 13:33:44.867758
-127> 2019-01-22 13:33:44.794 7f517515b700 10 mgr tick tick
-126> 2019-01-22 13:33:44.794 7f517515b700  4 mgr send_beacon active
-125> 2019-01-22 13:33:44.794 7f517515b700 10 mgr send_beacon sending beacon as gid 4576
-124> 2019-01-22 13:33:44.814 7f517515b700 10 monclient: _send_mon_message to mon.a at v2:192.168.178.3:40795/0
-123> 2019-01-22 13:33:44.814 7f517515b700  1 -- v2:192.168.178.3:43678/9882 --> [v2:192.168.178.3:40795/0,v1:192.168.178.3:40796/0] -- mgrbeacon mgr.x(afff2d12-1e1a-455e-b267-c711214ecc0e,4576, [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882], 1) v8 -- 0x558c4695fc00 con 0x558c42760d80
-122> 2019-01-22 13:33:44.866 7f5157535700  4 mgrc reconnect Starting new session with [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882]
-121> 2019-01-22 13:33:44.866 7f5157535700  1 -- v2:192.168.178.3:33684/25726521 --> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] -- mgropen(unknown.x) v3 -- 0x558c46ba6080 con 0x558c46b43f80
-120> 2019-01-22 13:33:44.866 7f517d16b700  1 --2- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] >>  conn(0x558c46d7fb00 0x558c46952600 :33112 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=51 v2:192.168.178.3:6800/9882 myaddrs [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] target_addr v2:192.168.178.3:33112/0 addrs are 145
-119> 2019-01-22 13:33:44.866 7f517d16b700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x558c45be5680 con 0x558c46d7fb00 entity mgr.x addr v2:192.168.178.3:33684/25726521
-118> 2019-01-22 13:33:44.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] <== client.4582 v2:192.168.178.3:33684/25726521 1 ==== mgropen(client.x) v3 ==== 57363+0+0 (1831594706 0 0) 0x558c46ba49c0 con 0x558c46d7fb00
-117> 2019-01-22 13:33:44.866 7f5169b2b700  4 mgr.server handle_open from 0x558c46d7fb00  client,x
-116> 2019-01-22 13:33:44.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] --> v2:192.168.178.3:33684/25726521 -- mgrconfigure(period=5, threshold=5) v3 -- 0x558c46b9de00 con 0x558c46d7fb00
-115> 2019-01-22 13:33:44.866 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 <== mgr.4576 v2:192.168.178.3:6800/9882 1 ==== mgrconfigure(period=5, threshold=5) v3 ==== 12+0+0 (2627173306 0 0) 0x558c46b9dc00 con 0x558c46b43f80
-114> 2019-01-22 13:33:44.866 7f5159539700  4 mgrc handle_mgr_configure stats_period=5
-113> 2019-01-22 13:33:44.866 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 --> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] -- mgrreport(unknown.x +8-0 packed 70) v7 -- 0x558c46a8db80 con 0x558c46b43f80
-112> 2019-01-22 13:33:44.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] <== client.4582 v2:192.168.178.3:33684/25726521 2 ==== mgrreport(client.x +8-0 packed 70) v7 ==== 626+0+0 (797699631 0 0) 0x558c4649b100 con 0x558c46d7fb00
-111> 2019-01-22 13:33:44.866 7f5169b2b700  4 mgr.server handle_report from 0x558c46d7fb00 client,x
-110> 2019-01-22 13:33:44.866 7f5169b2b700  4 mgr.server handle_report rejecting report from non-daemon client x
-109> 2019-01-22 13:33:44.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] >> v2:192.168.178.3:33684/25726521 conn(0x558c46d7fb00 msgr2=0x558c46952600 :33112 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
-108> 2019-01-22 13:33:44.866 7f517b167700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 50
-107> 2019-01-22 13:33:44.866 7f517b167700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
-106> 2019-01-22 13:33:44.866 7f517b167700  1 --2- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 0x558c463ae600 :-1 s=OPENED pgs=34 cs=1 l=1).handle_message read tag failed
-105> 2019-01-22 13:33:44.866 7f517b167700  1 --2- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 0x558c463ae600 :-1 s=OPENED pgs=34 cs=1 l=1).fault on lossy channel, failing
-104> 2019-01-22 13:33:44.866 7f5159539700  4 mgrc ms_handle_reset ms_handle_reset con 0x558c46b43f80
-103> 2019-01-22 13:33:44.866 7f5159539700  4 mgrc reconnect Terminating session with v2:192.168.178.3:6800/9882
-102> 2019-01-22 13:33:44.866 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CLOSED l=1).mark_down
-101> 2019-01-22 13:33:44.870 7f5159539700  4 mgrc reconnect waiting to retry connect until 2019-01-22 13:33:45.867895
-100> 2019-01-22 13:33:45.086 7f5168b29700 10 mgr.server tick
 -99> 2019-01-22 13:33:45.086 7f5168b29700 10 mgr update_delta_stats  v125
 -98> 2019-01-22 13:33:45.086 7f5168b29700 10 mgr.server operator() 176 pgs: 176 active+clean; 8.1 KiB data, 210 MiB used, 27 GiB / 30 GiB avail
 -97> 2019-01-22 13:33:45.086 7f5168b29700 10 mgr.server operator() 0 health checks
 -96> 2019-01-22 13:33:45.086 7f5168b29700 20 mgr.server operator() health checks:
{}

 -95> 2019-01-22 13:33:45.086 7f5168b29700 20 mgr.server send_report  + osd,0 SLOW_OPS(0|(0,0))
 -94> 2019-01-22 13:33:45.086 7f5168b29700 20 mgr.server send_report  + osd,0 PENDING_CREATING_PGS(0|(0,0))
 -93> 2019-01-22 13:33:45.086 7f5168b29700 10 monclient: _send_mon_message to mon.a at v2:192.168.178.3:40795/0
 -92> 2019-01-22 13:33:45.086 7f5168b29700  1 -- v2:192.168.178.3:43678/9882 --> [v2:192.168.178.3:40795/0,v1:192.168.178.3:40796/0] -- monmgrreport(0 checks) v1 -- 0x558c4703ed00 con 0x558c42760d80
 -91> 2019-01-22 13:33:45.086 7f5168b29700 20 mgr.server adjust_pgs
 -90> 2019-01-22 13:33:45.086 7f5168b29700 10 mgr.server operator() creating_or_unknown 0 max_creating 1024 left 1024
 -89> 2019-01-22 13:33:45.086 7f5168b29700 20 mgr.server operator() misplaced_ratio 0 degraded_ratio 0 inactive_pgs_ratio 0 unknown_pgs_ratio 0; target_max_misplaced_ratio 0.05
 -88> 2019-01-22 13:33:45.118 7f515f044700 20 mgr[dashboard] [::ffff:127.0.0.1:34440] [GET] [admin] /api/summary
 -87> 2019-01-22 13:33:45.118 7f515f044700  4 mgr get_config get_config key: mgr/dashboard/AUDIT_API_ENABLED
 -86> 2019-01-22 13:33:45.118 7f515f044700  4 mgr get_typed_config get_typed_config AUDIT_API_ENABLED not found
 -85> 2019-01-22 13:33:45.118 7f515f044700 20 mgr[dashboard] AMT: token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJjZXBoLWRhc2hib2FyZCIsImp0aSI6ImExMGY4MGNlLWZjNjUtNDEwNC1iMGE2LWZlMWVjMTZlMjVkOSIsImV4cCI6MTU0ODE5MjUzOCwiaWF0IjoxNTQ4MTYzNzM4LCJ1c2VybmFtZSI6ImFkbWluIn0.CcYqMWRuBD_fo7Z9KevUQPedEm4-IsOqHdwTqVcu_eg
 -84> 2019-01-22 13:33:45.122 7f515f044700  4 mgr get_store get_store key: mgr/dashboard/jwt_token_black_list
 -83> 2019-01-22 13:33:45.122 7f515f044700  4 ceph_store_get jwt_token_black_list not found
 -82> 2019-01-22 13:33:45.122 7f515f044700 20 mgr[dashboard] AMT: checking authorization...
 -81> 2019-01-22 13:33:45.122 7f515a7bb700 20 mgr[dashboard] VC: starting execution of <function get_daemons_and_pools at 0x7f516133fd08>
 -80> 2019-01-22 13:33:45.122 7f515a7bb700 10 mgr list_servers_python  >
 -79> 2019-01-22 13:33:45.126 7f515a7bb700 20 mgr[dashboard] VC: execution of <function get_daemons_and_pools at 0x7f516133fd08> finished in: 0.0015964508056640625
 -78> 2019-01-22 13:33:45.134 7f515f044700  4 mgr[dashboard] [::ffff:127.0.0.1:34440] [GET] [200] [0.030s] [admin] [372.0B] /api/summary
 -77> 2019-01-22 13:33:45.202 7f5178962700  1 -- v2:192.168.178.3:43678/9882 <== mon.0 v2:192.168.178.3:40795/0 217 ==== service_map(e2 1 svc) v1 ==== 951+0+0 (1767712443 0 0) 0x558c465d9800 con 0x558c42760d80
 -76> 2019-01-22 13:33:45.202 7f5178962700  4 mgr ms_dispatch active service_map(e2 1 svc) v1
 -75> 2019-01-22 13:33:45.202 7f5178962700  4 mgr ms_dispatch service_map(e2 1 svc) v1
 -74> 2019-01-22 13:33:45.202 7f5178962700 10 mgr handle_service_map e2
 -73> 2019-01-22 13:33:45.202 7f5178962700 10 mgr.server operator() got updated map e2
 -72> 2019-01-22 13:33:45.202 7f5178962700 10 mgr notify_all notify_all: notify_all service_map
 -71> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -70> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -69> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -68> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -67> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -66> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -65> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -64> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -63> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -62> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -61> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -60> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -59> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -58> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr[restful] Unhandled notification type 'service_map'
 -57> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -56> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -55> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -54> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr Gil Switched to new thread state 0x558c46030900
 -53> 2019-01-22 13:33:45.202 7f516a32c700 20 mgr ~Gil Destroying new thread state 0x558c46030900
 -52> 2019-01-22 13:33:45.202 7f515b03c700 20 mgr[dashboard] NQ: processing queue: 1
 -51> 2019-01-22 13:33:45.426 7f515e042700 20 mgr[dashboard] [::ffff:127.0.0.1:34444] [GET] [admin] /api/pool
 -50> 2019-01-22 13:33:45.426 7f515e042700  4 mgr get_config get_config key: mgr/dashboard/AUDIT_API_ENABLED
 -49> 2019-01-22 13:33:45.426 7f515e042700  4 mgr get_typed_config get_typed_config AUDIT_API_ENABLED not found
 -48> 2019-01-22 13:33:45.426 7f515e042700 20 mgr[dashboard] AMT: token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJjZXBoLWRhc2hib2FyZCIsImp0aSI6ImExMGY4MGNlLWZjNjUtNDEwNC1iMGE2LWZlMWVjMTZlMjVkOSIsImV4cCI6MTU0ODE5MjUzOCwiaWF0IjoxNTQ4MTYzNzM4LCJ1c2VybmFtZSI6ImFkbWluIn0.CcYqMWRuBD_fo7Z9KevUQPedEm4-IsOqHdwTqVcu_eg
 -47> 2019-01-22 13:33:45.430 7f515e042700  4 mgr get_store get_store key: mgr/dashboard/jwt_token_black_list
 -46> 2019-01-22 13:33:45.430 7f515e042700  4 ceph_store_get jwt_token_black_list not found
 -45> 2019-01-22 13:33:45.430 7f515e042700 20 mgr[dashboard] AMT: checking authorization...
 -44> 2019-01-22 13:33:45.434 7f515e042700 20 mgr[dashboard] AMT: checking '['read']' access to 'pool' scope
 -43> 2019-01-22 13:33:45.454 7f515e042700  4 mgr[dashboard] [::ffff:127.0.0.1:34444] [GET] [200] [0.031s] [admin] [2.9K] /api/pool
 -42> 2019-01-22 13:33:45.866 7f5157535700  4 mgrc reconnect Starting new session with [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882]
 -41> 2019-01-22 13:33:45.866 7f5157535700  1 -- v2:192.168.178.3:33684/25726521 --> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] -- mgropen(unknown.x) v3 -- 0x558c463556c0 con 0x558c46b43f80
 -40> 2019-01-22 13:33:45.866 7f517b167700  1 --2- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] >>  conn(0x558c466cbb00 0x558c46953200 :33122 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=51 v2:192.168.178.3:6800/9882 myaddrs [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] target_addr v2:192.168.178.3:33122/0 addrs are 145
 -39> 2019-01-22 13:33:45.866 7f517b167700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x558c45be6f40 con 0x558c466cbb00 entity mgr.x addr v2:192.168.178.3:33684/25726521
 -38> 2019-01-22 13:33:45.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] <== client.4582 v2:192.168.178.3:33684/25726521 1 ==== mgropen(client.x) v3 ==== 57363+0+0 (1831594706 0 0) 0x558c46ba7740 con 0x558c466cbb00
 -37> 2019-01-22 13:33:45.866 7f5169b2b700  4 mgr.server handle_open from 0x558c466cbb00  client,x
 -36> 2019-01-22 13:33:45.866 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] --> v2:192.168.178.3:33684/25726521 -- mgrconfigure(period=5, threshold=5) v3 -- 0x558c46b9d800 con 0x558c466cbb00
 -35> 2019-01-22 13:33:45.866 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 <== mgr.4576 v2:192.168.178.3:6800/9882 1 ==== mgrconfigure(period=5, threshold=5) v3 ==== 12+0+0 (2627173306 0 0) 0x558c427d4800 con 0x558c46b43f80
 -34> 2019-01-22 13:33:45.866 7f5159539700  4 mgrc handle_mgr_configure stats_period=5
 -33> 2019-01-22 13:33:45.866 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 --> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] -- mgrreport(unknown.x +8-0 packed 70) v7 -- 0x558c46a8c300 con 0x558c46b43f80
 -32> 2019-01-22 13:33:45.870 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] <== client.4582 v2:192.168.178.3:33684/25726521 2 ==== mgrreport(client.x +8-0 packed 70) v7 ==== 626+0+0 (797699631 0 0) 0x558c46a8db80 con 0x558c466cbb00
 -31> 2019-01-22 13:33:45.870 7f5169b2b700  4 mgr.server handle_report from 0x558c466cbb00 client,x
 -30> 2019-01-22 13:33:45.870 7f5169b2b700  4 mgr.server handle_report rejecting report from non-daemon client x
 -29> 2019-01-22 13:33:45.870 7f5169b2b700  1 -- [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] >> v2:192.168.178.3:33684/25726521 conn(0x558c466cbb00 msgr2=0x558c46953200 :33122 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
 -28> 2019-01-22 13:33:45.870 7f517d16b700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 50
 -27> 2019-01-22 13:33:45.870 7f517d16b700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
 -26> 2019-01-22 13:33:45.870 7f517d16b700  1 --2- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 0x558c463ae600 :-1 s=OPENED pgs=35 cs=1 l=1).handle_message read tag failed
 -25> 2019-01-22 13:33:45.870 7f517d16b700  1 --2- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 0x558c463ae600 :-1 s=OPENED pgs=35 cs=1 l=1).fault on lossy channel, failing
 -24> 2019-01-22 13:33:45.870 7f5159539700  4 mgrc ms_handle_reset ms_handle_reset con 0x558c46b43f80
 -23> 2019-01-22 13:33:45.870 7f5159539700  4 mgrc reconnect Terminating session with v2:192.168.178.3:6800/9882
 -22> 2019-01-22 13:33:45.870 7f5159539700  1 -- v2:192.168.178.3:33684/25726521 >> [v2:192.168.178.3:6800/9882,v1:192.168.178.3:6801/9882] conn(0x558c46b43f80 msgr2=0x558c463ae600 :-1 s=STATE_CLOSED l=1).mark_down
 -21> 2019-01-22 13:33:45.870 7f5159539700  4 mgrc reconnect waiting to retry connect until 2019-01-22 13:33:46.868040
 -20> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_config get_config key: mgr/balancer/active
 -19> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_typed_config get_typed_config active not found
 -18> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_config get_config key: mgr/balancer/begin_time
 -17> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_typed_config get_typed_config begin_time not found
 -16> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_config get_config key: mgr/balancer/end_time
 -15> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_typed_config get_typed_config end_time not found
 -14> 2019-01-22 13:33:46.382 7f5168328700 20 mgr[balancer] Waking up [inactive, scheduled for 0000-2400, now 1333]
 -13> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_config get_config key: mgr/balancer/sleep_interval
 -12> 2019-01-22 13:33:46.382 7f5168328700  4 mgr get_typed_config get_typed_config sleep_interval not found
 -11> 2019-01-22 13:33:46.382 7f5168328700 20 mgr[balancer] Sleeping for 60
 -10> 2019-01-22 13:33:46.590 7f515d841700 20 mgr[dashboard] [::ffff:127.0.0.1:34450] [GET] [admin] /api/pool/test/configuration
  -9> 2019-01-22 13:33:46.590 7f515d841700  4 mgr get_config get_config key: mgr/dashboard/AUDIT_API_ENABLED
  -8> 2019-01-22 13:33:46.590 7f515d841700  4 mgr get_typed_config get_typed_config AUDIT_API_ENABLED not found
  -7> 2019-01-22 13:33:46.590 7f515d841700 20 mgr[dashboard] AMT: token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJjZXBoLWRhc2hib2FyZCIsImp0aSI6ImExMGY4MGNlLWZjNjUtNDEwNC1iMGE2LWZlMWVjMTZlMjVkOSIsImV4cCI6MTU0ODE5MjUzOCwiaWF0IjoxNTQ4MTYzNzM4LCJ1c2VybmFtZSI6ImFkbWluIn0.CcYqMWRuBD_fo7Z9KevUQPedEm4-IsOqHdwTqVcu_eg
  -6> 2019-01-22 13:33:46.598 7f515d841700  4 mgr get_store get_store key: mgr/dashboard/jwt_token_black_list
  -5> 2019-01-22 13:33:46.598 7f515d841700  4 ceph_store_get jwt_token_black_list not found
  -4> 2019-01-22 13:33:46.598 7f515d841700 20 mgr[dashboard] AMT: checking authorization...
  -3> 2019-01-22 13:33:46.598 7f515d841700 20 mgr[dashboard] AMT: checking '['read']' access to 'pool' scope
  -2> 2019-01-22 13:33:46.598 7f515d841700  1 lockdep reusing last freed id 61
  -1> 2019-01-22 13:33:46.606 7f515d841700  1 lockdep using id 62
   0> 2019-01-22 13:33:46.618 7f515d841700 -1 *** Caught signal (Segmentation fault) **
in thread 7f515d841700 thread_name:dashboard

ceph version 14.0.1-2695-g4caeb717dc (4caeb717dc3da7623c944fdaa52f0b6901b2c545) nautilus (dev)
1: (()+0x9f3714) [0x558c3f7d9714]
2: (()+0x11f70) [0x7f517f839f70]
3: (std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)+0xf7) [0x7f517f5187f7]
4: (std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_insert_node(std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node<std::pair<int const, int> >*)+0x9e) [0x7f5181c2ddf8]
5: (std::_Rb_tree_iterator<std::pair<int const, int> > std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<int const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<int const, int> >, std::piecewise_construct_t const&, std::tuple<int const&>&&, std::tuple<>&&)+0xbc) [0x7f5181c2c3c4]
6: (std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::operator[](int const&)+0xee) [0x7f5181c2b4f2]
7: (()+0x1434658) [0x7f5181c28658]
8: (lockdep_register(char const*)+0x24) [0x7f5181c28799]
9: (Mutex::_register()+0x20) [0x7f5181aaad0a]
10: (Mutex::Mutex(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)+0x36e) [0x7f5181aaa9f2]
11: (()+0xd7056) [0x7f51706f6056]
12: (librados::IoCtx::operate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::ObjectReadOperation*, ceph::buffer::list*)+0x69) [0x7f51706bd5d5]
13: (()+0x4c218c) [0x7f5170c9018c]
14: (()+0x25369c) [0x7f5170a2169c]
15: (()+0x1efe44) [0x7f51709bde44]
16: (()+0x1f0f9d) [0x7f51709bef9d]
17: (rbd_config_pool_list()+0x62) [0x7f517089bedb]
18: (()+0x978fb) [0x7f51710db8fb]
19: (()+0x177172) [0x7f5180609172]
20: (()+0x2cd5d) [0x7f5171070d5d]
21: (()+0x2f928) [0x7f5171073928]
22: (_PyObject_FastCallDict()+0x8b) [0x7f518059aefb]
23: (()+0x184c31) [0x7f5180616c31]
24: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
25: (()+0x1079af) [0x7f51805999af]
26: (()+0x13dc70) [0x7f51805cfc70]
27: (()+0x184bb2) [0x7f5180616bb2]
28: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
29: (()+0x1079af) [0x7f51805999af]
30: (()+0x13dc70) [0x7f51805cfc70]
31: (()+0x184bb2) [0x7f5180616bb2]
32: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
33: (PyEval_EvalCodeEx()+0x317) [0x7f5180617bd7]
34: (()+0x186bde) [0x7f5180618bde]
35: (PyObject_Call()+0x43) [0x7f518059b563]
36: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
37: (PyEval_EvalCodeEx()+0x317) [0x7f5180617bd7]
38: (()+0x186bde) [0x7f5180618bde]
39: (PyObject_Call()+0x43) [0x7f518059b563]
40: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
41: (()+0x1079af) [0x7f51805999af]
42: (_PyFunction_FastCallDict()+0x53d) [0x7f518059abed]
43: (_PyObject_FastCallDict()+0x31e) [0x7f518059b18e]
44: (_PyObject_Call_Prepend()+0x60) [0x7f51805a6600]
45: (PyObject_Call()+0x43) [0x7f518059b563]
46: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
47: (_PyFunction_FastCallDict()+0x12a) [0x7f518059a7da]
48: (_PyObject_FastCallDict()+0x31e) [0x7f518059b18e]
49: (_PyObject_Call_Prepend()+0x60) [0x7f51805a6600]
50: (PyObject_Call()+0x43) [0x7f518059b563]
51: (()+0x17ce25) [0x7f518060ee25]
52: (PyObject_Call()+0x43) [0x7f518059b563]
53: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
54: (PyEval_EvalCodeEx()+0x317) [0x7f5180617bd7]
55: (()+0x186bde) [0x7f5180618bde]
56: (PyObject_Call()+0x43) [0x7f518059b563]
57: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
58: (PyEval_EvalCodeEx()+0x317) [0x7f5180617bd7]
59: (()+0x186bde) [0x7f5180618bde]
60: (PyObject_Call()+0x43) [0x7f518059b563]
61: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
62: (()+0x1079af) [0x7f51805999af]
63: (_PyFunction_FastCallDict()+0x318) [0x7f518059a9c8]
64: (_PyObject_FastCallDict()+0x31e) [0x7f518059b18e]
65: (_PyObject_Call_Prepend()+0x60) [0x7f51805a6600]
66: (PyObject_Call()+0x43) [0x7f518059b563]
67: (()+0x17ce25) [0x7f518060ee25]
68: (_PyObject_FastCallDict()+0x8b) [0x7f518059aefb]
69: (()+0x184c31) [0x7f5180616c31]
70: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
71: (()+0x13daba) [0x7f51805cfaba]
72: (()+0x184bb2) [0x7f5180616bb2]
73: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
74: (()+0x13daba) [0x7f51805cfaba]
75: (()+0x184bb2) [0x7f5180616bb2]
76: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
77: (()+0x13daba) [0x7f51805cfaba]
78: (()+0x184bb2) [0x7f5180616bb2]
79: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
80: (_PyFunction_FastCallDict()+0x12a) [0x7f518059a7da]
81: (_PyObject_FastCallDict()+0x31e) [0x7f518059b18e]
82: (_PyObject_Call_Prepend()+0x60) [0x7f51805a6600]
83: (PyObject_Call()+0x43) [0x7f518059b563]
84: (()+0x17b2a6) [0x7f518060d2a6]
85: (()+0x177172) [0x7f5180609172]
86: (_PyObject_FastCallDict()+0x8b) [0x7f518059aefb]
87: (()+0x184c31) [0x7f5180616c31]
88: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
89: (()+0x13daba) [0x7f51805cfaba]
90: (()+0x184bb2) [0x7f5180616bb2]
91: (_PyEval_EvalFrameDefault()+0x30a) [0x7f518064249a]
92: (_PyFunction_FastCallDict()+0x12a) [0x7f518059a7da]
93: (_PyObject_FastCallDict()+0x31e) [0x7f518059b18e]
94: (_PyObject_Call_Prepend()+0x60) [0x7f51805a6600]
95: (PyObject_Call()+0x43) [0x7f518059b563]
96: (()+0x17ce25) [0x7f518060ee25]
97: (PyObject_Call()+0x43) [0x7f518059b563]
98: (_PyEval_EvalFrameDefault()+0x1a5d) [0x7f5180643bed]
99: (()+0x1079af) [0x7f51805999af]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
 0/ 5 none
 0/ 1 lockdep
 0/ 1 context
 1/ 1 crush
 1/ 5 mds
 1/ 5 mds_balancer
 1/ 5 mds_locker
 1/ 5 mds_log
 1/ 5 mds_log_expire
 1/ 5 mds_migrator
 0/ 1 buffer
 0/ 1 timer
 0/ 1 filer
 0/ 1 striper
 0/ 1 objecter
 0/ 5 rados
 0/ 5 rbd
 0/ 5 rbd_mirror
 0/ 5 rbd_replay
 0/ 5 journaler
 0/ 5 objectcacher
 0/ 5 client
 1/ 5 osd
 0/ 5 optracker
 0/ 5 objclass
 1/ 3 filestore
 1/ 3 journal
 1/ 1 ms
20/20 mon
20/20 monc
 1/ 5 paxos
 0/ 5 tp
 1/ 5 auth
 1/ 5 crypto
 1/ 1 finisher
 1/ 1 reserver
 1/ 5 heartbeatmap
 1/ 5 perfcounter
 1/ 5 rgw
 1/ 5 rgw_sync
 1/10 civetweb
 1/ 5 javaclient
 1/ 5 asok
 1/ 1 throttle
 0/ 0 refs
 1/ 5 xio
 1/ 5 compressor
 1/ 5 bluestore
 1/ 5 bluefs
 1/ 3 bdev
 1/ 5 kstore
 4/ 5 rocksdb
 4/ 5 leveldb
 4/ 5 memdb
 1/ 5 kinetic
 1/ 5 fuse
20/20 mgr
 1/ 5 mgrc
 1/ 5 dpdk
 1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent     10000
max_new         1000
log_file /ceph/build/out/mgr.x.log
--- end dump of recent events ---

#6 Updated by Mykola Golub about 5 years ago

Volker Theile wrote:

Found out the problem. There is indeed a dead-lock situation. PyModuleRegistry::active_start() get's the lock first and finally PyModuleRegistry::module_exists() want's to get it.

I am not very familiar with this part of mgr, but it seems to me it could be fixed by changing `ActivePyModules::start_one` to send `active_module->load` and `active_module->thread.create` calls to finisher queue and just return (see `ActivePyModules::notify_all` as an example).

Volker, do you want try this and push a PR so others could review this approach? Or I can do this otherwise.

#7 Updated by Volker Theile about 5 years ago

Mykola, please do it.

I've opened https://github.com/ceph/ceph/pull/26092 in the meanwhile that reverts changes done in my mentioned PR. This is an alternative approach to fix the issue. We can then decide which solution is better.

The code path that is responsible for the dead-lock will not be executed anymore. The disadvantage of the solution is that it duplicates code and now BaseMgrModule::ceph_get_module_option() and BaseMgrModule::ceph_get_module_option_ex() have different implementations.

#8 Updated by Volker Theile about 5 years ago

I'm wondering why this issue did not always occur. I can not remember that i run into such a deadlock while working on #37722 and #36488.

#9 Updated by Mykola Golub about 5 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 26112

PR: https://github.com/ceph/ceph/pull/26112 (load modules in finisher to avoid lock cycles)

#10 Updated by Ernesto Puerta about 5 years ago

  • Parent task set to #38094

#11 Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Resolved

#12 Updated by Ernesto Puerta about 5 years ago

  • Subject changed from mgr: ceph-mgr crashing due to "recursive lock of PyModuleRegistry::lock" to mgr: abort, dashboard, "recursive lock of PyModuleRegistry::lock"

Also available in: Atom PDF