Bug #38573
mgr/ActivePyModule.cc: 54: FAILED ceph_assert(pClassInstance != nullptr)
Description
-36> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module orchestrator_cli
-35> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module progress
-34> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module progress
-33> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module prometheus
-32> 2019-03-04 16:59:02.641 7f2b1e84e700 4 mgr[prometheus] Engine stopped.
-31> 2019-03-04 16:59:02.641 7f2b1e84e700 20 mgr ~Gil Destroying new thread state 0x5b14dc0
-30> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module prometheus
-29> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module status
-28> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module status
-27> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module telemetry
-26> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module telemetry
-25> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module volumes
-24> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module volumes
-23> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-22> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-21> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-20> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-19> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-18> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-17> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-16> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-15> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-14> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-13> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-12> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-11> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-10> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-9> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-8> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-7> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-6> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-5> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-4> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-3> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
-2> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
-1> 2019-03-04 16:59:02.645 7f2b29d59700 -1 /build/ceph-14.1.0-101-gdddb858/src/mgr/ActivePyModule.cc: In function 'void ActivePyModule::notify(const string&, const string&)' thread 7f2b29d59700 time 2019-03-04 16:59:02.646002
/build/ceph-14.1.0-101-gdddb858/src/mgr/ActivePyModule.cc: 54: FAILED ceph_assert(pClassInstance != nullptr)
 ceph version 14.1.0-101-gdddb858 (dddb858f5d5b4fe14a902d8e963beaed3fe2b381) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f2b40708002]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f2b407081dd]
 3: (ActivePyModule::notify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x474) [0x500454]
 4: (FunctionContext::finish(int)+0x29) [0x5116c9]
 5: (Context::complete(int)+0x9) [0x50e4e9]
 6: (Finisher::finisher_thread_entry()+0x16e) [0x7f2b4074f85e]
 7: (()+0x76ba) [0x7f2b3fa126ba]
 8: (clone()+0x6d) [0x7f2b3f23b41d]
0> 2019-03-04 16:59:02.645 7f2b29d59700 -1 *** Caught signal (Aborted) ** in thread 7f2b29d59700 thread_name:mgr-fin
 ceph version 14.1.0-101-gdddb858 (dddb858f5d5b4fe14a902d8e963beaed3fe2b381) nautilus (dev)
 1: (()+0x11390) [0x7f2b3fa1c390]
 2: (gsignal()+0x38) [0x7f2b3f169428]
Related issues
History
#1 Updated by Sage Weil almost 5 years ago
the shutdown is happening as a finisher event, and the notify event asserting is another finisher event that is queued after it
also, lots of other things are queued via the finisher (config_notify, notify_clog, start_one, cli command invocation, ...)... it's not just this notify assert that matters.
i suspect the most correct fix sets a flag that we are in a shutdown state, preventing any subsequent events from being queued after the shutdown event.
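A minimal sketch of the flag-based approach suggested above, assuming a simplified single-queue model. `GuardedQueue`, `queue_shutdown`, `drain`, and `run_example` are hypothetical names made up for illustration; this is not Ceph's Finisher API and not the fix that was eventually merged. The idea shown is only that the shutdown event and the flag are set under the same lock, so nothing can be queued behind the shutdown event.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a finisher-style queue; NOT Ceph's Finisher.
class GuardedQueue {
public:
  // Queue an ordinary event. Returns false (and drops the callback)
  // once shutdown has been requested.
  bool queue(std::function<void()> fn) {
    std::lock_guard<std::mutex> l(lock);
    if (shutting_down) return false;
    pending.push_back(std::move(fn));
    return true;
  }

  // Queue the shutdown event and set the flag under the same lock,
  // which makes it the last event the queue will ever accept.
  bool queue_shutdown(std::function<void()> fn) {
    std::lock_guard<std::mutex> l(lock);
    if (shutting_down) return false;
    pending.push_back(std::move(fn));
    shutting_down = true;
    return true;
  }

  // Run everything queued so far, in order; returns how many events ran.
  std::size_t drain() {
    std::deque<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> l(lock);
      batch.swap(pending);
    }
    for (auto& fn : batch) fn();
    return batch.size();
  }

private:
  std::mutex lock;
  bool shutting_down = false;
  std::deque<std::function<void()>> pending;
};

// Illustrative run: returns how many notify callbacks actually executed.
int run_example() {
  GuardedQueue q;
  bool module_alive = true;  // stands in for pClassInstance != nullptr
  int notified = 0;
  auto notify = [&] { assert(module_alive); ++notified; };

  q.queue(notify);                                  // accepted before shutdown
  q.queue_shutdown([&] { module_alive = false; });  // last accepted event
  bool accepted = q.queue(notify);                  // would have hit the assert
  assert(!accepted);                                // rejected by the flag

  q.drain();
  return notified;
}
```

In this model the late notify is dropped at queue time instead of asserting when it runs. Callers of `queue` would need to tolerate a `false` return, which is the main cost of the approach.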
#2 Updated by Sebastian Wagner over 4 years ago
- Duplicated by Bug #41171: mimic: ceph-mgr 13.2.6 crashing on ubuntu 18.04 lts: ActivePyModule.cc: 54: FAILED assert(pClassInstance != nullptr) added
#3 Updated by Sebastian Wagner over 4 years ago
- Duplicated by Bug #35902: mgr:FAILED assert(pClassInstance != nullptr) added
#4 Updated by Sebastian Wagner over 4 years ago
log from the other issue:
2019-08-08 10:51:49.389 7fb03e113700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2019-08-08 10:51:50.433 7fb03e113700 -1 mgr handle_signal *** Got signal Terminated ***
2019-08-08 10:51:52.297 7fb026169700 -1 /build/ceph-13.2.6/src/mgr/ActivePyModule.cc: In function 'void ActivePyModule::notify(const string&, const string&)' thread 7fb026169700 time 2019-08-08 10:51:52.302522
/build/ceph-13.2.6/src/mgr/ActivePyModule.cc: 54: FAILED assert(pClassInstance != nullptr)
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7fb0484e5b5e]
 2: (()+0x2c4cb7) [0x7fb0484e5cb7]
 3: (ActivePyModule::notify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x234) [0x561af448b3b4]
 4: (FunctionContext::finish(int)+0x2c) [0x561af4447d5c]
 5: (Context::complete(int)+0x9) [0x561af44439a9]
 6: (Finisher::finisher_thread_entry()+0x135) [0x7fb0484e40a5]
 7: (()+0x76db) [0x7fb04781c6db]
 8: (clone()+0x3f) [0x7fb046a0288f]
#5 Updated by Sebastian Wagner over 4 years ago
- Category set to ceph-mgr
- Source set to Development
- Backport set to nautilus, mimic
- Affected Versions v12.2.4, v13.2.6 added
#6 Updated by Kefu Chai over 4 years ago
- Priority changed from Urgent to High
#7 Updated by Patrick Donnelly over 4 years ago
- Status changed from 12 to New
#8 Updated by Sage Weil over 4 years ago
- Related to Bug #42744: mgr/dashboard: Executing the run-backend-api-tests script results in infinite loop added
#9 Updated by Sage Weil over 4 years ago
- Status changed from New to Resolved
i think this was related to https://tracker.ceph.com/issues/42744 .. probably just harder to hit before patrick's changes?
we could try to backport https://github.com/ceph/ceph/pull/31620 to mimic, but meh, this is very rare and on shutdown anyway.