Bug #38573

mgr/ActivePyModule.cc: 54: FAILED ceph_assert(pClassInstance != nullptr)

Added by Sage Weil about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: ceph-mgr
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: nautilus, mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -36> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module orchestrator_cli
   -35> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module progress
   -34> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module progress
   -33> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module prometheus
   -32> 2019-03-04 16:59:02.641 7f2b1e84e700  4 mgr[prometheus] Engine stopped.
   -31> 2019-03-04 16:59:02.641 7f2b1e84e700 20 mgr ~Gil Destroying new thread state 0x5b14dc0
   -30> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module prometheus
   -29> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module status
   -28> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module status
   -27> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module telemetry
   -26> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module telemetry
   -25> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joining module volumes
   -24> 2019-03-04 16:59:02.641 7f2b29d59700 10 mgr shutdown joined module volumes
   -23> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -22> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
   -21> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -20> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
   -19> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -18> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
   -17> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -16> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
   -15> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -14> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
   -13> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -12> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
   -11> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
   -10> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
    -9> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
    -8> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
    -7> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
    -6> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
    -5> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
    -4> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
    -3> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr Gil Switched to new thread state 0x5b14a50
    -2> 2019-03-04 16:59:02.641 7f2b29d59700 20 mgr ~Gil Destroying new thread state 0x5b14a50
    -1> 2019-03-04 16:59:02.645 7f2b29d59700 -1 /build/ceph-14.1.0-101-gdddb858/src/mgr/ActivePyModule.cc: In function 'void ActivePyModule::notify(const string&, const string&)' thread 7f2b29d59700 time 2019-03-04 16:59:02.646002
/build/ceph-14.1.0-101-gdddb858/src/mgr/ActivePyModule.cc: 54: FAILED ceph_assert(pClassInstance != nullptr)

 ceph version 14.1.0-101-gdddb858 (dddb858f5d5b4fe14a902d8e963beaed3fe2b381) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f2b40708002]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f2b407081dd]
 3: (ActivePyModule::notify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x474) [0x500454]
 4: (FunctionContext::finish(int)+0x29) [0x5116c9]
 5: (Context::complete(int)+0x9) [0x50e4e9]
 6: (Finisher::finisher_thread_entry()+0x16e) [0x7f2b4074f85e]
 7: (()+0x76ba) [0x7f2b3fa126ba]
 8: (clone()+0x6d) [0x7f2b3f23b41d]

     0> 2019-03-04 16:59:02.645 7f2b29d59700 -1 *** Caught signal (Aborted) **
 in thread 7f2b29d59700 thread_name:mgr-fin

 ceph version 14.1.0-101-gdddb858 (dddb858f5d5b4fe14a902d8e963beaed3fe2b381) nautilus (dev)
 1: (()+0x11390) [0x7f2b3fa1c390]
 2: (gsignal()+0x38) [0x7f2b3f169428]

Related issues

Related to Dashboard - Bug #42744: mgr/dashboard: Executing the run-backend-api-tests script results in infinite loop Resolved
Duplicated by mgr - Bug #41171: mimic: ceph-mgr 13.2.6 crashing on ubuntu 18.04 lts: ActivePyModule.cc: 54: FAILED assert(pClassInstance != nullptr) Duplicate 08/08/2019
Duplicated by mgr - Bug #35902: mgr:FAILED assert(pClassInstance != nullptr) Duplicate 09/10/2018

History

#1 Updated by Sage Weil almost 5 years ago

the shutdown is happening as a finisher event, and the notify event asserting is another finisher event that is queued after it

also, lots of other things are queued via the finisher (config_notify, notify_clog, start_one, cli command invocation, ...)... it's not just this notify assert that matters.

i suspect the most correct fix sets a flag that we are in a shutdown state, preventing any subsequent events from being queued after the shutdown event.
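
A minimal sketch of the shutdown-flag idea from this comment, for illustration only. It is not Ceph's Finisher and not the fix that was eventually merged; GatedFinisher, queue(), queue_shutdown() and drain() are hypothetical names, and the queue is drained on the calling thread to keep the example small. The point it tries to show: the flag is set in the same critical section that queues the shutdown event, so nothing can be queued behind it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a finisher queue; NOT Ceph's Finisher class.
class GatedFinisher {
public:
  // Queue a normal event (e.g. a notify). Returns false and drops the
  // event if the shutdown event has already been queued.
  bool queue(std::function<void()> fn) {
    std::lock_guard<std::mutex> l(lock);
    if (shutting_down) {
      return false;
    }
    events.push_back(std::move(fn));
    return true;
  }

  // Queue the shutdown event and close the gate under the same lock,
  // so no later event can land behind it in the queue.
  bool queue_shutdown(std::function<void()> fn) {
    std::lock_guard<std::mutex> l(lock);
    if (shutting_down) {
      return false;  // shutdown was already requested
    }
    shutting_down = true;
    events.push_back(std::move(fn));
    return true;
  }

  // Run all queued events in FIFO order; returns how many were run.
  int drain() {
    int ran = 0;
    for (;;) {
      std::function<void()> fn;
      {
        std::lock_guard<std::mutex> l(lock);
        if (events.empty()) {
          break;
        }
        fn = std::move(events.front());
        events.pop_front();
      }
      fn();  // run without holding the lock
      ++ran;
    }
    return ran;
  }

private:
  std::mutex lock;
  std::deque<std::function<void()>> events;
  bool shutting_down = false;
};
```

With this gate, a notify that arrives after queue_shutdown() is rejected at queue time instead of running against a module that has already been torn down. Callers would need to check the return value of queue() and treat false as "mgr is shutting down".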

#2 Updated by Sebastian Wagner over 4 years ago

  • Duplicated by Bug #41171: mimic: ceph-mgr 13.2.6 crashing on ubuntu 18.04 lts: ActivePyModule.cc: 54: FAILED assert(pClassInstance != nullptr) added

#3 Updated by Sebastian Wagner over 4 years ago

  • Duplicated by Bug #35902: mgr:FAILED assert(pClassInstance != nullptr) added

#4 Updated by Sebastian Wagner over 4 years ago

log from the other issue:

2019-08-08 10:51:49.389 7fb03e113700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2019-08-08 10:51:50.433 7fb03e113700 -1 mgr handle_signal *** Got signal Terminated ***
2019-08-08 10:51:52.297 7fb026169700 -1 /build/ceph-13.2.6/src/mgr/ActivePyModule.cc: In function 'void ActivePyModule::notify(const string&, const string&)' thread 7fb026169700 time 2019-08-08 10:51:52.302522
/build/ceph-13.2.6/src/mgr/ActivePyModule.cc: 54: FAILED assert(pClassInstance != nullptr)

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7fb0484e5b5e]
 2: (()+0x2c4cb7) [0x7fb0484e5cb7]
 3: (ActivePyModule::notify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x234) [0x561af448b3b4]
 4: (FunctionContext::finish(int)+0x2c) [0x561af4447d5c]
 5: (Context::complete(int)+0x9) [0x561af44439a9]
 6: (Finisher::finisher_thread_entry()+0x135) [0x7fb0484e40a5]
 7: (()+0x76db) [0x7fb04781c6db]
 8: (clone()+0x3f) [0x7fb046a0288f]

#5 Updated by Sebastian Wagner over 4 years ago

  • Category set to ceph-mgr
  • Source set to Development
  • Backport set to nautilus, mimic
  • Affected Versions v12.2.4, v13.2.6 added

#6 Updated by Kefu Chai over 4 years ago

  • Priority changed from Urgent to High

#7 Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New

#8 Updated by Sage Weil over 4 years ago

  • Related to Bug #42744: mgr/dashboard: Executing the run-backend-api-tests script results in infinite loop added

#9 Updated by Sage Weil over 4 years ago

  • Status changed from New to Resolved

i think this was related to https://tracker.ceph.com/issues/42744 .. probably just harder to hit before patrick's changes?

we could try to backport https://github.com/ceph/ceph/pull/31620 to mimic, but meh, this is very rare and on shutdown anyway.
