Bug #41736

ceph-mgr crashes with "Failed to run module in active mode ('rbd_support')" after upgrade from 14.2.2 -> 14.2.

Added by Gavin Baker 9 days ago. Updated 2 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 09/10/2019
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

After a yum update and service restart on a Ceph cluster, the manager services crash and fail to restart. The main error appears to be: "mgr operator() Failed to run module in active mode ('rbd_support')".

Sep 9 19:13:28 ceph-mgr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 9 19:13:28 ceph-mgr: -235> 2019-09-09 19:13:28.427 7fbbed24a700 -1 mgr load Failed to construct class in 'rbd_support'
Sep 9 19:13:28 ceph-mgr: -218> 2019-09-09 19:13:28.427 7fbbed24a700 -1 mgr load Traceback (most recent call last):
Sep 9 19:13:28 ceph-mgr: File "/usr/share/ceph/mgr/rbd_support/module.py", line 1326, in __init__
Sep 9 19:13:28 ceph-mgr: self.task = TaskHandler(self)
Sep 9 19:13:28 ceph-mgr: File "/usr/share/ceph/mgr/rbd_support/module.py", line 610, in __init__
Sep 9 19:13:28 ceph-mgr: self.init_task_queue()
Sep 9 19:13:28 ceph-mgr: File "/usr/share/ceph/mgr/rbd_support/module.py", line 674, in init_task_queue
Sep 9 19:13:28 ceph-mgr: self.load_task_queue(ioctx, pool_name)
Sep 9 19:13:28 ceph-mgr: File "/usr/share/ceph/mgr/rbd_support/module.py", line 708, in load_task_queue
Sep 9 19:13:28 ceph-mgr: ioctx.operate_read_op(read_op, RBD_TASK_OID)
Sep 9 19:13:28 ceph-mgr: File "rados.pyx", line 516, in rados.requires.wrapper.validate_func (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/build/src/pybind/rados/pyrex/rados.c:4721)
Sep 9 19:13:28 ceph-mgr: File "rados.pyx", line 3474, in rados.Ioctx.operate_read_op (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/build/src/pybind/rados/pyrex/rados.c:36554)
Sep 9 19:13:28 ceph-mgr: PermissionError: [errno 1] Failed to operate read op for oid rbd_task
Sep 9 19:13:28 ceph-mgr: -217> 2019-09-09 19:13:28.583 7fbbed24a700 -1 mgr operator() Failed to run module in active mode ('rbd_support')
Sep 9 19:13:28 ceph-mgr: -128> 2019-09-09 19:13:28.590 7fbbed24a700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mgr/ActivePyModule.cc: In function 'void ActivePyModule::notify(const string&, const string&)' thread 7fbbed24a700 time 2019-09-09 19:13:28.590091
Sep 9 19:13:28 ceph-mgr: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mgr/ActivePyModule.cc: 54: FAILED ceph_assert(pClassInstance != nullptr)
Sep 9 19:13:28 ceph-mgr: ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
Sep 9 19:13:28 ceph-mgr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7fbc0e38eac2]
Sep 9 19:13:28 ceph-mgr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fbc0e38ec90]
Sep 9 19:13:28 ceph-mgr: 3: (ActivePyModule::notify(std::string const&, std::string const&)+0x4f5) [0x56043aea69f5]
Sep 9 19:13:28 ceph-mgr: 4: (FunctionContext::finish(int)+0x2c) [0x56043aeb8eac]
Sep 9 19:13:28 ceph-mgr: 5: (Context::complete(int)+0x9) [0x56043aeb5659]
Sep 9 19:13:28 ceph-mgr: 6: (Finisher::finisher_thread_entry()+0x156) [0x7fbc0e3d5cc6]
Sep 9 19:13:28 ceph-mgr: 7: (()+0x7dd5) [0x7fbc0bc8ddd5]
Sep 9 19:13:28 ceph-mgr: 8: (clone()+0x6d) [0x7fbc0a93702d]
Sep 9 19:13:28 ceph-mgr: -106> 2019-09-09 19:13:28.591 7fbbed24a700 -1 *** Caught signal (Aborted) **
Sep 9 19:13:28 ceph-mgr: in thread 7fbbed24a700 thread_name:mgr-fin
Sep 9 19:13:28 ceph-mgr: ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39) nautilus (stable)
Sep 9 19:13:28 ceph-mgr: 1: (()+0xf5d0) [0x7fbc0bc955d0]
Sep 9 19:13:28 ceph-mgr: 2: (gsignal()+0x37) [0x7fbc0a86f2c7]
Sep 9 19:13:28 ceph-mgr: 3: (abort()+0x148) [0x7fbc0a8709b8]
Sep 9 19:13:28 ceph-mgr: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7fbc0e38eb11]
Sep 9 19:13:28 ceph-mgr: 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fbc0e38ec90]
Sep 9 19:13:28 ceph-mgr: 6: (ActivePyModule::notify(std::string const&, std::string const&)+0x4f5) [0x56043aea69f5]
Sep 9 19:13:28 ceph-mgr: 7: (FunctionContext::finish(int)+0x2c) [0x56043aeb8eac]
Sep 9 19:13:28 ceph-mgr: 8: (Context::complete(int)+0x9) [0x56043aeb5659]
Sep 9 19:13:28 ceph-mgr: 9: (Finisher::finisher_thread_entry()+0x156) [0x7fbc0e3d5cc6]
Sep 9 19:13:28 ceph-mgr: 10: (()+0x7dd5) [0x7fbc0bc8ddd5]
Sep 9 19:13:28 ceph-mgr: 11: (clone()+0x6d) [0x7fbc0a93702d]
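For anyone triaging a similar dump, the three lines that matter in the log above are the module that failed to construct, the Python exception, and the C++ assertion that fired afterwards. A minimal sketch of pulling those out of journal lines follows; the helper name `summarize` and the regular expressions are hypothetical and based only on the line shapes seen in this report, not on any Ceph tooling.

```python
import re

# Hypothetical patterns, derived only from the log lines in this report.
MODULE = re.compile(r"Failed to construct class in '([^']+)'")
PY_EXC = re.compile(r"ceph-mgr: (\w+(?:Error|Exception)): (.*)$")
ASSERT = re.compile(r"FAILED ceph_assert\((.+)\)\s*$")


def summarize(lines):
    """Return the first failed module, Python exception, and failed assertion found."""
    summary = {"module": None, "exception": None, "assertion": None}
    for line in lines:
        if summary["module"] is None:
            m = MODULE.search(line)
            if m:
                summary["module"] = m.group(1)
        if summary["exception"] is None:
            m = PY_EXC.search(line)
            if m:
                summary["exception"] = f"{m.group(1)}: {m.group(2)}"
        if summary["assertion"] is None:
            m = ASSERT.search(line)
            if m:
                summary["assertion"] = m.group(1)
    return summary


# Lines copied verbatim from the log in this report.
SAMPLE = [
    "Sep 9 19:13:28 ceph-mgr: -235> 2019-09-09 19:13:28.427 7fbbed24a700 -1 mgr load Failed to construct class in 'rbd_support'",
    "Sep 9 19:13:28 ceph-mgr: PermissionError: [errno 1] Failed to operate read op for oid rbd_task",
    "Sep 9 19:13:28 ceph-mgr: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mgr/ActivePyModule.cc: 54: FAILED ceph_assert(pClassInstance != nullptr)",
]

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

Read in order, the summary suggests the sequence seen here: the rbd_support class failed to construct because of the PermissionError, and the later assertion in ActivePyModule::notify fired because no class instance existed.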

History

#1 Updated by Gavin Baker 9 days ago

Removing a number of old config options seems to have allowed the mgr service to start. However, the ceph status command outputs the following error:

health: HEALTH_ERR
Module 'rbd_support' has failed: Not found or unloadable

Config options that were removed:

mgr advanced mgr/balancer/active true
mgr advanced mgr/balancer/mode crush-compat
mgr advanced mgr/balancer/pool_ids 24,1,2,20,14,22,23
mgr advanced mgr/devicehealth/enable_monitoring false
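The comment does not say how the options were removed. Assuming they lived in the monitors' centralized config database (the listing resembles `ceph config dump` output), one way to remove them would be `ceph config rm <who> <option>`. The sketch below only builds those command lines for review and does not execute anything; the helper name `removal_commands` is hypothetical.

```python
# Options copied from the comment above as (who, option) pairs.
# Values are omitted because removal does not need them.
REMOVED = [
    ("mgr", "mgr/balancer/active"),
    ("mgr", "mgr/balancer/mode"),
    ("mgr", "mgr/balancer/pool_ids"),
    ("mgr", "mgr/devicehealth/enable_monitoring"),
]


def removal_commands(options):
    """Build one `ceph config rm <who> <option>` argv per entry (nothing is run)."""
    return [["ceph", "config", "rm", who, name] for who, name in options]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anyone runs them by hand.
    for argv in removal_commands(REMOVED):
        print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to run on any machine; whether removing these options is appropriate for a given cluster is a separate question that this report does not settle.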

#2 Updated by Greg Farnum 2 days ago

  • Project changed from Ceph to mgr
