Bug #45574

subinterpreters: ceph/mgr/rook RuntimeError on import of RookOrchestrator - ceph cluster does not start

Added by Martin Millnert over 2 years ago. Updated over 1 year ago.

Status: New
Priority: High
Assignee: -
Category: ceph-mgr
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The 'devicehealth' plugin's dependency on Rook code (package ceph-mgr-rook) causes a cluster to not boot after an upgrade to Octopus v15.2.1 on Debian Buster.
This appears to be due to an interaction between the Rook code and python3-numpy (version 1:1.16.2-1), and it is not unique to Rook ( https://github.com/numpy/numpy/issues/14384 ).
No fix is available upstream (it seems to be "WON'T IMPLEMENT"), so a fix is required in Rook.

May 17 21:01:42 davinci ceph-mgr[82022]: 2020-05-17T21:01:42.587+0200 7f262360ff40 -1 mgr[py] Module not found: 'rook'
May 17 21:01:42 davinci ceph-mgr[82022]: 2020-05-17T21:01:42.587+0200 7f262360ff40 -1 mgr[py] Traceback (most recent call last):
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/share/ceph/mgr/rook/__init__.py", line 2, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from .module import RookOrchestrator
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/share/ceph/mgr/rook/module.py", line 16, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from kubernetes import client, config
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/kubernetes/__init__.py", line 22, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     import kubernetes.stream
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/kubernetes/stream/__init__.py", line 15, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from .stream import stream
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/kubernetes/stream/stream.py", line 13, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from . import ws_client
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/kubernetes/stream/ws_client.py", line 19, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from websocket import WebSocket, ABNF, enableTrace
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/websocket/__init__.py", line 22, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from ._abnf import *
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/websocket/_abnf.py", line 34, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     import numpy
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/numpy/__init__.py", line 142, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from . import core
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/numpy/core/__init__.py", line 40, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from . import multiarray
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/numpy/core/multiarray.py", line 12, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     from . import overrides
May 17 21:01:42 davinci ceph-mgr[82022]:   File "/usr/lib/python3/dist-packages/numpy/core/overrides.py", line 46, in <module>
May 17 21:01:42 davinci ceph-mgr[82022]:     """)
May 17 21:01:42 davinci ceph-mgr[82022]: RuntimeError: implement_array_function method already has a docstring
May 17 21:01:42 davinci ceph-mgr[82022]: 2020-05-17T21:01:42.591+0200 7f262360ff40 -1 mgr[py] Class not found in module 'rook'
May 17 21:01:42 davinci ceph-mgr[82022]: 2020-05-17T21:01:42.591+0200 7f262360ff40 -1 mgr[py] Error loading module 'rook': (2) No such file or directory
May 17 21:01:43 davinci ceph-mgr[82022]: 2020-05-17T21:01:43.099+0200 7f262360ff40 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: rook
May 17 21:01:46 davinci ceph-mgr[82022]: 2020-05-17T21:01:46.211+0200 7f260ae73700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.davinci.lund.millnert.se:
May 17 21:01:46 davinci ceph-mgr[82022]: 2020-05-17T21:01:46.211+0200 7f260ae73700 -1 devicehealth.serve:
May 17 21:01:46 davinci ceph-mgr[82022]: 2020-05-17T21:01:46.211+0200 7f260ae73700 -1 Traceback (most recent call last):
May 17 21:01:46 davinci ceph-mgr[82022]:   File "/usr/share/ceph/mgr/devicehealth/module.py", line 260, in serve
May 17 21:01:46 davinci ceph-mgr[82022]:     self.scrape_all()
May 17 21:01:46 davinci ceph-mgr[82022]:   File "/usr/share/ceph/mgr/devicehealth/module.py", line 327, in scrape_all
May 17 21:01:46 davinci ceph-mgr[82022]:     ioctx = self.open_connection()
May 17 21:01:46 davinci ceph-mgr[82022]:   File "/usr/share/ceph/mgr/devicehealth/module.py", line 297, in open_connection
May 17 21:01:46 davinci ceph-mgr[82022]:     assert r == 0
May 17 21:01:46 davinci ceph-mgr[82022]: AssertionError
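
For triage, the two interesting pieces of a log like the one above are the "Failed to load ceph-mgr modules" summary line and the final exception line of each traceback. A minimal Python sketch for pulling those out is shown here. The function name `summarize_mgr_log` and both regular expressions are hypothetical and were written only against the lines quoted in this report; the separator used when several modules fail at once is an assumption.

```python
import re

# Summary line as quoted in this report, e.g.
# "... log [ERR] : Failed to load ceph-mgr modules: rook"
FAILED_RE = re.compile(r"Failed to load ceph-mgr modules: (?P<names>.+)$")

# Final traceback line of the form "SomeError: message".
# A bare exception name with no message (e.g. "AssertionError") is not matched.
EXC_RE = re.compile(r"(?P<exc>[A-Za-z_]\w*(?:Error|Exception)): (?P<msg>.+)$")


def summarize_mgr_log(lines):
    """Return (failed_modules, exceptions) found in an iterable of log lines."""
    failed, errors = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = FAILED_RE.search(line)
        if match:
            # Assumption: multiple module names are separated by commas and/or spaces.
            names = re.split(r"[,\s]+", match.group("names"))
            failed.extend(name for name in names if name)
            continue
        match = EXC_RE.search(line)
        if match:
            errors.append((match.group("exc"), match.group("msg")))
    return failed, errors
```

Fed the log above, this reports `rook` as the module that failed to load and the `RuntimeError` from numpy as the underlying exception. The `AssertionError` from devicehealth is skipped because it carries no message.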


Related issues

Related to mgr - Bug #38407: Funny issues with python sub-interpreters (Can't reproduce)
Related to mgr - Bug #48787: ceph-mgr segfault (New)
Duplicated by Orchestrator - Bug #50979: rook: implement_array_function method already has a docstring (Duplicate)
Copied to mgr - Bug #51240: mgr module fails in focal, due to ceph-mgr-rook module (Pending Backport)

History

#1 Updated by Martin Millnert over 2 years ago

Update: It was a red herring that this bug prevented the cluster from starting.

I was upgrading from Luminous through Nautilus to Octopus and had blocked the OSDs from starting by leaving ceph osd require-osd-release set to luminous, which is n-2 relative to Octopus and so did not let them start. The cluster is up and happy, but the error still shows up nonetheless.

Clearly the severity can be reduced.

#3 Updated by Sebastian Wagner over 2 years ago

  • Status changed from New to Need More Info

I think we're going to need the full mgr log here.

#4 Updated by Tim Serong over 2 years ago

Sebastian Wagner wrote:

https://github.com/numpy/numpy/issues/14384#issuecomment-626832460

And we have sub-interpreters again.

There are a couple of subsequent comments on that bug from rgommers saying that, even though they're not going to attempt to fix subinterpreter issues in numpy, there should be a fix for this particular docstring error "soon". So we may have a brief reprieve...

#5 Updated by Sebastian Wagner over 2 years ago

But that in turn means we're not able to load numpy from two sub-interpreters. Which means no k8sevents module?
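
One generic way for a module to cope with an import that may fail like this is to guard the import and record the reason, rather than letting the exception abort module loading. The Python sketch below is only an illustration of that pattern; `try_import` is a hypothetical helper, not code from ceph-mgr, the rook module, or k8sevents, and it does not fix the underlying sub-interpreter problem in numpy.

```python
import importlib


def try_import(name):
    """Return (module, None) on success or (None, error_text) on failure.

    RuntimeError is caught in addition to ImportError because the failure
    in this report surfaces as a RuntimeError raised while importing numpy.
    """
    try:
        return importlib.import_module(name), None
    except (ImportError, RuntimeError) as exc:
        return None, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Hypothetical usage: report why the kubernetes client is unavailable.
    module, error = try_import("kubernetes")
    if module is None:
        print(f"kubernetes unavailable: {error}")
    else:
        print("kubernetes available")
```

With a guard like this, a module would still lose its Kubernetes-dependent features when the import fails, but it could report the reason itself instead of failing to load.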

#6 Updated by Sebastian Wagner over 2 years ago

  • Project changed from Orchestrator to mgr
  • Subject changed from ceph/mgr/rook RuntimeError on import of RookOrchestrator - ceph cluster does not start to subinterpreters: ceph/mgr/rook RuntimeError on import of RookOrchestrator - ceph cluster does not start
  • Category changed from mgr/rook to ceph-mgr

#7 Updated by Sebastian Wagner over 1 year ago

  • Duplicated by Bug #50979: rook: implement_array_function method already has a docstring added

#8 Updated by Sebastian Wagner over 1 year ago

  • Status changed from Need More Info to New
  • Priority changed from Normal to High

#9 Updated by Sebastian Wagner over 1 year ago

2021-05-24T19:40:05.822+0000 7f56cd302040 -1 mgr[py] Module not found: 'rook'
2021-05-24T19:40:05.826+0000 7f8800ece040 -1 mgr[py] Module not found: 'rook'
2021-05-24T19:40:05.826+0000 7f8800ece040 -1 mgr[py] Traceback (most recent call last):
  File "/usr/share/ceph/mgr/rook/__init__.py", line 2, in <module>
    from .module import RookOrchestrator
  File "/usr/share/ceph/mgr/rook/module.py", line 17, in <module>
    from kubernetes import client, config
  File "/usr/lib/python3/dist-packages/kubernetes/__init__.py", line 22, in <module>
    import kubernetes.stream
  File "/usr/lib/python3/dist-packages/kubernetes/stream/__init__.py", line 15, in <module>
    from .stream import stream
  File "/usr/lib/python3/dist-packages/kubernetes/stream/stream.py", line 13, in <module>
    from . import ws_client
  File "/usr/lib/python3/dist-packages/kubernetes/stream/ws_client.py", line 19, in <module>
    from websocket import WebSocket, ABNF, enableTrace
  File "/usr/lib/python3/dist-packages/websocket/__init__.py", line 22, in <module>
    from ._abnf import *
  File "/usr/lib/python3/dist-packages/websocket/_abnf.py", line 34, in <module>
    import numpy
  File "/usr/lib/python3/dist-packages/numpy/__init__.py", line 142, in <module>
    from . import core
  File "/usr/lib/python3/dist-packages/numpy/core/__init__.py", line 17, in <module>
    from . import multiarray
  File "/usr/lib/python3/dist-packages/numpy/core/multiarray.py", line 14, in <module>
    from . import overrides
  File "/usr/lib/python3/dist-packages/numpy/core/overrides.py", line 16, in <module>
    add_docstring(
RuntimeError: implement_array_function method already has a docstring

#10 Updated by Sebastian Wagner over 1 year ago

  • Related to Bug #38407: Funny issues with python sub-interpreters added

#11 Updated by Sebastian Wagner over 1 year ago

#12 Updated by Deepika Upadhyay over 1 year ago

I have not dug deep, but this might be helpful: I was taking a look at the issue yesterday, and it has a workaround in numpy <= 1.19 [0]. I checked on Ubuntu 20.04, and pip can provide `numpy==1.19`. Since we have seen it only on focal so far in Octopus, can we just pip install numpy instead of using the distro package?

[0] https://github.com/numpy/numpy/issues/14384#issuecomment-641340591

#13 Updated by Sebastian Wagner over 1 year ago

Deepika Upadhyay wrote:

I have not dug deep, but this might be helpful: I was taking a look at the issue yesterday, and it has a workaround in numpy <= 1.19 [0]. I checked on Ubuntu 20.04, and pip can provide `numpy==1.19`. Since we have seen it only on focal so far in Octopus, can we just pip install numpy instead of using the distro package?

[0] https://github.com/numpy/numpy/issues/14384#issuecomment-641340591

I think numpy is installed via APT instead of pip:

https://github.com/ceph/ceph/blob/26fbbefa827cfab0837296df2c8f5d1cc88331ae/debian/control#L319

We can avoid that for now if we don't install ceph-mgr-rook on focal, but that only works until other distributions upgrade numpy to > 1.19.

#14 Updated by Kefu Chai over 1 year ago

https://github.com/ceph/ceph/pull/41688 was created so that we don't install ceph-mgr-rook, which is currently pulled in because it is Recommended by ceph-mgr-modules-core.

#15 Updated by Deepika Upadhyay over 1 year ago

  • Copied to Bug #51240: mgr module fails in focal, due to ceph-mgr-rook module added
