Bug #64213

open

MGR modules incompatible with later PyO3 versions - PyO3 modules may only be initialized once per interpreter process

Added by Chris Palmer 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: build
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Many MGR modules cannot be used on platforms that ship later versions of PyO3. The error message

PyO3 modules may only be initialized once per interpreter process

is raised in many places, including (but not limited to) the dashboard and any code path that sets up TLS communication.

This occurs in Debian 12 (bookworm), but has been noted in other distributions too.

An example crash is:

$ ceph crash info 2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/cephadm/__init__.py\", line 1, in <module>\n from .module import CephadmOrchestrator",
        " File \"/usr/share/ceph/mgr/cephadm/module.py\", line 15, in <module>\n from cephadm.service_discovery import ServiceDiscovery",
        " File \"/usr/share/ceph/mgr/cephadm/service_discovery.py\", line 20, in <module>\n from cephadm.ssl_cert_utils import SSLCerts",
        " File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 8, in <module>\n from cryptography import x509",
        " File \"/lib/python3/dist-packages/cryptography/x509/__init__.py\", line 6, in <module>\n from cryptography.x509 import certificate_transparency",
        " File \"/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py\", line 10, in <module>\n from cryptography.hazmat.bindings._rust import x509 as rust_x509",
        "ImportError: PyO3 modules may only be initialized once per interpreter process"
    ],
    "ceph_version": "18.2.1",
    "crash_id": "2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52",
    "entity_name": "mgr.xxxxx01",
    "mgr_module": "cephadm",
    "mgr_module_caller": "PyModule::load_subclass_of",
    "mgr_python_exception": "ImportError",
    "os_id": "12",
    "os_name": "Debian GNU/Linux 12 (bookworm)",
    "os_version": "12 (bookworm)",
    "os_version_id": "12",
    "process_name": "ceph-mgr",
    "stack_sig": "7815ad73ced094695056319d1241bf7847da19b4b0dfee7a216407b59a7e3d84",
    "timestamp": "2024-01-12T11:10:03.938478Z",
    "utsname_hostname": "xxxxx01.xxx.xxx",
    "utsname_machine": "x86_64",
    "utsname_release": "6.1.0-17-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30)"
}
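When several modules fail at once, a small filter over the crash reports can show which MGR modules hit this particular error. The sketch below assumes the JSON layout shown above (`process_name`, `mgr_module`, `backtrace`); the function names are hypothetical, and the marker string is the message from this report, so PyO3 versions that word the error differently would not be matched.

```python
import json
from typing import Any, Optional

# Error text quoted in this report; other PyO3 versions may phrase it differently.
PYO3_MARKER = "PyO3 modules may only be initialized once per interpreter process"


def pyo3_crash_module(crash: dict[str, Any]) -> Optional[str]:
    """Return the mgr module name if a crash report matches the PyO3 failure, else None."""
    if crash.get("process_name") != "ceph-mgr":
        return None
    # Each backtrace entry is a preformatted string, as in the report above.
    if any(PYO3_MARKER in frame for frame in crash.get("backtrace", [])):
        return crash.get("mgr_module", "<unknown>")
    return None


def pyo3_crash_module_from_json(text: str) -> Optional[str]:
    """Same check, taking the raw JSON text printed by `ceph crash info <crash_id>`."""
    return pyo3_crash_module(json.loads(text))
```

Running each crash ID from `ceph crash ls` through `ceph crash info` and this filter would give a list of affected modules, which is the information asked for later in this thread.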

My understanding of the relevant background is:

  • MGR modules use Python subinterpreters for isolation between modules.
  • Several modules (including but not limited to dashboard & restful) use python3-cryptography for hashing and TLS (and possibly other things).
  • python3-cryptography delegates some crypto functions to Rust functions. These include bcrypt and TLS-related functions.
  • python3-cryptography uses PyO3 to invoke Rust functions.
  • PyO3 does not support subinterpreters. In the past, initialising a PyO3 module from several subinterpreters was allowed but was actually unsafe. Now PyO3 raises an ImportError when it detects a second initialisation.

So it appears that the MGR use of these functions has always been unsafe, and is now forbidden.
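To make the failure mode concrete, here is a toy Python model of the guard described in the last bullet. It is not PyO3's code: it assumes, as the error text suggests, a single process-wide "already initialised" flag, so whichever subinterpreter initialises a PyO3 extension first succeeds and every later initialisation from another subinterpreter fails. The class and method names are hypothetical.

```python
import threading


class OncePerProcessModuleDef:
    """Toy model of a once-per-process init guard (hypothetical, not PyO3's source).

    Assumption: one flag is shared by the whole process, not one per interpreter.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._initialized = False
        self._lock = threading.Lock()

    def make_module(self, interpreter_id: int) -> str:
        # Test-and-set under a lock, standing in for an atomic swap.
        with self._lock:
            already = self._initialized
            self._initialized = True
        if already:
            # Every init after the first one fails, whichever interpreter asks.
            raise ImportError(
                "PyO3 modules may only be initialized once per interpreter process"
            )
        return f"<module {self.name} in interpreter {interpreter_id}>"
```

In ceph-mgr terms, the first MGR module whose subinterpreter imports `cryptography.hazmat.bindings._rust` would succeed and the next one would get the ImportError seen in the backtrace above; under this model, which module fails depends on load order.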

PR54710 identified that the code needed for the bcrypt hashing used during authentication could easily be written in a small amount of native Python, thus avoiding the whole PyO3 area altogether.

However, the discussion noted that TLS also had to be disabled, and the fix only applied to the dashboard. My stack trace above shows the exception occurring during TLS initialisation.

As PyO3 updates are adopted in other Linux distributions and containers, this is likely to break a number of MGR modules. As there does not seem to be any subinterpreter support coming to PyO3 soon, the only option may be to eliminate use of python3-cryptography from all MGR modules entirely. (MGR modules may also use other python3 modules that use PyO3 to invoke Rust.)
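As a starting point for such an audit, the sketch below lists which MGR modules import `cryptography` directly. It assumes the modules are installed under `/usr/share/ceph/mgr` (the path in the backtrace above), and the helper names are hypothetical. It only sees direct imports: code that reaches cryptography indirectly, for example through a shared helper module or another package that depends on it, is not reported, so an empty result is inconclusive.

```python
import ast
from pathlib import Path

# Assumption: packaged mgr modules live here, as in the backtrace above.
DEFAULT_MGR_DIR = Path("/usr/share/ceph/mgr")


def imports_cryptography(source: str) -> bool:
    """True if the source directly imports the top-level 'cryptography' package."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "cryptography" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import within the module's own package.
            if node.level == 0 and node.module and node.module.split(".")[0] == "cryptography":
                return True
    return False


def find_cryptography_users(mgr_dir: Path = DEFAULT_MGR_DIR) -> dict[str, list[str]]:
    """Map each top-level entry under mgr_dir to its files that import cryptography."""
    users: dict[str, list[str]] = {}
    for path in sorted(mgr_dir.rglob("*.py")):
        try:
            found = imports_cryptography(path.read_text(encoding="utf-8"))
        except (OSError, SyntaxError, UnicodeDecodeError, ValueError):
            continue  # unreadable or unparsable file: skip rather than abort
        if found:
            rel = path.relative_to(mgr_dir)
            users.setdefault(rel.parts[0], []).append(rel.as_posix())
    return users
```

Parsing with `ast` rather than grepping avoids false hits from comments and strings, and skips relative imports of a same-named submodule.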

Unfortunately for us, we didn't find this until we had upgraded all MONs in a cluster to reef, at which point we can't downgrade them to quincy. And we can't upgrade the MGR. As a temporary measure (this cluster had MON/MGR/MDS/RGW colocated on 2 hosts) we have added another bookworm host running a reef MON to ensure we can maintain quorum. We are not sure whether it is safe to upgrade the other components (OSD, MDS, RGW) while the MGR remains at quincy.

This has been discussed on ceph-users, and the postings contain links to several other sources of information. Only Problem 2 referred to on that thread is relevant to this bug:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VEN3IU53ZVU343S3U25QKKPCOER4X7AG/#53P4RTSCPPHQYIEISBBAEJXQJNEQSWYL

Actions #1

Updated by Matthew Vernon 2 months ago

https://tracker.ceph.com/issues/63529 is a subset of this issue (relating to the dashboard), and has a fix just for the dashboard committed to main.

Actions #2

Updated by Peter Razumovsky 2 months ago

CentOS 9 Stream is also affected, by the way; we are hitting this PyO3 import error too. Subscribing to this issue.

Actions #3

Updated by Chris Palmer 2 months ago

Interesting... Because of this problem, and the fact that debian-ceph packages are not even tested before release, I am in the middle of moving 3 ceph clusters from debian11/quincy to centos-9-stream/reef. I've done 2/3 clusters so far without encountering this problem. But the only modules I am really using are dashboard and restful, and we do have TLS enabled.

Do you have any more information about which modules are causing it on CentOS 9? And which version of Ceph are you using?
