Bug #59438

closed

nfs module crash in Kubernetes

Added by Thomas Way about 1 year ago. Updated 3 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Please see https://github.com/rook/rook/issues/12073 for full context. I am not using NFS at all, but the nfs mgr module keeps crashing, which puts the cluster into a HEALTH_WARN state.

I'll also post the crash dumps here:

```
❯ k -n rook-ceph exec -it rook-ceph-tools-b84c4448f-gw9s9 -- bash
bash-4.4$ ceph crash ls-new
ID ENTITY NEW
2023-03-30T01:12:34.517911Z_5e48e564-c76e-4c62-b61e-2de60e3c46f5 mgr.a *
2023-04-07T16:21:56.316462Z_e2022d37-1381-4b1b-8560-e82a17451c54 mgr.b *
2023-04-08T07:33:36.872560Z_8881ab4f-fc4d-4179-998e-6c2179a345df client.admin *
2023-04-09T00:30:23.831170Z_dfd019e2-76e1-4bea-97a8-b4682ad0b2e6 mgr.a *
2023-04-09T00:30:25.370631Z_d027dc35-8850-4e03-94b8-92df025a2185 mgr.a *
2023-04-09T09:17:57.013003Z_59853afe-31d2-49cb-826f-020b96e3f207 mgr.a *
bash-4.4$ ceph crash info 2023-03-30T01:12:34.517911Z_5e48e564-c76e-4c62-b61e-2de60e3c46f5
{
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"kubernetes.client.rest.ApiException: ({'type': 'ERROR', 'object': {'api_version': 'v1',\n 'kind': 'Status',\n 'metadata': {'annotations': None,\n 'cluster_name': None,\n 'creation_timestamp': None,\n 'deletion_grace_period_seconds': None,\n 'deletion_timestamp': None,\n 'finalizers': None,\n 'generate_name': None,\n 'generation': None,\n 'initializers': None,\n 'labels': None,\n 'managed_fields': None,\n 'name': None,\n 'namespace': None,\n 'owner_references': None,\n 'resource_version': None,\n 'self_link': None,\n 'uid': None},\n 'spec': None,\n 'status': {'conditions': None,\n 'container_statuses': None,\n 'host_ip': None,\n 'init_container_statuses': None,\n 'message': None,\n 'nominated_node_name': None,\n 'phase': None,\n 'pod_ip': None,\n 'qos_class': None,\n 'reason': None,\n 'start_time': None}}, 'raw_object': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 2283932 (2286915)', 'reason': 'Expired', 'code': 410}})\nReason: None\n"
],
"ceph_version": "17.2.5",
"crash_id": "2023-03-30T01:12:34.517911Z_5e48e564-c76e-4c62-b61e-2de60e3c46f5",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "ApiException",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "23edd4ebc3c422aa5ba68c773c3c1185d12f713fde4143de596da1c5da580a10",
"timestamp": "2023-03-30T01:12:34.517911Z",
"utsname_hostname": "rook-ceph-mgr-a-54f94bcc7f-56lx6",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-07T16:21:56.316462Z_e2022d37-1381-4b1b-8560-e82a17451c54
{
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"KeyError: 'exporter'"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-07T16:21:56.316462Z_e2022d37-1381-4b1b-8560-e82a17451c54",
"entity_name": "mgr.b",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "KeyError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "9c1924794bfbc226c50751f5594c9a24347eb7821d109b38756ed1b0b237bd13",
"timestamp": "2023-04-07T16:21:56.316462Z",
"utsname_hostname": "rook-ceph-mgr-b-646b988f6b-8km28",
"utsname_machine": "x86_64",
"utsname_release": "6.2.9-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Apr 4 20:09:57 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-09T00:30:23.831170Z_dfd019e2-76e1-4bea-97a8-b4682ad0b2e6
{
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"KeyError: 'exporter'"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-09T00:30:23.831170Z_dfd019e2-76e1-4bea-97a8-b4682ad0b2e6",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "KeyError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "9c1924794bfbc226c50751f5594c9a24347eb7821d109b38756ed1b0b237bd13",
"timestamp": "2023-04-09T00:30:23.831170Z",
"utsname_hostname": "rook-ceph-mgr-a-669b56d6fb-dwhrz",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-09T00:30:25.370631Z_d027dc35-8850-4e03-94b8-92df025a2185
{
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"KeyError: 'exporter'"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-09T00:30:25.370631Z_d027dc35-8850-4e03-94b8-92df025a2185",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "KeyError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "9c1924794bfbc226c50751f5594c9a24347eb7821d109b38756ed1b0b237bd13",
"timestamp": "2023-04-09T00:30:25.370631Z",
"utsname_hostname": "rook-ceph-mgr-a-669b56d6fb-dwhrz",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-09T09:17:57.013003Z_59853afe-31d2-49cb-826f-020b96e3f207
{
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"kubernetes.client.rest.ApiException: ({'type': 'ERROR', 'object': {'api_version': 'v1',\n 'kind': 'Status',\n 'metadata': {'annotations': None,\n 'cluster_name': None,\n 'creation_timestamp': None,\n 'deletion_grace_period_seconds': None,\n 'deletion_timestamp': None,\n 'finalizers': None,\n 'generate_name': None,\n 'generation': None,\n 'initializers': None,\n 'labels': None,\n 'managed_fields': None,\n 'name': None,\n 'namespace': None,\n 'owner_references': None,\n 'resource_version': None,\n 'self_link': None,\n 'uid': None},\n 'spec': None,\n 'status': {'conditions': None,\n 'container_statuses': None,\n 'host_ip': None,\n 'init_container_statuses': None,\n 'message': None,\n 'nominated_node_name': None,\n 'phase': None,\n 'pod_ip': None,\n 'qos_class': None,\n 'reason': None,\n 'start_time': None}}, 'raw_object': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 5088161 (5088315)', 'reason': 'Expired', 'code': 410}})\nReason: None\n"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-09T09:17:57.013003Z_59853afe-31d2-49cb-826f-020b96e3f207",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "ApiException",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "14c2f47cb38f90c4d870d85c74884f7d56b7efbf23df6050a2bfb4dabc75c0ac",
"timestamp": "2023-04-09T09:17:57.013003Z",
"utsname_hostname": "rook-ceph-mgr-a-669b56d6fb-dwhrz",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
```
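
For triage, the five dumps above can be grouped by the `mgr_python_exception` field to see how many distinct failure modes are involved. The snippet below is only a rough sketch: the `CRASHES` list is hand-copied from the output above and trimmed to three keys, and `group_by_exception` is a hypothetical helper, not part of Ceph or Rook. The `client.admin` entry from `ceph crash ls-new` is left out because its details were not posted.

```python
from collections import defaultdict

# Fields copied from the `ceph crash info` output in this report,
# trimmed to the keys needed for grouping.
CRASHES = [
    {"timestamp": "2023-03-30T01:12:34.517911Z", "entity_name": "mgr.a",
     "mgr_python_exception": "ApiException"},
    {"timestamp": "2023-04-07T16:21:56.316462Z", "entity_name": "mgr.b",
     "mgr_python_exception": "KeyError"},
    {"timestamp": "2023-04-09T00:30:23.831170Z", "entity_name": "mgr.a",
     "mgr_python_exception": "KeyError"},
    {"timestamp": "2023-04-09T00:30:25.370631Z", "entity_name": "mgr.a",
     "mgr_python_exception": "KeyError"},
    {"timestamp": "2023-04-09T09:17:57.013003Z", "entity_name": "mgr.a",
     "mgr_python_exception": "ApiException"},
]


def group_by_exception(crashes):
    """Hypothetical helper: map exception name -> list of (timestamp, entity)."""
    groups = defaultdict(list)
    for crash in crashes:
        # Reports that lack the key are filed under "unknown".
        key = crash.get("mgr_python_exception", "unknown")
        groups[key].append((crash["timestamp"], crash["entity_name"]))
    return dict(groups)


groups = group_by_exception(CRASHES)
for exception, hits in sorted(groups.items()):
    print(f"{exception}: {len(hits)} crash(es)")
    for timestamp, entity in hits:
        print(f"  {timestamp}  {entity}")
```

Grouping on `stack_sig` instead would give three groups rather than two, since the two `ApiException` reports carry different signatures in the dumps above while the three `KeyError: 'exporter'` reports share one.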


Related issues 1 (1 open, 0 closed)

Related to mgr - Bug #56246: crash: File "mgr/nfs/module.py", in cluster_ls: return available_clusters(self) - In Progress - Ponnuvel P
