Bug #59438 (closed): nfs module crash in Kubernetes

Added by Thomas Way about 1 year ago. Updated 2 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Please see https://github.com/rook/rook/issues/12073 for full context. I am not using NFS at all, but the module is crashing and putting the cluster into a HEALTH_WARN state.

I'll also post the crash dumps here:

```
❯ k -n rook-ceph exec -it rook-ceph-tools-b84c4448f-gw9s9 -- bash
bash-4.4$ ceph crash ls-new
ID ENTITY NEW
2023-03-30T01:12:34.517911Z_5e48e564-c76e-4c62-b61e-2de60e3c46f5 mgr.a *
2023-04-07T16:21:56.316462Z_e2022d37-1381-4b1b-8560-e82a17451c54 mgr.b *
2023-04-08T07:33:36.872560Z_8881ab4f-fc4d-4179-998e-6c2179a345df client.admin *
2023-04-09T00:30:23.831170Z_dfd019e2-76e1-4bea-97a8-b4682ad0b2e6 mgr.a *
2023-04-09T00:30:25.370631Z_d027dc35-8850-4e03-94b8-92df025a2185 mgr.a *
2023-04-09T09:17:57.013003Z_59853afe-31d2-49cb-826f-020b96e3f207 mgr.a *
bash-4.4$ ceph crash info 2023-03-30T01:12:34.517911Z_5e48e564-c76e-4c62-b61e-2de60e3c46f5 {
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"kubernetes.client.rest.ApiException: ({'type': 'ERROR', 'object': {'api_version': 'v1',\n 'kind': 'Status',\n 'metadata': {'annotations': None,\n 'cluster_name': None,\n 'creation_timestamp': None,\n 'deletion_grace_period_seconds': None,\n 'deletion_timestamp': None,\n 'finalizers': None,\n 'generate_name': None,\n 'generation': None,\n 'initializers': None,\n 'labels': None,\n 'managed_fields': None,\n 'name': None,\n 'namespace': None,\n 'owner_references': None,\n 'resource_version': None,\n 'self_link': None,\n 'uid': None},\n 'spec': None,\n 'status': {'conditions': None,\n 'container_statuses': None,\n 'host_ip': None,\n 'init_container_statuses': None,\n 'message': None,\n 'nominated_node_name': None,\n 'phase': None,\n 'pod_ip': None,\n 'qos_class': None,\n 'reason': None,\n 'start_time': None}}, 'raw_object': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 2283932 (2286915)', 'reason': 'Expired', 'code': 410}})\nReason: None\n"
],
"ceph_version": "17.2.5",
"crash_id": "2023-03-30T01:12:34.517911Z_5e48e564-c76e-4c62-b61e-2de60e3c46f5",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "ApiException",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "23edd4ebc3c422aa5ba68c773c3c1185d12f713fde4143de596da1c5da580a10",
"timestamp": "2023-03-30T01:12:34.517911Z",
"utsname_hostname": "rook-ceph-mgr-a-54f94bcc7f-56lx6",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-07T16:21:56.316462Z_e2022d37-1381-4b1b-8560-e82a17451c54 {
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"KeyError: 'exporter'"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-07T16:21:56.316462Z_e2022d37-1381-4b1b-8560-e82a17451c54",
"entity_name": "mgr.b",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "KeyError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "9c1924794bfbc226c50751f5594c9a24347eb7821d109b38756ed1b0b237bd13",
"timestamp": "2023-04-07T16:21:56.316462Z",
"utsname_hostname": "rook-ceph-mgr-b-646b988f6b-8km28",
"utsname_machine": "x86_64",
"utsname_release": "6.2.9-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Apr 4 20:09:57 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-09T00:30:23.831170Z_dfd019e2-76e1-4bea-97a8-b4682ad0b2e6 {
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"KeyError: 'exporter'"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-09T00:30:23.831170Z_dfd019e2-76e1-4bea-97a8-b4682ad0b2e6",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "KeyError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "9c1924794bfbc226c50751f5594c9a24347eb7821d109b38756ed1b0b237bd13",
"timestamp": "2023-04-09T00:30:23.831170Z",
"utsname_hostname": "rook-ceph-mgr-a-669b56d6fb-dwhrz",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-09T00:30:25.370631Z_d027dc35-8850-4e03-94b8-92df025a2185 {
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"KeyError: 'exporter'"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-09T00:30:25.370631Z_d027dc35-8850-4e03-94b8-92df025a2185",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "KeyError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "9c1924794bfbc226c50751f5594c9a24347eb7821d109b38756ed1b0b237bd13",
"timestamp": "2023-04-09T00:30:25.370631Z",
"utsname_hostname": "rook-ceph-mgr-a-669b56d6fb-dwhrz",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
bash-4.4$ ceph crash info 2023-04-09T09:17:57.013003Z_59853afe-31d2-49cb-826f-020b96e3f207 {
"backtrace": [
" File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n return available_clusters(self)",
" File \"/usr/share/ceph/mgr/nfs/utils.py\", line 39, in available_clusters\n orchestrator.raise_if_exception(completion)",
" File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 228, in raise_if_exception\n raise e",
"kubernetes.client.rest.ApiException: ({'type': 'ERROR', 'object': {'api_version': 'v1',\n 'kind': 'Status',\n 'metadata': {'annotations': None,\n 'cluster_name': None,\n 'creation_timestamp': None,\n 'deletion_grace_period_seconds': None,\n 'deletion_timestamp': None,\n 'finalizers': None,\n 'generate_name': None,\n 'generation': None,\n 'initializers': None,\n 'labels': None,\n 'managed_fields': None,\n 'name': None,\n 'namespace': None,\n 'owner_references': None,\n 'resource_version': None,\n 'self_link': None,\n 'uid': None},\n 'spec': None,\n 'status': {'conditions': None,\n 'container_statuses': None,\n 'host_ip': None,\n 'init_container_statuses': None,\n 'message': None,\n 'nominated_node_name': None,\n 'phase': None,\n 'pod_ip': None,\n 'qos_class': None,\n 'reason': None,\n 'start_time': None}}, 'raw_object': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 5088161 (5088315)', 'reason': 'Expired', 'code': 410}})\nReason: None\n"
],
"ceph_version": "17.2.5",
"crash_id": "2023-04-09T09:17:57.013003Z_59853afe-31d2-49cb-826f-020b96e3f207",
"entity_name": "mgr.a",
"mgr_module": "nfs",
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
"mgr_python_exception": "ApiException",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "14c2f47cb38f90c4d870d85c74884f7d56b7efbf23df6050a2bfb4dabc75c0ac",
"timestamp": "2023-04-09T09:17:57.013003Z",
"utsname_hostname": "rook-ceph-mgr-a-669b56d6fb-dwhrz",
"utsname_machine": "x86_64",
"utsname_release": "5.15.102-talos",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Mar 13 18:10:38 UTC 2023"
}
```
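For reference, the first and last crashes are the Kubernetes watch API returning HTTP 410 ("too old resource version: ... Expired") when a watch's cached resourceVersion has fallen out of the apiserver's history window. A client is expected to re-list to obtain a fresh resourceVersion and restart the watch, not treat this as fatal. A minimal, self-contained sketch of that retry pattern (the `ExpiredError` class and the `list_fn`/`watch_fn` callables are hypothetical stand-ins for the real kubernetes client calls, not code from the nfs module):

```python
# Hypothetical stand-in for kubernetes.client.rest.ApiException with code 410,
# which the apiserver returns when a watch's resourceVersion has expired.
class ExpiredError(Exception):
    def __init__(self):
        super().__init__("too old resource version")
        self.status = 410

def watch_with_relist(list_fn, watch_fn, max_restarts=3):
    """Run a watch; on 410, re-list for a fresh resourceVersion and retry."""
    resource_version = list_fn()           # initial list -> resourceVersion
    for _ in range(max_restarts):
        try:
            return watch_fn(resource_version)
        except ExpiredError:
            # resourceVersion expired: re-list instead of crashing the caller
            resource_version = list_fn()
    raise RuntimeError("watch kept expiring after re-listing")

# Demo: the first watch attempt uses a stale resourceVersion and gets a 410.
versions = iter(["2283932", "2286915"])
attempts = []

def fake_list():
    return next(versions)

def fake_watch(rv):
    attempts.append(rv)
    if rv == "2283932":
        raise ExpiredError()               # first attempt: 410 Expired
    return f"watch established at {rv}"

result = watch_with_relist(fake_list, fake_watch)
```

The crash dumps above suggest the mgr's watch handler instead let the `ApiException` propagate all the way up through `raise_if_exception`.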


Related issues 1 (1 open, 0 closed)

Related to mgr - Bug #56246: crash: File "mgr/nfs/module.py", in cluster_ls: return available_clusters(self) (In Progress, assigned to Ponnuvel P)

Actions #1

Updated by Radoslaw Zarzynski 12 months ago

  • Project changed from mgr to Orchestrator

Moving this to Orchestrator as it seems we're missing a dedicated category for NFS. Would you mind taking a look?

Actions #2

Updated by Ponnuvel P 2 months ago

  • Status changed from New to Duplicate
Actions #3

Updated by Ponnuvel P 2 months ago

The issue is that when the NFS tab is accessed even though the NFS module isn't enabled, it crashes the module.

There are several issues reported by the Telemetry bot. In one of those trackers, https://tracker.ceph.com/issues/56246, a fix has been proposed: https://github.com/ceph/ceph/pull/54583
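For illustration only (this is not the code from the linked PR, and `available_clusters`/the backend callables here are simplified stand-ins): the general pattern such a fix tends to apply is to catch the orchestrator backend's failure inside the listing helper and degrade to "no clusters" rather than let the exception escape and crash the mgr module:

```python
# Illustrative sketch, not the actual patch: degrade gracefully when the
# orchestrator backend raises, instead of propagating and crashing the module.
def available_clusters(fetch_completion):
    """Return NFS cluster names, or [] if the orchestrator backend fails."""
    try:
        completion = fetch_completion()
        return [name for name in completion if name]
    except Exception:
        # Backend raised (e.g. KeyError: 'exporter', or an ApiException from
        # the Kubernetes client): report no clusters rather than crash.
        return []

# Demo backends mimicking the behaviour seen in the crash dumps above.
def broken_backend():
    raise KeyError('exporter')

def healthy_backend():
    return ['mynfs']

ok = available_clusters(healthy_backend)        # normal listing
degraded = available_clusters(broken_backend)   # failure swallowed, [] back
```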

Actions #4

Updated by Ponnuvel P 2 months ago

  • Related to Bug #56246: crash: File "mgr/nfs/module.py", in cluster_ls: return available_clusters(self) added
