Bug #55605

Rook orchestrator py exception with NFS commands

Added by Blaine Gardner 9 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Category: mgr/rook
Target version: -
% Done: 0%
Source: Community (dev)
Tags: backport_processed
Backport: quincy
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Some `ceph nfs` commands fail with the Quincy v17.2.0 release of Ceph when the Rook orchestrator module is in use. (Of note, this does not affect the case where no orchestrator backend is set.)

To reproduce:
- Install the Rook operator, a CephCluster, a CephFilesystem, and a CephNFS from the Rook examples
- ceph mgr module enable rook
- ceph orch set backend rook

(Tested using: operator.yaml, cluster-test.yaml, filesystem-test.yaml, nfs-test.yaml on Minikube)

Examples of command failures:

$ ceph nfs cluster ls
Error EPERM: 'pool'
$ ceph nfs export ls my-nfs
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1703, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 433, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/nfs/module.py", line 75, in _cmd_nfs_export_ls
    return self.export_mgr.list_exports(cluster_id=cluster_id, detailed=detailed)
  File "/usr/share/ceph/mgr/nfs/export.py", line 60, in cluster_check
    clusters = known_cluster_ids(export.mgr)
  File "/usr/share/ceph/mgr/nfs/export.py", line 45, in known_cluster_ids
    clusters = set(available_clusters(mgr))
  File "/usr/share/ceph/mgr/nfs/utils.py", line 39, in available_clusters
    orchestrator.raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
    raise e
KeyError: 'pool'

P.S. v17.2.0 was not available as an option for "Affected Versions".
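The bare `'pool'` in `Error EPERM: 'pool'` is how Python renders a `KeyError`: `str()` of the exception is the repr of the missing key. A minimal sketch, assuming (as the discussion below concludes) that the error comes from indexing a CephNFS resource whose `rados` block has no `pool` entry; `lookup_error` and the resource dicts are hypothetical and trimmed down for illustration.

```python
from typing import Any, Dict, Optional


def lookup_error(nfs: Dict[str, Any]) -> Optional[str]:
    """Return the KeyError text from an unguarded lookup, or None if it succeeds."""
    try:
        # Same unguarded chain of lookups as the rook/module.py excerpt in this issue.
        nfs["spec"]["rados"]["pool"]
    except KeyError as e:
        # str() of a KeyError is the repr() of the missing key, hence the quotes.
        return str(e)
    return None


# Hypothetical, trimmed-down CephNFS resources.
print(lookup_error({"metadata": {"name": "my-nfs"}, "spec": {"rados": {}}}))  # prints 'pool'
print(lookup_error({"metadata": {"name": "my-nfs"}, "spec": {}}))  # prints 'rados'
```

A CR with no `rados` block at all would fail on `'rados'` instead, so a fix has to tolerate both shapes.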


Related issues

Copied to Orchestrator - Backport #57601: quincy: Rook orchestrator py exception with NFS commands Resolved

History

#1 Updated by Blaine Gardner 9 months ago

John Mulligan identified a bit of code in the orchestrator that is likely related:

this is a stab in the dark, but there's only one block in rook/module.py that uses `pool` as a key:

        if service_type == 'nfs' or service_type is None:
            # CephNFSes
            all_nfs = self.rook_cluster.get_resource("cephnfses")
            nfs_pods = self.rook_cluster.describe_pods('nfs', None, None)
            for nfs in all_nfs:
                if nfs['spec']['rados']['pool'] != NFS_POOL_NAME:
                    continue
                nfs_name = nfs['metadata']['name']
                svc = 'nfs.' + nfs_name
                if svc in spec:
                    continue

In Ceph v16.2.7, the NFS orchestration code was changed to assume the pool is hardcoded to `.nfs`, so Rook was updated to make the `rados` block optional. The lines of code that look up the `rados` block and the `pool` within it are therefore no longer valid, although they appear to be intended to skip legacy CephNFS definitions. If the orchestrator still needs to ignore those legacy CephNFSes, that line should not assume that `rados` or `rados.pool` is present.
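One way to keep the legacy filter without assuming either key is sketched below. This is a hypothetical helper, not the change merged in PR 46321; it assumes `NFS_POOL_NAME` is `.nfs` (the hardcoded pool described above) and that a CR with no explicit pool should be treated as current rather than legacy.

```python
from typing import Any, Dict, Iterable, List

# Assumption: matches the `.nfs` pool that Ceph hardcodes since v16.2.7.
NFS_POOL_NAME = ".nfs"


def is_legacy_nfs(nfs: Dict[str, Any]) -> bool:
    """True only if the CR explicitly names a RADOS pool other than `.nfs`."""
    # Tolerate a missing (or null) `spec`, `rados` block, or `pool` key.
    rados = (nfs.get("spec") or {}).get("rados") or {}
    pool = rados.get("pool")
    return pool is not None and pool != NFS_POOL_NAME


def current_nfs_names(all_nfs: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical stand-in for the loop in rook/module.py."""
    names = []
    for nfs in all_nfs:
        if is_legacy_nfs(nfs):
            continue  # skip legacy CephNFS definitions, as the original check intended
        names.append(nfs["metadata"]["name"])
    return names


crs = [
    {"metadata": {"name": "my-nfs"}, "spec": {"rados": {}}},
    {"metadata": {"name": "no-rados"}, "spec": {}},
    {"metadata": {"name": "legacy"}, "spec": {"rados": {"pool": "myfs-data0"}}},
    {"metadata": {"name": "explicit"}, "spec": {"rados": {"pool": ".nfs"}}},
]
print(current_nfs_names(crs))  # ['my-nfs', 'no-rados', 'explicit']
```

Treating a missing pool as "not legacy" keeps CRs written against newer Rook versions visible; the simpler alternative raised later in this thread is to drop the check entirely.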

#2 Updated by Blaine Gardner 9 months ago

I can also confirm that this issue is a regression in v17.2.0. The problem is not present in the latest development version of v16, which is now a week or two from the v16.2.8 release. I tested with the image `quay.io/ceph/daemon-base:master-c7639748-pacific-centos-stream8-x86_64`.

#3 Updated by John Mulligan 9 months ago

https://github.com/ceph/ceph/pull/43046

I think the above PR is what added the code to the rook orch module that triggers the exception.
Based on a discussion in the orchestration weekly, Rook doesn't require (or now removes) the 'pool' field from the 'rados' subsection of the CR.

The line

if nfs['spec']['rados']['pool'] != NFS_POOL_NAME:

is either unnecessary or needs to be updated to skip the check if 'pool' is not present, IMO.

#4 Updated by John Mulligan 9 months ago

  • Assignee set to Juan Miguel Olmo Martínez

#5 Updated by Juan Miguel Olmo Martínez 9 months ago

  • Pull request ID set to 46321

#6 Updated by Blaine Gardner 6 months ago

Can this get merged and backported to Quincy for the next release?

#7 Updated by Adam King 5 months ago

  • Status changed from New to Pending Backport
  • Backport set to quincy

#8 Updated by Backport Bot 5 months ago

  • Copied to Backport #57601: quincy: Rook orchestrator py exception with NFS commands added

#9 Updated by Backport Bot 5 months ago

  • Tags set to backport_processed

#10 Updated by Adam King 5 months ago

  • Status changed from Pending Backport to Resolved
