Bug #62297
open
nfs cluster commands return Error EINVAL: 'exporter'
0%
Description
TL;DR: nfs commands like `ceph nfs cluster ls` fail in a Rook-orchestrated cluster; the backtrace in the mgr log is full of orch and rook code references.
I orchestrated a cluster using Rook (with the YAMLs from [0]) and tried a fairly trivial command, `ceph nfs cluster ls`, which returned an error:
sh-4.4$ ceph -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
sh-4.4$ ceph nfs cluster ls
Error EINVAL: 'exporter'
I use these YAMLs to create a cluster:
crds.yaml, common.yaml, operator.yaml, toolbox.yaml, cluster-test.yaml, filesystem-test.yaml, nfs-test.yaml
When I grep for the quay images used in these YAMLs:
cluster-test.yaml:32: image: quay.io/ceph/ceph:v18
toolbox.yaml:21: image: quay.io/ceph/ceph:v17.2.6
The versions differ, so I thought this discrepancy might have led to the issue. To confirm, I changed cluster-test.yaml to use v17.2.6 so that both YAMLs use the same version:
cluster-test.yaml:32: image: quay.io/ceph/ceph:v17.2.6
toolbox.yaml:21: image: quay.io/ceph/ceph:v17.2.6
and this time it works, bingo
sh-4.4$ ceph -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
sh-4.4$ ceph nfs cluster ls
my-nfs
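The grep comparison above can also be scripted. The following is only a minimal sketch: the helper names `ceph_images` and `images_match` are hypothetical, and the manifest contents are passed in as strings (reduced here to the two image lines quoted above) rather than read from the Rook example files.

```python
import re
from typing import Dict, Set

# Matches lines such as "image: quay.io/ceph/ceph:v18".
IMAGE_RE = re.compile(r"image:\s*(quay\.io/ceph/ceph:\S+)")


def ceph_images(manifests: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each manifest name to the set of Ceph images it references."""
    return {name: set(IMAGE_RE.findall(text)) for name, text in manifests.items()}


def images_match(manifests: Dict[str, str]) -> bool:
    """Return True if all manifests reference at most one distinct Ceph image."""
    all_images: Set[str] = set()
    for images in ceph_images(manifests).values():
        all_images |= images
    return len(all_images) <= 1


# The two image lines from the first grep output in this report.
mismatched = {
    "cluster-test.yaml": "    image: quay.io/ceph/ceph:v18\n",
    "toolbox.yaml": "    image: quay.io/ceph/ceph:v17.2.6\n",
}
print(images_match(mismatched))
```

For the two lines shown, the check reports a mismatch, which corresponds to the first failing setup in this report.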
But I was curious whether there is a regression in the reef release, so I changed both YAMLs to use v18:
cluster-test.yaml:32: image: quay.io/ceph/ceph:v18
toolbox.yaml:21: image: quay.io/ceph/ceph:v18
and my suspicion was right
sh-4.4$ ceph -v
ceph version 18.1.3 (f594a0802c34733bb06e5993bc4bdb085c9a5f3f) reef (rc)
sh-4.4$ ceph nfs cluster ls
Error EINVAL: 'exporter'
It fails for the cluster creation command too:
sh-4.4$ ceph nfs cluster create cephfs-nfs
Error EINVAL: 'exporter'
So it seems there is a regression in reef. I dug deeper, and it looks like daemon_type_to_service() in pybind/mgr/orchestrator/_interface.py is given a key that is not present in its map.
The code flow is _cmd_nfs_cluster_ls() -> available_clusters() -> describe_service() line 420 [1] -> service_name() -> daemon_type_to_service()
I suspect the issue is somewhere around [1].
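To illustrate why the error text is just `'exporter'`: in Python, indexing a dict with a missing key raises `KeyError`, and the string form of that exception is the quoted key. The sketch below is not the actual Ceph code. The mapping is an illustrative subset, and `run_command` is a hypothetical stand-in that assumes the mgr reports an uncaught exception as `Error EINVAL: <message>`.

```python
# Illustrative subset only; the real map in
# pybind/mgr/orchestrator/_interface.py is assumed to have more entries.
DAEMON_TO_SERVICE = {
    "mon": "mon",
    "mgr": "mgr",
    "mds": "mds",
    "nfs": "nfs",
}


def daemon_type_to_service(dtype: str) -> str:
    # Plain dict indexing: an unknown daemon type raises KeyError.
    return DAEMON_TO_SERVICE[dtype]


def run_command(dtype: str) -> str:
    # Hypothetical stand-in for the mgr command handler; it is assumed
    # to turn an uncaught exception into "Error EINVAL: <str(exc)>".
    try:
        return daemon_type_to_service(dtype)
    except Exception as exc:
        return f"Error EINVAL: {exc}"


print(run_command("nfs"))
print(run_command("exporter"))
```

Under these assumptions the second call produces the same message as the failing `ceph nfs cluster ls`, which is consistent with a daemon type named `exporter` reaching the lookup without a matching entry in the map.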
PS: I attached the mgr log to the tracker.
[0] https://github.com/rook/rook/tree/master/deploy/examples
[1] https://github.com/ceph/ceph/blob/main/src/pybind/mgr/rook/module.py#L420