Bug #45399
NFS Ganesha: Error searching service specs for all nodes after `ceph orch apply nfs ...` (cephadm)
Status: Closed
Description
Environment :
- 3 hypervisors running CentOS 8.1 (hyp00, hyp01, hyp02)
- 19 OSDs
- cluster upgraded a month ago from Nautilus to the new orchestrator (`cephadm adopt` ...)
- new cluster with all services running a 15.2.1 container
- NFS Ganesha config loaded in RADOS, shares created with the Ceph dashboard (URL set to rados://nfs-ganesha/ganesha..)
When creating a new NFS (Ganesha) service with cephadm: `ceph orch apply nfs ganesha nfs-ganesha ganesha`
with
- nfs-ganesha: the dedicated pool
- ganesha: the dedicated namespace for this new service
The result is :
---------------
[root@admin ~]# ceph orch apply nfs ganesha nfs-ganesha ganesha
Scheduled nfs update...
[root@admin ~]# ceph orch ls
NAME         RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                             IMAGE ID
mds.ISOs     2/2      8m ago     8d   count:2    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f
mds.cephfs   3/3      8m ago     4w   count:3    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f
mgr          3/3      8m ago     4w   count:3    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f
mon          3/0      8m ago     -    <no spec>  ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f
nfs.ganesha  0/1      -          -    count:1    <unknown>                              <unknown>
And the following error in debug mode :
2020-05-06T13:42:07.199969+0200 mgr.hyp00 [INF] Saving service nfs.ganesha spec with placement count:1
2020-05-06T13:42:07.233173+0200 mgr.hyp00 [DBG] _kick_serve_loop ...
2020-05-06T13:42:07.238257+0200 mgr.hyp00 [DBG] Applying service nfs.ganesha spec
2020-05-06T13:42:07.238443+0200 mgr.hyp00 [DBG] place 1 over all hosts: [HostPlacementSpec(hostname='hyp00.int.intra', network='', name=''), HostPlacementSpec(hostname='hyp02.int.intra', network='', name=''), HostPlacementSpec(hostname='hyp01.int.intra', network='', name='')]
2020-05-06T13:42:07.238608+0200 mgr.hyp00 [DBG] Combine hosts with existing daemons [] + new hosts [HostPlacementSpec(hostname='hyp01.int.intra', network='', name='')]
2020-05-06T13:42:07.238723+0200 mgr.hyp00 [DBG] hosts with daemons: set()
2020-05-06T13:42:07.238842+0200 mgr.hyp00 [INF] Saving service nfs.ganesha spec with placement count:1
2020-05-06T13:42:07.257828+0200 mgr.hyp00 [DBG] Placing nfs.ganesha.hyp01 on host hyp01.int.intra
2020-05-06T13:42:07.258444+0200 mgr.hyp00 [DBG] SpecStore: find spec for nfs.ganesha.hyp01 returned: []
2020-05-06T13:42:07.258980+0200 mgr.hyp00 [WRN] Failed to apply nfs.ganesha spec NFSServiceSpec({'placement': PlacementSpec(count=1), 'service_type': 'nfs', 'service_id': 'ganesha', 'unmanaged': False, 'pool': 'nfs-ganesha', 'namespace': 'ganesha'}): Cannot find service spec nfs.ganesha.hyp01
When I add 3 daemons explicitly on the three nodes, the cluster starts three NFS containers (the nfs.ganesha service does find the service specs for the dedicated nodes), but the same error appears in the logs: "Cannot find service spec nfs.ganesha.hyp00.hyp00" (and likewise for nfs.ganesha.hyp01.hyp01 and nfs.ganesha.hyp02.hyp02).
What's wrong? Is this a documentation misunderstanding or a bug?
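A minimal sketch of the suspected failure mode, assuming cephadm derives a daemon's service name by splitting the daemon name on dots (the function below is illustrative, not the actual cephadm code): when the hostname component itself contains dots, the derived "service name" keeps hostname fragments, so no stored spec matches it.

```python
# Illustrative sketch (not cephadm's real implementation): deriving a
# service name from a daemon name by splitting on '.' is ambiguous as
# soon as the hostname component itself contains dots.

def service_name(daemon_name: str) -> str:
    # Assumes "<service_type>.<service_id>.<short_hostname>" where the
    # short hostname contains no dots: drop only the last component.
    return daemon_name.rsplit('.', 1)[0]

# Dot-free short hostname: the service name round-trips cleanly.
assert service_name('nfs.ganesha.hyp01') == 'nfs.ganesha'

# Dotted hostname: only the last label is stripped, so the derived
# "service name" still carries hostname fragments and the spec lookup
# for 'nfs.ganesha' can never match it.
assert service_name('nfs.ganesha.hyp01.int.intra') == 'nfs.ganesha.hyp01.int'
```

This is why adding hosts with their short names (`hostname -s` output without dots) avoids the lookup failure.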
Updated by Sebastian Wagner almost 4 years ago
- Project changed from Ceph to Orchestrator
- Category changed from documentation to cephadm
- Assignee set to Michael Fritch
- Target version deleted (v15.2.1)
- Tags changed from NFS cephadm to NFS
- Backport set to octopus
Updated by Michael Fritch almost 4 years ago
- Tracker changed from Documentation to Bug
- Regression set to No
- Severity set to 3 - minor
- Pull request ID set to 35340
Updated by Michael Fritch almost 4 years ago
- Status changed from New to Fix Under Review
A short hostname should not contain a dot character ('.'), but it looks like this validation is missing during `host add`:
# hostname -s
host3.site
# ceph orch host add node3.site 10.20.98.203
..
# ceph orch host ls
HOST        ADDR          LABELS  STATUS
host1       host1
host3.site  10.20.98.203
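The missing check described above could look something like the following sketch (the function name and error text are illustrative, not cephadm's actual API): reject any dotted name at `host add` time so only true short hostnames are stored.

```python
# Hedged sketch of the missing validation: refuse hostnames containing
# a dot so only short hostnames enter the inventory. The function name
# and message are hypothetical, not the actual cephadm implementation.

def validate_short_hostname(hostname: str) -> None:
    if '.' in hostname:
        raise ValueError(
            f"hostname {hostname!r} contains a dot; "
            "use the short hostname (e.g. the output of `hostname -s`)")

validate_short_hostname('host1')       # accepted: no dots
try:
    validate_short_hostname('node3.site')
except ValueError as e:
    print(e)                           # rejected: dotted name
```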
Updated by Michael Fritch almost 4 years ago
This also affects other services (MDS, etc.) by causing the orchestrator to deploy and remove them in a loop:
2020-05-28T12:30:11.827-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Deploying daemon mds.a.node3.lnslvs on node3.site
..
2020-05-28T12:31:23.302-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing orphan daemon mds.a.node3.lnslvs...
2020-05-28T12:31:23.302-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing daemon mds.a.node3.lnslvs from node3.site
..
2020-05-28T12:31:51.338-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Deploying daemon mds.a.node3.qttzdu on node3.site
..
2020-05-28T12:32:37.174-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing orphan daemon mds.a.node3.qttzdu...
2020-05-28T12:32:37.174-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing daemon mds.a.node3.qttzdu from node3.site
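The deploy/remove loop can be sketched as a reconciliation step that compares expected daemon names against running ones (the `reconcile` helper and the daemon names below are illustrative assumptions, not cephadm's actual code): because the dotted hostname makes the name the orchestrator expects never match the name it just deployed under, every pass flags the fresh daemon as an orphan and schedules a replacement.

```python
# Assumed reconciliation sketch (hypothetical helper, not cephadm's code):
# a daemon whose name never matches any expected name is treated as an
# orphan on every pass, producing the endless deploy/remove cycle above.

def reconcile(expected, running):
    """Return (orphans, missing): daemons to remove and daemons to deploy."""
    orphans = running - expected   # running daemons with no matching spec
    missing = expected - running   # specs with no running daemon
    return orphans, missing

# The spec expects a name derived from the dotted host, but the daemon
# was deployed under a different name (illustrative values):
expected = {'mds.a.node3.site'}
running = {'mds.a.node3.lnslvs'}

orphans, missing = reconcile(expected, running)
# orphans -> remove the just-deployed daemon; missing -> deploy another,
# which again fails to match, and the loop repeats.
```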
Updated by Sebastian Wagner almost 4 years ago
- Status changed from Fix Under Review to Resolved
- Affected Versions v15.0.0 added
- Affected Versions deleted (v15.2.1)