Bug #45399 (closed)

NFS Ganesha: Error searching service specs for all nodes after nfs orch apply nfs... (Cephadm)

Added by Selyan Ferry almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags: NFS
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions: v15.0.0
ceph-qa-suite:
Pull request ID: 35340
Crash signature (v1):
Crash signature (v2):

Description

Environment:

- 3 hypervisors centos 8.1 (hyp00, hyp01, hyp02)
- 19 OSDs.
- cluster upgraded a month ago from nautilus with the new orchestrator (Cephadm adopt...)
- new cluster with all services running a 15.2.1 container.
- NFS ganesha config loaded in RADOS, shares created with the Ceph dashboard (URL set to rados://nfs-ganesha/ganesha..)

When creating a new NFS (Ganesha) service with Cephadm: "ceph orch apply nfs ganesha nfs-ganesha ganesha"
with

- nfs-ganesha: the dedicated pool
- ganesha: the dedicated namespace for this new service

The result is:

[root@admin ~]# ceph orch apply nfs ganesha nfs-ganesha ganesha
Scheduled nfs update...

[root@admin ~]# ceph orch ls
NAME         RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                             IMAGE ID      
mds.ISOs         2/2  8m ago     8d   count:2    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
mds.cephfs       3/3  8m ago     4w   count:3    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
mgr              3/3  8m ago     4w   count:3    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
mon              3/0  8m ago     -    <no spec>  ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
nfs.ganesha      0/1  -          -    count:1    <unknown>                              <unknown>   

And the following error appears in debug mode:

2020-05-06T13:42:07.199969+0200 mgr.hyp00 [INF] Saving service nfs.ganesha spec with placement count:1
2020-05-06T13:42:07.233173+0200 mgr.hyp00 [DBG] _kick_serve_loop
...
2020-05-06T13:42:07.238257+0200 mgr.hyp00 [DBG] Applying service nfs.ganesha spec
2020-05-06T13:42:07.238443+0200 mgr.hyp00 [DBG] place 1 over all hosts: [HostPlacementSpec(hostname='hyp00.int.intra', network='', name=''), HostPlacementSpec(hostname='hyp02.int.intra', network='', name=''), HostPlacementSpec(hostname='hyp01.int.intra', network='', name='')]
2020-05-06T13:42:07.238608+0200 mgr.hyp00 [DBG] Combine hosts with existing daemons [] + new hosts [HostPlacementSpec(hostname='hyp01.int.intra', network='', name='')]
2020-05-06T13:42:07.238723+0200 mgr.hyp00 [DBG] hosts with daemons: set()
2020-05-06T13:42:07.238842+0200 mgr.hyp00 [INF] Saving service nfs.ganesha spec with placement count:1
2020-05-06T13:42:07.257828+0200 mgr.hyp00 [DBG] Placing nfs.ganesha.hyp01 on host hyp01.int.intra
2020-05-06T13:42:07.258444+0200 mgr.hyp00 [DBG] SpecStore: find spec for nfs.ganesha.hyp01 returned: []
2020-05-06T13:42:07.258980+0200 mgr.hyp00 [WRN] Failed to apply nfs.ganesha spec NFSServiceSpec({'placement': PlacementSpec(count=1), 'service_type': 'nfs', 'service_id': 'ganesha', 'unmanaged': False, 'pool': 'nfs-ganesha', 'namespace': 'ganesha'}): Cannot find service spec nfs.ganesha.hyp01
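
For illustration, here is a simplified, hypothetical sketch of the failing lookup seen in the debug log above (it is not the actual cephadm code): the spec is saved under the service name nfs.ganesha, while the failing lookup is keyed on the full daemon name, which still carries the hostname suffix.

```python
# Hypothetical illustration only -- not actual cephadm code.
# The spec is stored under the service name...
spec_store = {'nfs.ganesha': {'pool': 'nfs-ganesha', 'namespace': 'ganesha'}}

# ...but the failing lookup in the log uses the full daemon name,
# '<type>.<service_id>.<host>', which still carries the host suffix.
daemon_name = 'nfs.ganesha.hyp01'
print(spec_store.get(daemon_name))            # None -> "Cannot find service spec nfs.ganesha.hyp01"

# Stripping a trailing '.<host>' component would find the spec again, but
# that only works reliably if hostnames never contain dots -- which is the
# missing validation discussed in update #4 below.
service_name = daemon_name.rsplit('.', 1)[0]  # 'nfs.ganesha'
print(spec_store.get(service_name))           # the saved NFS spec
```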

When I explicitly add 3 daemons on the three nodes, the cluster forks three NFS containers (the nfs.ganesha service has found the service specs for the dedicated nodes), but the same error appears in the logs ("Cannot find service spec nfs.ganesha.hyp00.hyp00", "nfs.ganesha.hyp01.hyp01" and "nfs.ganesha.hyp02.hyp02").

What's wrong? A documentation misunderstanding, or a bug?

Actions #1

Updated by Sebastian Wagner almost 4 years ago

  • Project changed from Ceph to Orchestrator
  • Category changed from documentation to cephadm
  • Assignee set to Michael Fritch
  • Target version deleted (v15.2.1)
  • Tags changed from NFS cephadm to NFS
  • Backport set to octopus
Actions #2

Updated by Sebastian Wagner almost 4 years ago

  • Description updated (diff)
Actions #3

Updated by Michael Fritch almost 4 years ago

  • Tracker changed from Documentation to Bug
  • Regression set to No
  • Severity set to 3 - minor
  • Pull request ID set to 35340
Actions #4

Updated by Michael Fritch almost 4 years ago

  • Status changed from New to Fix Under Review

A short hostname should not contain a dot char ('.'), but it looks like this validation is missing during `host add` ...

# hostname -s
host3.site

# ceph orch host add node3.site 10.20.98.203
..

# ceph orch host ls
HOST        ADDR          LABELS  STATUS  
host1       host1                      
host3.site  10.20.98.203         
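
As a rough illustration of the kind of check described above, a hypothetical guard could reject short hostnames that contain a dot at `host add` time; this is only a sketch and not the change actually merged for this ticket.

```python
# Hypothetical sketch of the missing validation described above;
# not the actual fix merged for this ticket.
def check_short_hostname(short_hostname: str) -> None:
    """cephadm daemon names are dot-delimited and may embed the short
    hostname, so a dot inside it makes those names ambiguous."""
    if '.' in short_hostname:
        raise ValueError(
            f"short hostname {short_hostname!r} must not contain '.'"
        )

# The host from the example above would be rejected:
# check_short_hostname('host3.site')  -> ValueError
```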
Actions #5

Updated by Michael Fritch almost 4 years ago

This also affects other services, such as MDS, by causing the orchestrator to deploy and remove them in a loop:

2020-05-28T12:30:11.827-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Deploying daemon mds.a.node3.lnslvs on node3.site
..
2020-05-28T12:31:23.302-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing orphan daemon mds.a.node3.lnslvs...
2020-05-28T12:31:23.302-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing daemon mds.a.node3.lnslvs from node3.site
..
2020-05-28T12:31:51.338-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Deploying daemon mds.a.node3.qttzdu on node3.site
..
2020-05-28T12:32:37.174-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing orphan daemon mds.a.node3.qttzdu...
2020-05-28T12:32:37.174-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing daemon mds.a.node3.qttzdu from node3.site
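
A heavily simplified, hypothetical sketch of how a broken daemon-name-to-service mapping turns into this deploy/remove loop (not the actual cephadm serve loop): once the dotted hostname garbles the dot-delimited daemon name, the running daemon no longer resolves to its spec, is treated as an orphan and removed, and the placement then redeploys a replacement, repeating indefinitely.

```python
# Hypothetical, heavily simplified -- not the actual cephadm serve loop.
import secrets

def resolve_service(daemon_name: str):
    # Stand-in for the real lookup: with the dotted host 'node3.site' the
    # dot-delimited daemon name no longer maps back to its service spec.
    return None

running = {}                                  # daemon_name -> host
for _ in range(2):                            # two passes of the serve loop
    # 1) a running daemon whose service cannot be resolved looks orphaned
    for name in [n for n in running if resolve_service(n) is None]:
        print(f"Removing orphan daemon {name}...")
        running.pop(name)
    # 2) the mds.a spec still wants a daemon on node3.site, so redeploy
    if not running:
        name = f"mds.a.node3.{secrets.token_hex(3)}"
        print(f"Deploying daemon {name} on node3.site")
        running[name] = "node3.site"
```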
Actions #6

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Fix Under Review to Resolved
  • Affected Versions v15.0.0 added
  • Affected Versions deleted (v15.2.1)