Bug #45399 (closed)

NFS Ganesha: Error searching service specs for all nodes after nfs orch apply nfs... (Cephadm)

Added by Selyan Ferry almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags: NFS
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions: v15.0.0
ceph-qa-suite:
Pull request ID: 35340
Crash signature (v1):
Crash signature (v2):

Description

Environment:

- 3 hypervisors centos 8.1 (hyp00, hyp01, hyp02)
- 19 OSDs.
- cluster upgraded a month ago from nautilus with the new orchestrator (Cephadm adopt...)
- new cluster with all services running a 15.2.1 container.
- NFS ganesha config loaded in RADOS, shares created with the Ceph dashboard (URL set to rados://nfs-ganesha/ganesha..)

When creating a new NFS (Ganesha) service with Cephadm: "ceph orch apply nfs ganesha nfs-ganesha ganesha"
with

- nfs-ganesha: the dedicated pool
- ganesha: the dedicated namespace for this new service

The result is:

[root@admin ~]# ceph orch apply nfs ganesha nfs-ganesha ganesha
Scheduled nfs update...

[root@admin ~]# ceph orch ls
NAME         RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                             IMAGE ID      
mds.ISOs         2/2  8m ago     8d   count:2    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
mds.cephfs       3/3  8m ago     4w   count:3    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
mgr              3/3  8m ago     4w   count:3    ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
mon              3/0  8m ago     -    <no spec>  ns01.int.intra:5000/ceph/ceph:v15.2.1  bc83a388465f  
nfs.ganesha      0/1  -          -    count:1    <unknown>                              <unknown>   

And the following error appears in debug mode:

2020-05-06T13:42:07.199969+0200 mgr.hyp00 [INF] Saving service nfs.ganesha spec with placement count:1
2020-05-06T13:42:07.233173+0200 mgr.hyp00 [DBG] _kick_serve_loop
...
2020-05-06T13:42:07.238257+0200 mgr.hyp00 [DBG] Applying service nfs.ganesha spec
2020-05-06T13:42:07.238443+0200 mgr.hyp00 [DBG] place 1 over all hosts: [HostPlacementSpec(hostname='hyp00.int.intra', network='', name=''), HostPlacementSpec(hostname='hyp02.int.intra', network='', name=''), HostPlacementSpec(hostname='hyp01.int.intra', network='', name='')]
2020-05-06T13:42:07.238608+0200 mgr.hyp00 [DBG] Combine hosts with existing daemons [] + new hosts [HostPlacementSpec(hostname='hyp01.int.intra', network='', name='')]
2020-05-06T13:42:07.238723+0200 mgr.hyp00 [DBG] hosts with daemons: set()
2020-05-06T13:42:07.238842+0200 mgr.hyp00 [INF] Saving service nfs.ganesha spec with placement count:1
2020-05-06T13:42:07.257828+0200 mgr.hyp00 [DBG] Placing nfs.ganesha.hyp01 on host hyp01.int.intra
2020-05-06T13:42:07.258444+0200 mgr.hyp00 [DBG] SpecStore: find spec for nfs.ganesha.hyp01 returned: []
2020-05-06T13:42:07.258980+0200 mgr.hyp00 [WRN] Failed to apply nfs.ganesha spec NFSServiceSpec({'placement': PlacementSpec(count=1), 'service_type': 'nfs', 'service_id': 'ganesha', 'unmanaged': False, 'pool': 'nfs-ganesha', 'namespace': 'ganesha'}): Cannot find service spec nfs.ganesha.hyp01
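
For illustration, here is a simplified, hypothetical sketch of the failing lookup seen in the debug log above (it is not the actual cephadm code): the spec is saved under the service name nfs.ganesha, while the failing lookup is keyed on the full daemon name, which still carries the hostname suffix.

```python
# Hypothetical illustration only -- not actual cephadm code.
# The spec is stored under the service name...
spec_store = {'nfs.ganesha': {'pool': 'nfs-ganesha', 'namespace': 'ganesha'}}

# ...but the failing lookup in the log uses the full daemon name,
# '<type>.<service_id>.<host>', which still carries the host suffix.
daemon_name = 'nfs.ganesha.hyp01'
print(spec_store.get(daemon_name))            # None -> "Cannot find service spec nfs.ganesha.hyp01"

# Stripping a trailing '.<host>' component would find the spec again, but
# that only works reliably if hostnames never contain dots -- which is the
# missing validation discussed in update #4 below.
service_name = daemon_name.rsplit('.', 1)[0]  # 'nfs.ganesha'
print(spec_store.get(service_name))           # the saved NFS spec
```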

When I explicitly add 3 daemons on the three nodes, the cluster forks three NFS containers (the nfs.ganesha service has found the service specs for the dedicated nodes), but the same error appears in the logs ("Cannot find service spec nfs.ganesha.hyp00.hyp00", "nfs.ganesha.hyp01.hyp01" and "nfs.ganesha.hyp02.hyp02").

What's wrong? A documentation misunderstanding, or a bug?

Actions #1

Updated by Sebastian Wagner almost 4 years ago

  • Project changed from Ceph to Orchestrator
  • Category changed from documentation to cephadm
  • Assignee set to Michael Fritch
  • Target version deleted (v15.2.1)
  • Tags changed from NFS cephadm to NFS
  • Backport set to octopus
Actions #2

Updated by Sebastian Wagner almost 4 years ago

  • Description updated (diff)
Actions #3

Updated by Michael Fritch almost 4 years ago

  • Tracker changed from Documentation to Bug
  • Regression set to No
  • Severity set to 3 - minor
  • Pull request ID set to 35340
Actions #4

Updated by Michael Fritch almost 4 years ago

  • Status changed from New to Fix Under Review

A short hostname should not contain a dot char ('.'), but it looks like this validation is missing during `host add` ...

# hostname -s
host3.site

# ceph orch host add node3.site 10.20.98.203
..

# ceph orch host ls
HOST        ADDR          LABELS  STATUS  
host1       host1                      
host3.site  10.20.98.203         
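
As a rough illustration of the kind of check described above, a hypothetical guard could reject short hostnames that contain a dot at `host add` time; this is only a sketch and not the change actually merged for this ticket.

```python
# Hypothetical sketch of the missing validation described above;
# not the actual fix merged for this ticket.
def check_short_hostname(short_hostname: str) -> None:
    """cephadm daemon names are dot-delimited and may embed the short
    hostname, so a dot inside it makes those names ambiguous."""
    if '.' in short_hostname:
        raise ValueError(
            f"short hostname {short_hostname!r} must not contain '.'"
        )

# The host from the example above would be rejected:
# check_short_hostname('host3.site')  -> ValueError
```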
Actions #5

Updated by Michael Fritch almost 4 years ago

This also affects other services, such as MDS, by causing the orchestrator to deploy and remove them in a loop:

2020-05-28T12:30:11.827-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Deploying daemon mds.a.node3.lnslvs on node3.site
..
2020-05-28T12:31:23.302-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing orphan daemon mds.a.node3.lnslvs...
2020-05-28T12:31:23.302-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing daemon mds.a.node3.lnslvs from node3.site
..
2020-05-28T12:31:51.338-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Deploying daemon mds.a.node3.qttzdu on node3.site
..
2020-05-28T12:32:37.174-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing orphan daemon mds.a.node3.qttzdu...
2020-05-28T12:32:37.174-0600 7f6c55f9d700  0 log_channel(cephadm) log [INF] : Removing daemon mds.a.node3.qttzdu from node3.site
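
A heavily simplified, hypothetical sketch of how a broken daemon-name-to-service mapping turns into this deploy/remove loop (not the actual cephadm serve loop): once the dotted hostname garbles the dot-delimited daemon name, the running daemon no longer resolves to its spec, is treated as an orphan and removed, and the placement then redeploys a replacement, repeating indefinitely.

```python
# Hypothetical, heavily simplified -- not the actual cephadm serve loop.
import secrets

def resolve_service(daemon_name: str):
    # Stand-in for the real lookup: with the dotted host 'node3.site' the
    # dot-delimited daemon name no longer maps back to its service spec.
    return None

running = {}                                  # daemon_name -> host
for _ in range(2):                            # two passes of the serve loop
    # 1) a running daemon whose service cannot be resolved looks orphaned
    for name in [n for n in running if resolve_service(n) is None]:
        print(f"Removing orphan daemon {name}...")
        running.pop(name)
    # 2) the mds.a spec still wants a daemon on node3.site, so redeploy
    if not running:
        name = f"mds.a.node3.{secrets.token_hex(3)}"
        print(f"Deploying daemon {name} on node3.site")
        running[name] = "node3.site"
```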
Actions #6

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Fix Under Review to Resolved
  • Affected Versions v15.0.0 added
  • Affected Versions deleted (v15.2.1)