Bug #46327: cephadm: nfs daemons share the same config object - Orchestrator - Ceph

Actions

Copy link

Bug #46327

closed

cephadm: nfs daemons share the same config object

Added by Kiefer Chang almost 4 years ago. Updated over 3 years ago.

Status:

Won't Fix

Priority:

Normal

Assignee:

Category:

cephadm/nfs

Target version:

% Done:

Source:

Tags:

Backport:

octopus

Regression:

Yes

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v15.2.4

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

If we create a NFS service with multiple instances, those instance share the same rados object as the configuration source.

E.g.

# cat /tmp/nfs.yml
service_type: nfs
service_id: sesdev_nfs_deployment
placement:
    hosts:
        - 'mgr0'
        - 'osd0'
spec:
    pool: rbd
    namespace: nfs

# ceph orch apply -i /tmp/nfs.yml
Scheduled nfs.sesdev_nfs_deployment update...

# rados ls -p rbd --all
2020-07-02T07:48:22.637+0000 7fecad052b80 -1 WARNING: all dangerous and experimental features are enabled.
2020-07-02T07:48:22.637+0000 7fecad052b80 -1 WARNING: all dangerous and experimental features are enabled.
2020-07-02T07:48:22.637+0000 7fecad052b80 -1 WARNING: all dangerous and experimental features are enabled.
nfs    grace
nfs    rec-0000000000000003:nfs.sesdev_nfs_deployment.mgr0
nfs    rec-0000000000000003:nfs.sesdev_nfs_deployment.osd0
nfs    conf-nfs.sesdev_nfs_deployment             <--- this object is shared by all daemons

# podman exec ceph-a83a8c75-ee83-4ada-8881-fc01bdca496b-nfs.sesdev_nfs_deployment.osd0 cat /etc/ganesha/ganesha.conf
<...>
RADOS_URLS {
        UserId = "nfs.sesdev_nfs_deployment.osd0";
        watch_url = "rados://rbd/nfs/conf-nfs.sesdev_nfs_deployment";
}

# podman exec ceph-a83a8c75-ee83-4ada-8881-fc01bdca496b-nfs.sesdev_nfs_deployment.mgr0 cat /etc/ganesha/ganesha.conf
<...>
RADOS_URLS {
        UserId = "nfs.sesdev_nfs_deployment.mgr0";
        watch_url = "rados://rbd/nfs/conf-nfs.sesdev_nfs_deployment";
}

Each daemon should have its own configuration object. Otherwise, all daemons are going to share the same exports.

The Dashboard determines daemon instances by enumerating `conf-xxx` objects. If there is only one config, the user can only choose it (but all daemons share the same config actually).

Files

daemons.png (30.7 KB) daemons.png

Kiefer Chang, 07/02/2020 08:00 AM

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Varsha Rao almost 4 years ago

Status changed from New to Rejected

Not a bug, by design all daemons within a cluster will share the same config object.

Actions

Copy link

Updated by Kiefer Chang almost 4 years ago

This causes a regression in the Dashboard. RADOS objects are designed to work with multiple daemons and each daemon has its won configuration before, please see https://docs.ceph.com/docs/nautilus/mgr/dashboard/#nfs-ganesha-management

I suggest letting each daemon have its own configuration object. If in octopus all daemons share the same config, then this introduces some concerns:
- How to migrate daemons in the previous deployments, if each daemon contains different exports.
- The dashboard's Ganesha feature will be tightly coupled with the Orchestrator, the user that deploys Ganesha daemons with other tools needs to follow orchestrator's way.
- Daemons deployed with Rook doesn't share configs.

The slide that describes the regression: https://docs.google.com/presentation/d/1HwwDPXfAG_AJRLItylRlx_0H-JrFNw9sut7Jpm1QPg4/edit?usp=sharing

Actions

Copy link

Updated by Lenz Grimmer almost 4 years ago

Status changed from Rejected to New
Regression changed from No to Yes

Reopening, as this change introduces a regression and will potentially break upgrades.

Actions

Copy link

Updated by Patrick Donnelly almost 4 years ago

Affected Versions v15.2.4 added
Affected Versions deleted (~~v15.2.5, v16.0.0~~)

Hi Kiefer, I left some similar comments on your slide deck but also will say here:

Kiefer Chang wrote:

This causes a regression in the Dashboard. RADOS objects are designed to work with multiple daemons and each daemon has its won configuration before, please see https://docs.ceph.com/docs/nautilus/mgr/dashboard/#nfs-ganesha-management

I suggest letting each daemon have its own configuration object. If in octopus all daemons share the same config, then this introduces some concerns:
- How to migrate daemons in the previous deployments, if each daemon contains different exports.

"All daemons share the same config" is not correct. We now have NFS-Ganesha clusters. Each cluster shares the same common config and set of exports.

I think the migration path is to convert the dashboard created NFS-Ganesha daemons (Nautilus) to the volumes plugin (src/pybind/mgr/volumes/nfs) to individual NFS-Ganesha clusters. So, 1-1 mapping.

- The dashboard's Ganesha feature will be tightly coupled with the Orchestrator, the user that deploys Ganesha daemons with other tools needs to follow orchestrator's way.
- Daemons deployed with Rook doesn't share configs.

This will soon change in a follow-up fix by Varsha.

The slide that describes the regression: https://docs.google.com/presentation/d/1HwwDPXfAG_AJRLItylRlx_0H-JrFNw9sut7Jpm1QPg4/edit?usp=sharing

Actions

Copy link

Updated by Michael Fritch almost 4 years ago

Patrick Donnelly wrote:

Hi Kiefer, I left some similar comments on your slide deck but also will say here:

I also left a few comments on the slide deck.

Kiefer Chang wrote:

This causes a regression in the Dashboard. RADOS objects are designed to work with multiple daemons and each daemon has its won configuration before, please see https://docs.ceph.com/docs/nautilus/mgr/dashboard/#nfs-ganesha-management

I suggest letting each daemon have its own configuration object. If in octopus all daemons share the same config, then this introduces some concerns:
- How to migrate daemons in the previous deployments, if each daemon contains different exports.

"All daemons share the same config" is not correct. We now have NFS-Ganesha clusters. Each cluster shares the same common config and set of exports.

The "bootstrap" config for the container image differs, but the RADOS common config is shared for the entire NFS service defined by the Orchestrator.

I think the migration path is to convert the dashboard created NFS-Ganesha daemons (Nautilus) to the volumes plugin (src/pybind/mgr/volumes/nfs) to individual NFS-Ganesha clusters. So, 1-1 mapping.

export files can likely remain, but the new common config could simply be a union of each of the per-daemon configs?

also, I think it would make sense to deprecate the dashboard logic in favor of using the logic contained in the volumes plugin ...

- The dashboard's Ganesha feature will be tightly coupled with the Orchestrator, the user that deploys Ganesha daemons with other tools needs to follow orchestrator's way.
- Daemons deployed with Rook doesn't share configs.

This will soon change in a follow-up fix by Varsha.

awesome \o/

Actions

Copy link

Updated by Patrick Donnelly almost 4 years ago

Michael Fritch wrote:

Kiefer Chang wrote:

This causes a regression in the Dashboard. RADOS objects are designed to work with multiple daemons and each daemon has its won configuration before, please see https://docs.ceph.com/docs/nautilus/mgr/dashboard/#nfs-ganesha-management

I suggest letting each daemon have its own configuration object. If in octopus all daemons share the same config, then this introduces some concerns:
- How to migrate daemons in the previous deployments, if each daemon contains different exports.

"All daemons share the same config" is not correct. We now have NFS-Ganesha clusters. Each cluster shares the same common config and set of exports.

The "bootstrap" config for the container image differs, but the RADOS common config is shared for the entire NFS service defined by the Orchestrator.

The RADOS common config is per-NFS cluster. The pool namespace in the nfs-ganesha pool distinguishes the clusters.

I've also asked Varsha to simplify the bootstrap config defined by the orchestrator. I believe we should keep the bootstrap config as simple as possible so higher level abstractions (i.e. the volumes/nfs code) can have better control and the orchestrator's nfs service is more general (can be applied to rgw/rbd).

Actions

Copy link

Updated by Michael Fritch almost 4 years ago

Patrick Donnelly wrote:

Michael Fritch wrote:

Kiefer Chang wrote:

This causes a regression in the Dashboard. RADOS objects are designed to work with multiple daemons and each daemon has its won configuration before, please see https://docs.ceph.com/docs/nautilus/mgr/dashboard/#nfs-ganesha-management

I suggest letting each daemon have its own configuration object. If in octopus all daemons share the same config, then this introduces some concerns:
- How to migrate daemons in the previous deployments, if each daemon contains different exports.

"All daemons share the same config" is not correct. We now have NFS-Ganesha clusters. Each cluster shares the same common config and set of exports.

The "bootstrap" config for the container image differs, but the RADOS common config is shared for the entire NFS service defined by the Orchestrator.

The RADOS common config is per-NFS cluster. The pool namespace in the nfs-ganesha pool distinguishes the clusters.

I've also asked Varsha to simplify the bootstrap config defined by the orchestrator. I believe we should keep the bootstrap config as simple as possible so higher level abstractions (i.e. the volumes/nfs code) can have better control and the orchestrator's nfs service is more general (can be applied to rgw/rbd).