Bug #61852: Ceph NFS "HAProxy_Hosts" configuration issue

Added by Goutham Pacha Ravi 11 months ago. Updated 9 months ago.

Status: Pending Backport
Priority: Normal
Tags: backport_processed
Backport: reef
Severity: 3 - minor
Pull request ID: 52410

Description

Ceph's Reef release added support for deploying the ceph-nfs service and the ceph-ingress service with a "haproxy-protocol" ingress mode [1][2].
A configuration issue occurs when attempting this on Ceph nodes that have multiple IP addresses.

NFS cluster creation and some debug info: https://paste.openstack.org/show/bVyG6N1E876PY1G30fZo/

NFS cluster configuration (`/etc/ganesha/ganesha.conf`) is extracted from one of the nfs containers created; it is identical across the nodes.

With this configuration, NFS mounts fail because HAProxy PROXY protocol parsing fails on NFS-Ganesha:

Mount failure on the client:

# mount -vvvt nfs 192.168.130.21:/volumes/_nogroup/399c3d6a-b50d-4e33-87d7-8848c1680e3b/406f33bf-26ab-4e04-aa27-0be027af4d24 /mnt
mount.nfs: timeout set for Wed Jun 28 17:29:34 2023
mount.nfs: trying text-based options 'vers=4.2,addr=192.168.130.21,clientaddr=192.168.130.28'
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed

Logs from an NFS-Ganesha service when mount failed: https://paste.openstack.org/show/belF8k02E7HmmFAGhuNN/

[1] https://docs.ceph.com/en/latest/mgr/nfs/#create-nfs-ganesha-cluster
[2] https://github.com/ceph/ceph/pull/50614

Related issues 1 (1 open — 0 closed)

Updated by Goutham Pacha Ravi 11 months ago

The problem was first assumed to be an NFS-Ganesha issue; if it's helpful, there is some more conversation here: https://github.com/nfs-ganesha/nfs-ganesha/issues/966

Updated by John Mulligan 11 months ago

NFS cluster creation and some debug info: https://paste.openstack.org/show/bVyG6N1E876PY1G30fZo/

I don't see the corresponding HAProxy configuration file. Could you please share that as well?

NFS cluster configuration (`/etc/ganesha/ganesha.conf`) is extracted from one of the nfs containers created; it is identical across the nodes.

With this configuration, NFS mounts fail because HAProxy PROXY protocol parsing fails on NFS-Ganesha:

I don't think that's quite right. The parsing of the configured IP addresses is OK, and it likely parses the haproxy protocol message OK. The issue is that HAProxy connects from a different (local) IP address than the one in the Ganesha config. Ganesha fails to see the IP in the list of hosts permitted to send proxy protocol info and rejects the connection.

What I need to know is whether cephadm/mgr is generating an inconsistent set of configuration files (telling HAProxy to connect to NFS over one IP address while configuring the permitted IP to be something else), or whether we're not inconsistent but vague and HAProxy chose a valid but unexpected IP, in which case the fix would be to cast a wider net when configuring Ganesha.

Thanks!

Updated by Goutham Pacha Ravi 10 months ago

Thanks for the response, John. Here's the HAProxy config:

https://paste.openstack.org/show/bC0znJTXl9VYcABqJwb6/

Updated by John Mulligan 10 months ago

Thanks for sharing that file. Adam reproduced it yesterday and we debugged the issue for a bit.
Here's a simplified summary of the problem:

When cephadm configures haproxy it adds a list of back-end servers to the haproxy conf file.
The addresses stored in this file are the "primary" addresses that cephadm associates with
each host.

When we are configured to use the haproxy protocol we put a list of IP addresses that are
permitted to send proxy protocol messages to Ganesha. Remember that, from Ganesha's point
of view, the HAProxy instances are the clients, so these are the addresses of the HAProxy
hosts. In our simple example we would list the following addresses:
- 192.168.12.10
- 192.168.12.11
- 10.2.2.112

Similarly, in our simplified model, we can have just one NFS Ganesha server running.
Thus we would add the following backend servers to the HAProxy config:
- 10.2.2.112

In this simple example we run only one Ganesha server but the principle applies even if
there are more servers set up for the HAProxy backend.
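
To make the two lists concrete, here is a minimal Python sketch that renders both fragments from the simplified example. It is not the cephadm template: the fragment layout, the `backend nfs_backend` name, the server names, and port 2049 are assumptions for illustration. `HAProxy_Hosts` is the Ganesha setting named in this bug and `send-proxy-v2` is the HAProxy server option that enables PROXY protocol v2; what cephadm actually emits may differ.

```python
# Illustrative only: host names, backend name and port are assumptions,
# not values taken from the cephadm templates.
PRIMARY_ADDRS = {
    "HostA": "192.168.12.10",
    "HostB": "192.168.12.11",
    "HostC": "10.2.2.112",
}
GANESHA_HOSTS = ["HostC"]  # hosts that run an NFS-Ganesha daemon
NFS_PORT = 2049            # assumed backend port


def ganesha_fragment(primary_addrs):
    """Allow-list of peers permitted to send PROXY protocol headers."""
    hosts = ", ".join(primary_addrs.values())
    return "NFS_CORE_PARAM {\n    HAProxy_Hosts = " + hosts + ";\n}\n"


def haproxy_backend_fragment(primary_addrs, ganesha_hosts, port):
    """Backend lines that point at each Ganesha host's primary address."""
    lines = ["backend nfs_backend"]
    for host in ganesha_hosts:
        addr = primary_addrs[host]
        lines.append(f"    server {host} {addr}:{port} send-proxy-v2")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ganesha_fragment(PRIMARY_ADDRS))
    print(haproxy_backend_fragment(PRIMARY_ADDRS, GANESHA_HOSTS, NFS_PORT))
```

Both fragments are built from the same primary-address table, which is the "same list" the summary refers to.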

The following block diagram shows how an NFS connection initiated by "client" first
reaches HAProxy running on HostA via the VIP (Virtual IP). Then HAProxy makes a
connection to the ganesha server running on HostC.

                              +-----------------------+ 
--------                      |  eth0: 192.168.12.10  |
|client| -------------------->|  eth1: 10.2.2.110     |-------\
--------     srcIP=xxx        |                       |       |
             dstIP=VIP        | VIP, HAProxy          |       |
                              +-----------------------+       |
                                    HostA                     |
                                                              |
                              +-----------------------+       |
                              |  eth0: 192.168.12.11  |       |
                              |  eth1: 10.2.2.111     |       |
                              |                       |       |
                              | HAProxy               |       |
                              +-----------------------+       |
                                    HostB                     |srcIP=10.2.2.110
                                                              |dstIP=10.2.2.112
                              +-----------------------+       | 
                              |  eth0: 192.168.12.12  |       |
                              |  eth1: 10.2.2.112     |<------/
                              |                       |
                              | HAProxy, NFSGanesha   |
                              +-----------------------+
                                   HostC

The srcIP for the connection made by HAProxy on HostA to NFSGanesha on HostC
is '10.2.2.110'. From Ganesha's point of view this is the IP of the client.
'10.2.2.110' is not in the list of clients allowed to use HAProxy protocol.
Therefore the connection is rejected.

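The issue arises because the IPs are not all on the same subnet. HostA is listed in the
ganesha allowed-proxy-protocol hosts list by its primary address (192.168.12.10), but HAProxy
on HostA reaches the backend 10.2.2.112 through eth1, so the source address is 10.2.2.110.
Even though we configured the "same list" of IPs in the ganesha and haproxy configuration,
the source address ends up being different from the allowed source address for that host.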
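
The mismatch can be reproduced with a small Python sketch using the addresses from the diagram. The /24 prefix lengths and the `pick_source_ip` helper are assumptions for illustration; the helper only mimics the common case where the kernel picks the local address on the directly connected subnet, and it is not how HAProxy or cephadm is implemented.

```python
import ipaddress

# Interface addresses from the diagram; the /24 prefixes are assumed.
HOST_IFACES = {
    "HostA": ["192.168.12.10/24", "10.2.2.110/24"],
    "HostB": ["192.168.12.11/24", "10.2.2.111/24"],
    "HostC": ["192.168.12.12/24", "10.2.2.112/24"],
}

# Primary addresses that end up in Ganesha's allow-list in the example.
HAPROXY_HOSTS_ALLOWED = {"192.168.12.10", "192.168.12.11", "10.2.2.112"}


def pick_source_ip(host, dst):
    """Return the local address of `host` that shares a subnet with `dst`.

    Returns None when no interface is directly connected to `dst`.
    """
    dst_ip = ipaddress.ip_address(dst)
    for cidr in HOST_IFACES[host]:
        iface = ipaddress.ip_interface(cidr)
        if dst_ip in iface.network:
            return str(iface.ip)
    return None


if __name__ == "__main__":
    backend = "10.2.2.112"  # Ganesha on HostC, as listed in the HAProxy config
    for host in HOST_IFACES:
        src = pick_source_ip(host, backend)
        print(host, src, "allowed" if src in HAPROXY_HOSTS_ALLOWED else "rejected")
```

With these assumptions only the HAProxy instance on HostC connects from an address that is in the allow-list; HostA and HostB connect from their 10.2.2.x addresses, which are not listed.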

We do not have a simple and obvious solution at this time. Currently we have the following ideas:
- add all known ip addresses for all hosts into the list
- require all the addresses to be on the same subnet, if not then the configuration will fail (and raise a health warning)
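
As a rough illustration of the second idea, the check could look like the Python sketch below. The function name and the notion of passing one expected network are hypothetical; how cephadm would choose that network and raise the health warning is not specified here.

```python
import ipaddress


def addresses_outside_network(addrs, network):
    """Return the addresses that do not belong to `network`.

    An empty result means every address is on the same subnet and the
    configuration could be accepted; otherwise it would be refused.
    """
    net = ipaddress.ip_network(network)
    return [a for a in addrs if ipaddress.ip_address(a) not in net]


if __name__ == "__main__":
    primary = ["192.168.12.10", "192.168.12.11", "10.2.2.112"]
    bad = addresses_outside_network(primary, "192.168.12.0/24")
    if bad:
        print("refusing configuration, addresses outside subnet:", bad)
```

For the simplified example, the primary address of HostC (10.2.2.112) is the one that fails the check.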

Updated by John Mulligan 10 months ago

Pull request ID set to 52410

Updated by John Mulligan 10 months ago

Status changed from New to Fix Under Review

Updated by Goutham Pacha Ravi 10 months ago

Thank you; you folks would definitely know the internals better to design this, but I had an opinion to offer:

- add all known ip addresses for all hosts into the list

- require all the addresses to be on the same subnet, if not then the configuration will fail (and raise a health warning)

This would be hard to do, imo; multiple network subnets could exist on a host for other reasons, and NFS traffic may be consumed from any of them.

I had a further question: what if an interface has multiple IPs configured? Isn't this then an n*m set of IPs we need to collate and set as "HAProxy_Hosts"?
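
A minimal Python sketch of that collation, assuming a hypothetical inventory shaped as host -> interface -> list of addresses (the second address on HostA's eth1 is invented to show the multi-IP case). It only illustrates the "add all known IP addresses" idea and is not what the referenced pull request necessarily does.

```python
import ipaddress


def _sort_key(addr):
    """Sort IPv4 before IPv6, then numerically within each family."""
    ip = ipaddress.ip_address(addr)
    return (ip.version, int(ip))


def collate_haproxy_hosts(inventory):
    """Flatten {host: {iface: [ips]}} into a sorted, de-duplicated list."""
    seen = set()
    for ifaces in inventory.values():
        for ips in ifaces.values():
            seen.update(ips)
    return sorted(seen, key=_sort_key)


# Hypothetical inventory; 10.2.2.210 is an invented secondary address.
INVENTORY = {
    "HostA": {"eth0": ["192.168.12.10"], "eth1": ["10.2.2.110", "10.2.2.210"]},
    "HostB": {"eth0": ["192.168.12.11"], "eth1": ["10.2.2.111"]},
    "HostC": {"eth0": ["192.168.12.12"], "eth1": ["10.2.2.112"]},
}

if __name__ == "__main__":
    hosts = collate_haproxy_hosts(INVENTORY)
    print(len(hosts), ", ".join(hosts))
```

In this sketch the list has one entry per distinct configured address, so it grows with the total number of addresses across all hosts and interfaces.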
