Bug #51257

mgr/cephadm: Cannot add managed (ceph apply) mon daemons on different subnets

Added by Aggelos Avgerinos almost 3 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Category:
orchestrator
Target version:
-
% Done:

0%

Source:
Tags:
cephadm
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In our network setup we have an IP (layer3) fabric to the servers, using /128 IPv6 addresses [3] and BGP, so there is no notion of a layer2 domain in our infrastructure.

After bootstrapping a cluster we tried to add mon daemons with $ ceph orch apply mon label:mon, only to get the following message [4] in the mgr daemon logs:


Jun 14 18:18:28 ceph101 conmon[87926]: debug 2021-06-14T15:18:28.755+0000 7fd61fa18700  0 log_channel(cephadm) log [INF] : Filtered out host ceph101: could not verify host allowed virtual ips
Jun 14 18:18:28 ceph101 conmon[87926]: debug 2021-06-14T15:18:28.755+0000 7fd61fa18700  0 log_channel(cephadm) log [INF] : Filtered out host ceph102: could not verify host allowed virtual ips

We took a quick look at cephadm's code:

The cephadm manager module performs a check [5], only when the deployed service is a mon daemon, to verify that the host's networks match the public_network.
This check calls the matches_network function [6], which is where things break for our setup.

Taking a closer look at the matches_network function [6], we can see that:

def matches_network(host):
    # type: (str) -> bool
    if not public_network:
        return False
    # make sure we have 1 or more IPs for that network on that
    # host
    return len(self.mgr.cache.networks[host].get(public_network, [])) > 0

1) It will always return False if the public_network is unset.
2) It searches a cache [7] inside the manager daemon for at least one IP address on the defined public_network, and fails if it doesn't find any (see the sketch right below).
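
To illustrate item 2, here is a standalone sketch with made-up data (this is not the real mgr cache, and fd42:abcd::/64 is only an example value): the configured public_network string is used as a plain dictionary key, so the check only passes if the host has at least one address recorded under exactly that network.

# Standalone sketch of the lookup above, with made-up data (not the real mgr cache).
# networks[host] mirrors the networks_and_interfaces structure: network -> interface -> [IPs].
networks = {
    'ceph101': {
        'fe80::/64': {'ipfabric0': ['fe80::18fa:ceff:fef8:7502']},
    },
}

public_network = 'fd42:abcd::/64'  # example value only

def matches_network(host):
    # type: (str) -> bool
    if not public_network:
        return False
    # exact dict-key match on the network string; there is no
    # "is one of this host's IPs inside public_network" containment test
    return len(networks[host].get(public_network, [])) > 0

print(matches_network('ceph101'))  # False: only 'fe80::/64' was detected for this host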

However, even when we tried adding each node's /128 prefix to the public_network variable, we still couldn't get the mon daemons to spin up; the same message appeared in the logs.

We took a deeper look at the code to find out why this still would not work even though we had a matching public_network:

The aforementioned cache [7] fetches the networks_and_interfaces key for the affected host from the KV store [8].

For our hosts we can see that the desired addresses are not correctly matched and stored in the KV store.
Let's take ceph101 for example:


root@ceph101:/# ip -6 a show dev ipfabric0
8: ipfabric0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet6 fd42:abcd::cef:101/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::18fa:ceff:fef8:7502/64 scope link
       valid_lft forever preferred_lft forever

root@ceph101:/# ceph config-key get mgr/cephadm/host.ceph101 | jq '.networks_and_interfaces'
{
  "fe80::/64": {
    "ens1f0": [
      "fe80::ae1f:6bff:fef8:de4e"
    ],
    "ens1f1": [
      "fe80::ae1f:6bff:fef8:de4f"
    ],
    "ipfabric0": [
      "fe80::18fa:ceff:fef8:7502"
    ]
  }
}

As seen above, the ipfabric0 interface has a valid global IPv6 address, fd42:abcd::cef:101/128, which is never detected by the tool that fills the cache.
Instead, the only address recorded is a link-local [9] one, which is irrelevant.
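
(For reference, the distinction is easy to verify with Python's ipaddress module; a quick sketch using the two addresses above:)

import ipaddress

# The address cephadm recorded for ipfabric0 is link-local and unusable as a mon
# public address; the one it missed is the routable /128.
recorded = ipaddress.ip_address('fe80::18fa:ceff:fef8:7502')
missed = ipaddress.ip_address('fd42:abcd::cef:101')

print(recorded.is_link_local)  # True  -> irrelevant for public_network
print(missed.is_link_local)    # False -> this is the address that should have been stored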

To find out why our /128 IPv6 addresses were rejected by this tool, we checked the specific function [10] responsible for filling this cache. Under the hood this function calls _parse_ipv6_route, passing it all the routes and IP addresses it finds on the system with ip -6 route ls and ip -6 addr ls respectively.

The routes passed to it are used to decide which networks this host is connected to, and the snippet below rejects all routes without a subnet mask.

        if '/' not in net:  # only consider networks with a mask
            continue

This led the orchestrator to reject all of our routes:


root@ceph101:/# ip -6 ro ls | grep '::cef'
fd42:abcd::cef:102 proto bird src fd42:abcd::cef:101 metric 512
fd42:abcd::cef:103 proto bird src fd42:abcd::cef:101 metric 512
fd42:abcd::cef:104 proto bird src fd42:abcd::cef:101 metric 512
fd42:abcd::cef:105 proto bird src fd42:abcd::cef:101 metric 512
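
To make the effect concrete, here is a standalone sketch (not the actual cephadm parser) that applies the same "only consider networks with a mask" filter to the routes shown above; since ip -6 route prints these /128 host routes without a mask, every single one is skipped and the host ends up with no detected networks:

# Standalone sketch (not the actual cephadm code): apply the same
# "only consider networks with a mask" filter to the routes above.
ip_6_route_output = """\
fd42:abcd::cef:102 proto bird src fd42:abcd::cef:101 metric 512
fd42:abcd::cef:103 proto bird src fd42:abcd::cef:101 metric 512
fd42:abcd::cef:104 proto bird src fd42:abcd::cef:101 metric 512
fd42:abcd::cef:105 proto bird src fd42:abcd::cef:101 metric 512"""

networks = set()
for line in ip_6_route_output.splitlines():
    net = line.split()[0]   # first column is the destination
    if '/' not in net:      # only consider networks with a mask
        continue            # the /128 host routes carry no mask here, so all are skipped
    networks.add(net)

print(networks)  # set() -- the host appears to have no IPv6 networks at all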

To sum up:

cephadm assumes that all mon daemons are on the same layer2 domain [1], namely the public_network [2].
This assumption makes it impossible for some network setups, like our own, to deploy mon daemons on different subnets.
Frankly, I can't see why cephadm should care about the underlying network topology when it comes to mon daemons.
The PR [1] that introduced this check makes it seem like some kind of safeguard to avoid deploying mon daemons on unwanted hosts, but I believe this is the responsibility of the placement spec [11].

It would be nice to either change the logic to not rely on the system routes, or to add a flag like mgr/cephadm/skip-mon-network-checks to bypass this restrictive behavior.
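
Purely as a hypothetical sketch (neither this flag nor this helper exists in cephadm today), such a bypass could simply gate the existing mon-only check:

# Hypothetical sketch only: a bypass flag gating the mon network check.
# Neither skip_mon_network_checks nor host_allowed exists in cephadm today.
def host_allowed(host, daemon_type, skip_mon_network_checks, matches_network):
    if daemon_type != 'mon':
        return True                   # the network check only applies to mon daemons
    if skip_mon_network_checks:
        return True                   # operator explicitly opted out of the check
    return matches_network(host)      # current behaviour: require a public_network match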

[1]: https://github.com/ceph/ceph/pull/33952/
[2]: https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#confval-public_network
[3]: https://datatracker.ietf.org/doc/html/rfc8273
[4]: https://github.com/ceph/ceph/blob/v16.2.4/src/pybind/mgr/cephadm/schedule.py#L307
[5]: https://github.com/ceph/ceph/blob/v16.2.4/src/pybind/mgr/cephadm/serve.py#L553
[6]: https://github.com/ceph/ceph/blob/v16.2.4/src/pybind/mgr/cephadm/serve.py#L539
[7]: https://github.com/ceph/ceph/blob/v16.2.4/src/pybind/mgr/cephadm/inventory.py#L225
[8]: https://github.com/ceph/ceph/blob/v16.2.4/src/pybind/mgr/cephadm/inventory.py#L312
[9]: https://en.wikipedia.org/wiki/Link-local_address
[10]: https://github.com/ceph/ceph/blob/v16.2.4/src/cephadm/cephadm#L4602
[11]: https://docs.ceph.com/en/latest/cephadm/service-management/#placement-specification


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #53496: cephadm: list-networks swallows /128 networks, breaking the orchestrator ("Filtered out host mon1: does not belong to mon public_network") - Resolved
