Bug #50688
Status: Closed
Ceph can't be deployed using cephadm on nodes with /32 IP addresses
Description
Preamble
In certain data centers it is common to assign a /32 IP address to a node and let BGP handle the reachability of each node within the fabric.
While this adds some complexity, it has a number of benefits for network fault tolerance and scalability.
Problem
It is currently not possible to use either ceph or cephadm in such an environment, due to a number of assumptions in the code. We have hit two in this situation:
A) Cephadm will fail with:

Cluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Verifying IP 172.30.1.1 port 3300 ...
Verifying IP 172.30.1.1 port 6789 ...
ERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later
The full log is tracked here [1].
This fails because cephadm tries to parse the output of 'ip route ls' [2] and does not account for the more complex output produced in a BGP environment:
[root@ctrl-1-0 sbin]# ip r
default proto bgp src 172.30.1.1 metric 20
        nexthop via 100.64.0.1 dev enp3s0 weight 1
        nexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
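A minimal sketch of the problem (this is not cephadm's actual code): CIDR inference looks for a connected prefix that contains the mon IP, but on a BGP-routed host the mon IP is a /32 that no connected route covers. The prefix list below is my reading of the 'proto kernel scope link' routes in the output above.

```python
import ipaddress

# Connected prefixes taken from the 'ip r' output above (assumption:
# only 'proto kernel scope link' routes yield usable CIDRs).
connected = [
    "100.64.0.0/30",
    "100.65.1.0/30",
    "192.168.1.0/24",
]

mon_ip = ipaddress.ip_address("172.30.1.1")

# The mon IP is reachable only via the BGP default route, so no
# connected prefix contains it and no CIDR can be inferred.
matches = [c for c in connected if mon_ip in ipaddress.ip_network(c)]
print(matches)  # [] -> cephadm aborts with the error shown above
```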
B) Ceph-mgr fails to start:
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: Started Ceph mgr.ctrl-1-0.bgp.ftw.hpwvwv for 4b5c8c0a-ff60-454b-a1b4-9747aa737d19.
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.388+0000 7f9e29044500 0 set uid:gid to 167:167 (ceph:ceph)
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.388+0000 7f9e29044500 0 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process ceph-mgr, pid 7
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.389+0000 7f9e29044500 -1 **unable to find any IP address in networks '172.31.0.1/32,172.31.0.1/32' interfaces ''**
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: libpod-1c9f60cdb0bba3d228d24bbc8dbdd4be6091c9dca481577cf233376972f7963d.scope: Succeeded.
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: libpod-1c9f60cdb0bba3d228d24bbc8dbdd4be6091c9dca481577cf233376972f7963d.scope: Consumed 64ms CPU time
This is because [3] and [4] do not cater to environments where the IP is a /32 and the routes are simply propagated via BGP.
[1] https://bugs.launchpad.net/tripleo/+bug/1927097
[2] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L4638-L4661
[3] https://github.com/ceph/ceph/blob/master/src/common/pick_address.cc#L210
[4] https://github.com/ceph/ceph/blob/master/src/common/pick_address.cc#L147
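An illustrative sketch of case B (not the pick_address.cc code; the interface table is hypothetical, modeled on the related loopback bug #48893): on a BGP host the /32 service address typically lives on the loopback interface, so address matching that skips 'lo' finds no candidate even though the address is inside the configured network.

```python
import ipaddress

# Hypothetical interface/address table for a BGP-routed node.
ifaces = {
    "lo":     ["127.0.0.1", "172.31.0.1"],   # /32 service IP on loopback
    "enp2s0": ["100.65.1.2"],
    "enp3s0": ["100.64.0.2"],
}

# The configured network from the error message above; a /32 network
# contains exactly one address, the service IP itself.
net = ipaddress.ip_network("172.31.0.1/32")

def pick(skip_loopback):
    """Return the first interface address inside `net`, or None."""
    for name, addrs in ifaces.items():
        if skip_loopback and name == "lo":
            continue
        for a in addrs:
            if ipaddress.ip_address(a) in net:
                return a
    return None

print(pick(skip_loopback=True))   # None -> "unable to find any IP address"
print(pick(skip_loopback=False))  # the /32 address is found on lo
```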
Updated by Patrick Donnelly about 3 years ago
- Project changed from CephFS to Orchestrator
Updated by Sebastian Wagner almost 3 years ago
- Related to Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo added
Updated by Kefu Chai almost 3 years ago
- Project changed from Orchestrator to RADOS
Should have been fixed by https://github.com/ceph/ceph/pull/40961
Updated by Kefu Chai almost 3 years ago
- Is duplicate of Backport #50598: octopus: Ceph-osd refuses to bind on an IP on the local loopback lo added