Bug #50688

Ceph can't be deployed using cephadm on nodes with /32 ip addresses

Added by Francesco Pantano 3 months ago. Updated about 2 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Preamble

In some data centers it is common to assign a /32 IP address to each node and let BGP handle reachability between nodes within the fabric.
This adds some complexity, but it improves network fault tolerance and scalability.

Problem

It is currently not possible to use either Ceph or cephadm in such an environment because the code makes a number of assumptions about the network layout. We have hit two of them:

A) Cephadm fails with:

Cluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Verifying IP 172.30.1.1 port 3300 ...
Verifying IP 172.30.1.1 port 6789 ...
ERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later

The full log is tracked here [1].

This fails because cephadm tries to parse the output of 'ip route ls' [2] and does not account for the more complex output produced in a BGP environment:

[root@ctrl-1-0 sbin]# ip r
default proto bgp src 172.30.1.1 metric 20
        nexthop via 100.64.0.1 dev enp3s0 weight 1
        nexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
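
The failure can be illustrated with a minimal sketch. This is not the actual cephadm code referenced in [2]; the regular expression and the function names `directly_connected_networks` and `infer_network` are hypothetical, and the sketch only assumes that the network is inferred from directly connected routes of the form "<net>/<prefix> dev <iface> ... scope link src <ip>".

```python
import ipaddress
import re

# Route table copied from the report above.
ROUTE_OUTPUT = """\
default proto bgp src 172.30.1.1 metric 20
        nexthop via 100.64.0.1 dev enp3s0 weight 1
        nexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
"""

# Hypothetical pattern (an assumption, not cephadm's own): match only
# directly connected routes that carry an explicit source address.
ROUTE_RE = re.compile(r'^(\S+/\d+)\s+dev\s+(\S+)\s+.*scope link\s+src\s+(\S+)')


def directly_connected_networks(route_output):
    """Return the networks of all lines matching ROUTE_RE."""
    networks = []
    for line in route_output.splitlines():
        match = ROUTE_RE.match(line)
        if match:
            networks.append(ipaddress.ip_network(match.group(1)))
    return networks


def infer_network(ip, route_output):
    """Return the first connected network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for net in directly_connected_networks(route_output):
        if addr in net:
            return net
    return None
```

With the route table above, `infer_network("172.30.1.1", ROUTE_OUTPUT)` finds no match, because 172.30.1.1 appears only as the `src` of the BGP default route and never inside a connected network.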

B) Ceph-mgr fails to start:

May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: Started Ceph mgr.ctrl-1-0.bgp.ftw.hpwvwv for 4b5c8c0a-ff60-454b-a1b4-9747aa737d19.
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.388+0000 7f9e29044500  0 set uid:gid to 167:167 (ceph:ceph)
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.388+0000 7f9e29044500  0 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process ceph-mgr, pid 7
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.389+0000 7f9e29044500 -1 **unable to find any IP address in networks '172.31.0.1/32,172.31.0.1/32' interfaces ''**
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: libpod-1c9f60cdb0bba3d228d24bbc8dbdd4be6091c9dca481577cf233376972f7963d.scope: Succeeded.
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: libpod-1c9f60cdb0bba3d228d24bbc8dbdd4be6091c9dca481577cf233376972f7963d.scope: Consumed 64ms CPU time

This is because [3] and [4] do not cater to environments where the IP address is a /32 and the routes are propagated only via BGP.
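
One way this can happen is sketched below. The sketch is not the logic of pick_address.cc; it assumes, as the related issue #48893 suggests, that the /32 address is configured on the loopback interface and that address selection skips loopback interfaces. The interface table and the name `find_address_in_networks` are hypothetical.

```python
import ipaddress

# Hypothetical interface table (an assumption for illustration): the /32
# service address sits on "lo", the point-to-point links on the NICs.
INTERFACES = [
    ("lo", "127.0.0.1"),
    ("lo", "172.31.0.1"),
    ("enp3s0", "100.64.0.2"),
    ("enp2s0", "100.65.1.2"),
    ("enp1s0", "192.168.1.164"),
]


def find_address_in_networks(networks, interfaces, skip_loopback=True):
    """Return the first interface address inside any of the given networks.

    networks is a comma-separated string such as the one in the log above.
    """
    nets = [ipaddress.ip_network(n) for n in networks.split(",")]
    for name, addr in interfaces:
        if skip_loopback and name == "lo":
            # Assumed behaviour: loopback addresses are never candidates.
            continue
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in nets):
            return addr
    return None
```

Under these assumptions, searching for '172.31.0.1/32,172.31.0.1/32' returns nothing while loopback is skipped, and returns 172.31.0.1 once `skip_loopback=False` is passed.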

[1] https://bugs.launchpad.net/tripleo/+bug/1927097
[2] https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L4638-L4661
[3] https://github.com/ceph/ceph/blob/master/src/common/pick_address.cc#L210
[4] https://github.com/ceph/ceph/blob/master/src/common/pick_address.cc#L147


Related issues

Related to Ceph - Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo Resolved
Duplicates Ceph - Backport #50598: octopus: Ceph-osd refuses to bind on an IP on the local loopback lo In Progress

History

#1 Updated by Patrick Donnelly 3 months ago

  • Project changed from CephFS to Orchestrator

#2 Updated by Sebastian Wagner 2 months ago

  • Description updated (diff)

#3 Updated by Sebastian Wagner about 2 months ago

  • Related to Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo added

#4 Updated by Kefu Chai about 2 months ago

  • Project changed from Orchestrator to RADOS

Should have been fixed by https://github.com/ceph/ceph/pull/40961

#5 Updated by Kefu Chai about 2 months ago

  • Status changed from New to Duplicate

#6 Updated by Kefu Chai about 2 months ago

  • Duplicates Backport #50598: octopus: Ceph-osd refuses to bind on an IP on the local loopback lo added
