Bug #49938
daemons bind to loopback iface
Status: Closed
Description
There seems to be a regression in 14.2.18 whereby, in some environments, OSDs bind to 127.0.0.1.
E.g. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Z5J7MYZIPM3ZUTNU4LTWADXOSZVK27R/
This was probably introduced in https://github.com/ceph/ceph/commit/89321762ad4cfdd1a68cae467181bdd1a501f14d
I don't think ifa_name contains a colon. On my machine I tested the example code at https://man7.org/linux/man-pages/man3/getifaddrs.3.html and it outputs just `lo`:
```
# ./a.out
lo        AF_PACKET (17)
          tx_packets = 1683333517; rx_packets = 1683333517
          tx_bytes   = 1685898949; rx_bytes   = 1685898949
eno1      AF_PACKET (17)
          tx_packets = 0; rx_packets = 0
          tx_bytes   = 0; rx_bytes   = 0
ens785f0  AF_PACKET (17)
          tx_packets = 3787675362; rx_packets = 4154015233
          tx_bytes   = 3146993958; rx_bytes   = 1004572644
ens785f1  AF_PACKET (17)
          tx_packets = 0; rx_packets = 0
          tx_bytes   = 0; rx_bytes   = 0
eno2      AF_PACKET (17)
          tx_packets = 0; rx_packets = 0
          tx_bytes   = 0; rx_bytes   = 0
lo        AF_INET (2)
          address: <127.0.0.1>
ens785f0  AF_INET (2)
          address: <10.116.6.8>
lo        AF_INET6 (10)
          address: <::1>
ens785f0  AF_INET6 (10)
          address: <fd01:1458:e00:1e::100:5>
ens785f0  AF_INET6 (10)
          address: <fe80::bdbd:76be:63fd:a4c2%ens785f0>
```
So we also need to explicitly skip the case where the interface name is exactly `lo`.
Marking this as critical because it can take down entire clusters when operators run `yum update`.
Updated by Dan van der Ster about 3 years ago
- Status changed from New to Fix Under Review
- Assignee set to Dan van der Ster
- Pull request ID set to 40334
Updated by Dan van der Ster about 3 years ago
I suppose this will re-break the use-case described in #48893.
I would argue that out of the box, Ceph should do the right thing on the most common deployments. But if we also want to support this BGP-to-the-host use-case out of the box, the heuristic for picking addresses needs further improvement.
Updated by Dan van der Ster about 3 years ago
All daemons are impacted by this, not just OSDs: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7IAGFUXMRZU77M4KYS5NW5MZ6YJ7YN4G/
Updated by Stefan Kooman about 3 years ago
I agree with Dan that 14.2.19 should be released ASAP to fix this issue. Otherwise, I'm afraid this will impact many more clusters.
Updated by Neha Ojha about 3 years ago
- Backport set to pacific, octopus, nautilus
Updated by Kefu Chai about 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 3 years ago
- Copied to Backport #49995: octopus: daemons bind to loopback iface added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49996: nautilus: daemons bind to loopback iface added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49997: pacific: daemons bind to loopback iface added
Updated by Kefu Chai about 3 years ago
- Related to Bug #50012: Ceph-osd refuses to bind on an IP on the local loopback lo (again) added
Updated by Nathan Cutler about 3 years ago
- Related to Bug #43417: Since the local loopback address is set to a virtual IP, OSD can't restart added
Updated by Nathan Cutler about 3 years ago
- Related to Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo added
Updated by Loïc Dachary about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".