Bug #49938

closed

daemons bind to loopback iface

Added by Dan van der Ster about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus, nautilus
Regression: Yes
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There appears to be a regression in 14.2.18 whereby, in some environments, OSDs bind to 127.0.0.1.

E.g. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Z5J7MYZIPM3ZUTNU4LTWADXOSZVK27R/

This was probably introduced in https://github.com/ceph/ceph/commit/89321762ad4cfdd1a68cae467181bdd1a501f14d

I don't think ifa_name contains a colon. On my machine I tested the example code at https://man7.org/linux/man-pages/man3/getifaddrs.3.html, and it outputs just `lo`:

# ./a.out
lo       AF_PACKET (17)
                tx_packets = 1683333517; rx_packets = 1683333517
                tx_bytes   = 1685898949; rx_bytes   = 1685898949
eno1     AF_PACKET (17)
                tx_packets =          0; rx_packets =          0
                tx_bytes   =          0; rx_bytes   =          0
ens785f0 AF_PACKET (17)
                tx_packets = 3787675362; rx_packets = 4154015233
                tx_bytes   = 3146993958; rx_bytes   = 1004572644
ens785f1 AF_PACKET (17)
                tx_packets =          0; rx_packets =          0
                tx_bytes   =          0; rx_bytes   =          0
eno2     AF_PACKET (17)
                tx_packets =          0; rx_packets =          0
                tx_bytes   =          0; rx_bytes   =          0
lo       AF_INET (2)
                address: <127.0.0.1>
ens785f0 AF_INET (2)
                address: <10.116.6.8>
lo       AF_INET6 (10)
                address: <::1>
ens785f0 AF_INET6 (10)
                address: <fd01:1458:e00:1e::100:5>
ens785f0 AF_INET6 (10)
                address: <fe80::bdbd:76be:63fd:a4c2%ens785f0>

So we also need to explicitly skip the interface when its name is exactly 'lo'.

Marking this as critical because it can take down entire clusters when operators run `yum update`.


Related issues: 6 (1 open, 5 closed)

Related to RADOS - Bug #50012: Ceph-osd refuses to bind on an IP on the local loopback lo (again) (Fix Under Review, Kefu Chai)
Related to Ceph - Bug #43417: Since the local loopback address is set to a virtual IP,OSD can't restart . (Resolved)
Related to Ceph - Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo (Resolved)
Copied to Ceph - Backport #49995: octopus: daemons bind to loopback iface (Resolved, Konstantin Shalygin)
Copied to Ceph - Backport #49996: nautilus: daemons bind to loopback iface (Resolved, Konstantin Shalygin)
Copied to Ceph - Backport #49997: pacific: daemons bind to loopback iface (Resolved, Konstantin Shalygin)
#1

Updated by Dan van der Ster about 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan van der Ster
  • Pull request ID set to 40334
#2

Updated by Dan van der Ster about 3 years ago

I suppose this will break the use case described in #48893 again.

I would argue that, out of the box, Ceph should do the right thing on the most common deployments. But if we also want to support this BGP-to-the-host use case out of the box, the heuristic for picking addresses needs further improvement.

#4

Updated by Stefan Kooman about 3 years ago

I agree with Dan that 14.2.19 should be released ASAP to fix this issue. Otherwise, I'm afraid this will impact many more clusters.

#5

Updated by Neha Ojha about 3 years ago

  • Backport set to pacific, octopus, nautilus
#6

Updated by Kefu Chai about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#7

Updated by Backport Bot about 3 years ago

#8

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49996: nautilus: daemons bind to loopback iface added
#9

Updated by Backport Bot about 3 years ago

#10

Updated by Kefu Chai about 3 years ago

  • Related to Bug #50012: Ceph-osd refuses to bind on an IP on the local loopback lo (again) added
#11

Updated by Nathan Cutler about 3 years ago

  • Related to Bug #43417: Since the local loopback address is set to a virtual IP,OSD can't restart . added
#12

Updated by Nathan Cutler about 3 years ago

  • Related to Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo added
#13

Updated by Loïc Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
