Bug #49938: daemons bind to loopback iface - Ceph - Ceph

Bug #49938

closed

daemons bind to loopback iface

Added by Dan van der Ster about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Dan van der Ster
Category:
Target version:
% Done:
Source:
Tags:
Backport: pacific, octopus, nautilus
Regression: Yes
Severity: 1 - critical
Reviewed:
Affected Versions: v14.2.18
ceph-qa-suite:
Pull request ID: 40334
Crash signature (v1):
Crash signature (v2):

Description

There seems to be a regression in 14.2.18 whereby, in some environments, OSDs bind to 127.0.0.1.

E.g. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3Z5J7MYZIPM3ZUTNU4LTWADXOSZVK27R/

This was probably introduced in https://github.com/ceph/ceph/commit/89321762ad4cfdd1a68cae467181bdd1a501f14d

I don't think ifa_name contains a colon. On my machine I tested the example code at https://man7.org/linux/man-pages/man3/getifaddrs.3.html and it outputs just `lo`:

# ./a.out
lo       AF_PACKET (17)
                tx_packets = 1683333517; rx_packets = 1683333517
                tx_bytes   = 1685898949; rx_bytes   = 1685898949
eno1     AF_PACKET (17)
                tx_packets =          0; rx_packets =          0
                tx_bytes   =          0; rx_bytes   =          0
ens785f0 AF_PACKET (17)
                tx_packets = 3787675362; rx_packets = 4154015233
                tx_bytes   = 3146993958; rx_bytes   = 1004572644
ens785f1 AF_PACKET (17)
                tx_packets =          0; rx_packets =          0
                tx_bytes   =          0; rx_bytes   =          0
eno2     AF_PACKET (17)
                tx_packets =          0; rx_packets =          0
                tx_bytes   =          0; rx_bytes   =          0
lo       AF_INET (2)
                address: <127.0.0.1>
ens785f0 AF_INET (2)
                address: <10.116.6.8>
lo       AF_INET6 (10)
                address: <::1>
ens785f0 AF_INET6 (10)
                address: <fd01:1458:e00:1e::100:5>
ens785f0 AF_INET6 (10)
                address: <fe80::bdbd:76be:63fd:a4c2%ens785f0>
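
For anyone who wants to repeat this check without the full man-page program, here is a minimal sketch that collects only the interface name and address family of each getifaddrs() entry. The `IfaceEntry` struct and `list_iface_entries()` function are hypothetical names used for illustration; they are not part of Ceph or of the man-page example.

```cpp
#include <ifaddrs.h>
#include <sys/socket.h>

#include <string>
#include <vector>

// Hypothetical record for one getifaddrs() entry.
struct IfaceEntry {
    std::string name;  // ifa_name exactly as returned by getifaddrs()
    int family;        // address family, e.g. AF_INET, AF_INET6, AF_PACKET
};

// Walk the linked list returned by getifaddrs() and copy out the
// interface name and address family of every entry that has an address.
std::vector<IfaceEntry> list_iface_entries() {
    std::vector<IfaceEntry> entries;
    ifaddrs* head = nullptr;
    if (getifaddrs(&head) != 0) {
        return entries;  // on failure, report no entries
    }
    for (const ifaddrs* ifa = head; ifa != nullptr; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == nullptr) {
            continue;  // some entries carry no address
        }
        entries.push_back({ifa->ifa_name, ifa->ifa_addr->sa_family});
    }
    freeifaddrs(head);
    return entries;
}
```

Printing `name` for each returned entry should show whether the loopback device appears as plain `lo` on a given host.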

So we need to also explicitly skip when the iface name is exactly 'lo'.
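
A minimal sketch of such a check, assuming the existing code only matches names with a `lo:` prefix (an inference from the colon mentioned above, not a quote of the Ceph source). The helper name `is_loopback_iface` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns true when the interface name is the
// loopback device itself ("lo") or an alias of it ("lo:0", "lo:1", ...).
// Names that merely start with "lo" (for example "lop0") do not match.
bool is_loopback_iface(const std::string& name) {
    // Exact match covers the name getifaddrs() reported in the output above.
    if (name == "lo") {
        return true;
    }
    // Prefix match covers alias-style names containing a colon.
    return name.compare(0, 3, "lo:") == 0;
}
```

An address-selection loop could then skip any entry for which `is_loopback_iface(ifa->ifa_name)` is true.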

Marking this as critical because it can take down entire clusters if operators run yum update.

Related issues 6 (1 open — 5 closed)

Updated by Dan van der Ster about 3 years ago

Status changed from New to Fix Under Review
Assignee set to Dan van der Ster
Pull request ID set to 40334

Updated by Dan van der Ster about 3 years ago

I suppose this will re-break the use-case described in #48893.

I would argue that, out of the box (OOTB), Ceph should do the right thing on the most common deployments. But if we also want to support this BGP-to-the-host use-case OOTB, the heuristic for picking addresses needs to be improved further.
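
One possible direction, shown purely as an illustration and not as the change made in the pull request, is a two-pass selection: prefer addresses on non-loopback interfaces, and only fall back to addresses configured on `lo` when nothing else is available, while still rejecting 127.0.0.0/8 and ::1. The `Candidate` struct and `pick_addr` function are hypothetical, and subnet matching is left out to keep the sketch short.

```cpp
#include <string>
#include <vector>

// Hypothetical candidate: one address found on one interface.
struct Candidate {
    std::string iface;  // interface name, e.g. "lo" or "ens785f0"
    std::string addr;   // textual address, e.g. "10.116.6.8"
};

// True for the loopback device ("lo") and its aliases ("lo:0", ...).
static bool on_loopback_iface(const Candidate& c) {
    return c.iface == "lo" || c.iface.compare(0, 3, "lo:") == 0;
}

// True for the loopback addresses themselves (127.0.0.0/8 and ::1).
static bool is_loopback_addr(const Candidate& c) {
    return c.addr.compare(0, 4, "127.") == 0 || c.addr == "::1";
}

// Pass 1: first address on a non-loopback interface (the common case).
// Pass 2: first non-loopback address configured on the loopback
//         interface (e.g. a routed service IP placed on lo).
// Returns an empty string when no usable address exists.
std::string pick_addr(const std::vector<Candidate>& candidates) {
    for (const auto& c : candidates) {
        if (!on_loopback_iface(c)) {
            return c.addr;
        }
    }
    for (const auto& c : candidates) {
        if (!is_loopback_addr(c)) {
            return c.addr;
        }
    }
    return "";
}
```

With candidates `lo/127.0.0.1` and `ens785f0/10.116.6.8` this returns `10.116.6.8`; with only `lo/127.0.0.1` and a routed address on `lo`, it returns the routed address instead of 127.0.0.1.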

Updated by Dan van der Ster about 3 years ago

All daemons are impacted by this, not just OSDs: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7IAGFUXMRZU77M4KYS5NW5MZ6YJ7YN4G/

Updated by Stefan Kooman about 3 years ago

I agree with Dan that a 14.2.19 should be released ASAP to fix this issue. Otherwise, I'm afraid this will impact many more clusters.

Updated by Neha Ojha about 3 years ago

Backport set to pacific, octopus, nautilus

Updated by Kefu Chai about 3 years ago

Status changed from Fix Under Review to Pending Backport

Updated by Backport Bot about 3 years ago

Copied to Backport #49995: octopus: daemons bind to loopback iface added

Updated by Backport Bot about 3 years ago

Copied to Backport #49996: nautilus: daemons bind to loopback iface added

Updated by Backport Bot about 3 years ago

Copied to Backport #49997: pacific: daemons bind to loopback iface added

Updated by Kefu Chai about 3 years ago

Related to Bug #50012: Ceph-osd refuses to bind on an IP on the local loopback lo (again) added

Updated by Nathan Cutler about 3 years ago

Related to Bug #43417: Since the local loopback address is set to a virtual IP,OSD can't restart . added

Updated by Nathan Cutler about 3 years ago

Related to Bug #48893: Ceph-osd refuses to bind on an IP on the local loopback lo added

Updated by Loïc Dachary about 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
