Bug #43417

OSD can't restart when the local loopback address is set to a virtual IP

Added by David Lee 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
common
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

I set a local loopback IP on the same network segment as the cluster, as an alias like lo:0. The network configuration is as follows:

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.211.42 netmask 255.255.255.0 broadcast 192.168.211.255
inet6 fe80::ea61:1fff:fe16:e7b7 prefixlen 64 scopeid 0x20<link>
ether e8:61:1f:16:e7:b7 txqueuelen 1000 (Ethernet)
RX packets 2845662864 bytes 2454056684356 (2.2 TiB)
RX errors 0 dropped 8338 overruns 0 frame 0
TX packets 2573756982 bytes 2075412304729 (1.8 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xc5800000-c5fffff
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1 (Local Loopback)
RX packets 61609662 bytes 155594418449 (144.9 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 61609662 bytes 155594418449 (144.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo:0: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 192.168.211.200 netmask 255.255.255.255
loop txqueuelen 1 (Local Loopback)
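
For reference, an alias like lo:0 is typically created with a command such as the following (the exact command used on this machine is not recorded in this report):

ifconfig lo:0 192.168.211.200 netmask 255.255.255.255 up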
The Ceph configuration is as follows:
[global]
mon_initial_members = 172e18e211e**, 172e18e211e**,172e18e211e**
mon_host = 192.168.211.***,192.168.211.***,192.168.211.***
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.211.0/24

The problem is:
[root@172e18e211e42 ~]# systemctl status ceph-osd@8
- Ceph object storage daemon osd.8
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2019-12-24 09:33:42 CST; 12s ago
Process: 1553818 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 1553823 (ceph-osd)
CGroup: /system.slice/system-ceph\
└─1553823 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph

Dec 24 09:33:42 172e18e211e42 systemd[1]: Starting Ceph object storage daemon osd.8...
Dec 24 09:33:42 172e18e211e42 systemd[1]: Started Ceph object storage daemon osd.8.
Dec 24 09:33:42 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:42.981 7f6304c02d80 -1 Falling back to public interface
Dec 24 09:33:45 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:45.820 7f6304c02d80 -1 osd.8 9095 log_to_monitors {default=true}
Dec 24 09:33:45 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:45.849 7f62f722a700 -1 osd.8 9095 set_numa_affinity unable to identify public interface 'lo:0' numa node: (2) No such file or directory
Dec 24 09:33:51 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:51.758 7f62f722a700 -1 osd.8 9131 set_numa_affinity unable to identify public interface 'lo:0' numa node: (2) No such file or directory
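
As a side note, the set_numa_affinity failure is expected once 'lo:0' has been chosen as the public interface: the NUMA node of a NIC is normally read from sysfs, and an alias label like 'lo:0' has no /sys/class/net entry at all, so the lookup fails with ENOENT. A minimal sketch of such a lookup (a hypothetical reimplementation following the usual sysfs convention, not Ceph's actual code):

// Hypothetical sketch: reads /sys/class/net/<iface>/device/numa_node,
// the usual sysfs location for a NIC's NUMA node.  For an alias label
// like "lo:0" there is no /sys/class/net/lo:0 at all, so fopen() fails
// and this returns -ENOENT, matching "(2) No such file or directory"
// in the log above.
#include <cerrno>
#include <cstdio>
#include <string>

int iface_numa_node(const std::string& iface, int* node) {
  std::string fn = "/sys/class/net/" + iface + "/device/numa_node";
  FILE* f = std::fopen(fn.c_str(), "r");
  if (!f)
    return -errno;
  int r = (std::fscanf(f, "%d", node) == 1) ? 0 : -EINVAL;
  std::fclose(f);
  return r;
}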

So the OSD can't restart correctly. I pinpointed the problem to src/common/ipaddr.cc, where

const struct ifaddrs *find_ipv4_in_subnet(const struct ifaddrs *addrs,
                                          const struct sockaddr_in *net,
                                          unsigned int prefix_len,
                                          int numa_node)

can't get the right IP.
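
To illustrate, here is a minimal, self-contained sketch of the kind of subnet matching find_ipv4_in_subnet performs (a simplified reimplementation, not the actual Ceph code and not the PR 32420 patch): it walks the getifaddrs() list and returns the first IPv4 address that falls inside the public network. With the configuration above, both eth1 (192.168.211.42) and the alias lo:0 (192.168.211.200) match 192.168.211.0/24, so the daemon can end up picking the loopback alias, which is what the "Falling back to public interface" and 'lo:0' log lines show. Skipping entries flagged IFF_LOOPBACK, as done below, is one way to avoid that.

#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <cstdint>
#include <cstdio>

// Apply an IPv4 netmask of the given prefix length (addr in network order).
static uint32_t apply_netmask(uint32_t addr_be, unsigned prefix_len) {
  uint32_t mask = prefix_len >= 32 ? ~uint32_t(0)
                                   : htonl(~(~uint32_t(0) >> prefix_len));
  return addr_be & mask;
}

// Return the first non-loopback IPv4 entry inside net/prefix_len, else nullptr.
static const ifaddrs* find_ipv4_in_subnet_sketch(const ifaddrs* addrs,
                                                 const sockaddr_in* net,
                                                 unsigned prefix_len) {
  const uint32_t want = apply_netmask(net->sin_addr.s_addr, prefix_len);
  for (; addrs; addrs = addrs->ifa_next) {
    if (!addrs->ifa_addr || addrs->ifa_addr->sa_family != AF_INET)
      continue;
    // Without this check, the lo:0 alias (192.168.211.200) matches the
    // 192.168.211.0/24 public network just like eth1 does.
    if (addrs->ifa_flags & IFF_LOOPBACK)
      continue;
    const auto* sin = reinterpret_cast<const sockaddr_in*>(addrs->ifa_addr);
    if (apply_netmask(sin->sin_addr.s_addr, prefix_len) == want)
      return addrs;
  }
  return nullptr;
}

int main() {
  ifaddrs* addrs = nullptr;
  if (getifaddrs(&addrs) != 0)
    return 1;
  sockaddr_in net{};
  net.sin_family = AF_INET;
  inet_pton(AF_INET, "192.168.211.0", &net.sin_addr);  // public network
  if (const ifaddrs* hit = find_ipv4_in_subnet_sketch(addrs, &net, 24))
    std::printf("picked interface: %s\n", hit->ifa_name);
  freeifaddrs(addrs);
  return 0;
}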

History

#1 Updated by David Lee 3 months ago

PR: https://github.com/ceph/ceph/pull/32420

#2 Updated by Kefu Chai 3 months ago

  • Status changed from New to Resolved
  • Pull request ID set to 32420
