Bug #43417
Updated by Kefu Chai almost 3 years ago
I configured a loopback alias (lo:0) with an IP address on the same network segment as the cluster. The network configuration is as follows:
<pre>
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.211.42  netmask 255.255.255.0  broadcast 192.168.211.255
        inet6 fe80::ea61:1fff:fe16:e7b7  prefixlen 64  scopeid 0x20<link>
        ether e8:61:1f:16:e7:b7  txqueuelen 1000  (Ethernet)
        RX packets 2845662864  bytes 2454056684356 (2.2 TiB)
        RX errors 0  dropped 8338  overruns 0  frame 0
        TX packets 2573756982  bytes 2075412304729 (1.8 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xc5800000-c5fffff

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 61609662  bytes 155594418449 (144.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 61609662  bytes 155594418449 (144.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo:0: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 192.168.211.200  netmask 255.255.255.255
        loop  txqueuelen 1  (Local Loopback)
</pre>

The Ceph configuration is as follows:
<pre>
[global]
mon_initial_members = 172e18e211e**, 172e18e211e**,172e18e211e**
mon_host = 192.168.211.***,192.168.211.***,192.168.211.***
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.211.0/24
</pre>

The problem is:
<pre>
[root@172e18e211e42 ~]# systemctl status ceph-osd@8
● ceph-osd@8.service - Ceph object storage daemon osd.8
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-12-24 09:33:42 CST; 12s ago
  Process: 1553818 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 1553823 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@8.service
           └─1553823 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph

Dec 24 09:33:42 172e18e211e42 systemd[1]: Starting Ceph object storage daemon osd.8...
Dec 24 09:33:42 172e18e211e42 systemd[1]: Started Ceph object storage daemon osd.8.
Dec 24 09:33:42 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:42.981 7f6304c02d80 -1 Falling back to public interface
Dec 24 09:33:45 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:45.820 7f6304c02d80 -1 osd.8 9095 log_to_monitors {default=true}
Dec 24 09:33:45 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:45.849 7f62f722a700 -1 osd.8 9095 set_numa_affinity unable to identify public interface 'lo:0' numa node: (2) No such file or directory
Dec 24 09:33:51 172e18e211e42 ceph-osd[1553823]: 2019-12-24 09:33:51.758 7f62f722a700 -1 osd.8 9131 set_numa_affinity unable to identify public interface 'lo:0' numa node: (2) No such file or directory
</pre>

The OSD cannot restart. I traced the problem to /src/common/ipaddr.cc, where
<pre><code class="cpp">
const struct ifaddrs *find_ipv4_in_subnet(const struct ifaddrs *addrs,
                                          const struct sockaddr_in *net,
                                          unsigned int prefix_len,
                                          int numa_node)
</code></pre>
does not return the right IP address.
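
To illustrate why the lo:0 alias can be picked up, here is a minimal sketch of a first-match subnet lookup. It is not the code in ipaddr.cc: the `IfaceAddr` struct and the functions `parse_ipv4`, `in_subnet`, `first_match`, `is_loopback_name` and `first_match_skip_loopback` are hypothetical names made up for this example, and the enumeration order used in the usage note is an assumption. The log above only shows that 'lo:0' ended up being treated as the public interface. The sketch shows that 192.168.211.200 falls inside 192.168.211.0/24 just like the eth1 address, so any lookup that returns the first address in the subnet can land on the loopback alias.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a getifaddrs() list; not a Ceph type.
struct IfaceAddr {
  std::string name;  // interface name, e.g. "eth1" or "lo:0"
  std::string ipv4;  // dotted-quad IPv4 address
};

// Parse dotted-quad text into a host-byte-order 32-bit integer.
uint32_t parse_ipv4(const std::string& text) {
  in_addr a{};
  int rc = inet_pton(AF_INET, text.c_str(), &a);
  assert(rc == 1);  // the sketch only accepts well-formed addresses
  (void)rc;
  return ntohl(a.s_addr);
}

// True if `ip` lies inside the network `net`/`prefix_len`.
bool in_subnet(const std::string& ip, const std::string& net,
               unsigned prefix_len) {
  assert(prefix_len <= 32);
  uint32_t mask = prefix_len == 0 ? 0 : ~uint32_t{0} << (32 - prefix_len);
  return (parse_ipv4(ip) & mask) == (parse_ipv4(net) & mask);
}

// Return the first address in the subnet, in list order.
const IfaceAddr* first_match(const std::vector<IfaceAddr>& addrs,
                             const std::string& net, unsigned prefix_len) {
  for (const auto& a : addrs) {
    if (in_subnet(a.ipv4, net, prefix_len)) return &a;
  }
  return nullptr;
}

// Name-based loopback test, assumed here purely for illustration.
bool is_loopback_name(const std::string& name) {
  return name == "lo" || name.rfind("lo:", 0) == 0;
}

// Same lookup, but ignoring loopback interfaces and their aliases.
const IfaceAddr* first_match_skip_loopback(const std::vector<IfaceAddr>& addrs,
                                           const std::string& net,
                                           unsigned prefix_len) {
  for (const auto& a : addrs) {
    if (is_loopback_name(a.name)) continue;
    if (in_subnet(a.ipv4, net, prefix_len)) return &a;
  }
  return nullptr;
}
```

With a list ordered as {lo, lo:0, eth1} (an order chosen for illustration) and the public network 192.168.211.0/24, `first_match` returns the lo:0 entry, while `first_match_skip_loopback` returns eth1. Whether skipping loopback aliases is the right fix for Ceph is left open here.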