Bug #43413
Virtual IP address of iface lo results in failing to start an OSD
Status:
Open
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
12/24/2019
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We added a virtual IP on the loopback interface lo to complete the LVS configuration.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 192.168.122.66/32 brd 192.168.122.66 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
The virtual IP is in the same subnet as the cluster IP.
[global]
fsid = 8404c08f-97a1-4cab-93ba-bc83139c11ae
mon_initial_members = node1
public_network = 192.168.122.0/24
Everything went well while Ceph was running normally. However, OSD.2 went down and we were not able to restart it.
ceph-osd.2.log:
2019-12-20 18:39:39.360 7f44d5eb0700 10 osd.2 706 _send_boot
2019-12-20 18:39:39.360 7f44d5eb0700 20 osd.2 706 initial client_addrs [v2:192.168.122.66:6800/3258,v1:192.168.122.66:6801/3258], cluster_addrs [v2:192.168.122.66:6802/3258,v1:192.168.122.66:6803/3258], hb_back_addrs [v2:192.168.122.66:6806/3258,v1:192.168.122.66:6807/3258], hb_front_addrs [v2:192.168.122.66:6804/3258,v1:192.168.122.66:6805/3258]
2019-12-20 18:39:39.360 7f44d5eb0700 10 osd.2 706 new session (outgoing) 0x5639c6693800 con=0x5639c5b81800 addr=v2:192.168.122.66:6802/3258
2019-12-20 18:39:39.360 7f44d5eb0700 -1 osd.2 706 set_numa_affinity unable to identify public interface 'lo:0' numa node: (2) No such file or directory
2019-12-20 18:39:39.360 7f44d5eb0700 1 osd.2 706 set_numa_affinity not setting numa affinity
2019-12-20 18:39:39.360 7f44d5eb0700 10 osd.2 706 final client_addrs [v2:192.168.122.66:6800/3258,v1:192.168.122.66:6801/3258], cluster_addrs [v2:192.168.122.66:6802/3258,v1:192.168.122.66:6803/3258], hb_back_addrs [v2:192.168.122.66:6806/3258,v1:192.168.122.66:6807/3258], hb_front_addrs [v2:192.168.122.66:6804/3258,v1:192.168.122.66:6805/3258]
2019-12-20 18:39:39.361 7f44cda2f700 10 osd.2 706 ms_handle_connect con 0x5639c6662000
2019-12-20 18:39:39.361 7f44cda2f700 10 osd.2 706 setting 0 queries
2019-12-20 18:39:39.361 7f44cda2f700 20 osd.2 706 reports for 0 queries
2019-12-20 18:39:39.527 7f44d5eb0700 10 osd.2 706 _collect_metadata no unique device id for vdb: fallback method has serial ''but no model
2019-12-20 18:39:39.527 7f44d5eb0700 10 osd.2 706 _collect_metadata {arch=x86_64,back_addr=[v2:192.168.122.66:6802/3258,v1:192.168.122.66:6803/3258],back_iface=lo:0,bluefs=1,bluefs_single_shared_device=1,bluestore_bdev_access_mode=blk,bluestore_bdev_block_size=4096,bluestore_bdev_dev_node=/dev/dm-4,bluestore_bdev_driver=KernelDevice,bluestore_bdev_partition_path=/dev/dm-4,bluestore_bdev_rotational=1,bluestore_bdev_size=9663676416,bluestore_bdev_support_discard=0,bluestore_bdev_type=hdd,ceph_release=nautilus,ceph_version=ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable),ceph_version_short=14.2.4,cpu=Intel Core Processor (Broadwell, no TSX, IBRS),default_device_class=hdd,device_ids=,devices=vdb,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,front_addr=[v2:192.168.122.66:6800/3258,v1:192.168.122.66:6801/3258],front_iface=lo:0,hb_back_addr=[v2:192.168.122.66:6806/3258,v1:192.168.122.66:6807/3258],hb_front_addr=[v2:192.168.122.66:6804/3258,v1:192.168.122.66:6805/3258],hostname=node1,journal_rotational=1,kernel_description=#1 SMP Wed Nov 13 23:58:53 UTC 2019,kernel_version=3.10.0-1062.4.3.el7.x86_64,mem_swap_kb=1048572,mem_total_kb=1014752,network_numa_unknown_ifaces=lo:0,objectstore_numa_unknown_devices=vdb,os=Linux,osd_data=/var/lib/ceph/osd/ceph-2,osd_objectstore=bluestore,rotational=1}
We believe that OSD.2 took this virtual IP as its cluster IP, but OSD.2 cannot connect to the Ceph cluster via lo.
Then we read the code of OSD::set_numa_affinity() in OSD.cc and the address-picking helper it uses, and found the following.
int OSD::set_numa_affinity()
{
  ...
  // check network numa node(s)
  int front_node = -1, back_node = -1;
  string front_iface = pick_iface(
    cct,
    client_messenger->get_myaddrs().front().get_sockaddr_storage());
  string back_iface = pick_iface(
    cct,
    cluster_messenger->get_myaddrs().front().get_sockaddr_storage());
  int r = get_iface_numa_node(front_iface, &front_node);
  ...
const struct ifaddrs *find_ipv4_in_subnet(const struct ifaddrs *addrs,
                                          const struct sockaddr_in *net,
                                          unsigned int prefix_len,
                                          int numa_node) {
  struct in_addr want, temp;
  netmask_ipv4(&net->sin_addr, prefix_len, &want);
  for (; addrs != NULL; addrs = addrs->ifa_next) {
    if (addrs->ifa_addr == NULL)
      continue;
    // Here the virtual IP on lo:0 is not filtered out
    if (strcmp(addrs->ifa_name, "lo") == 0)
      continue;
Updated by lei xin over 3 years ago
I ran into the same problem and my trigger condition was the same, i.e. configuring the VIP on the loopback interface lo for LVS. But strangely, even on the same node, this is the only ossd that does this.
Updated by lei xin over 3 years ago
lei xin wrote:
I ran into the same problem and my trigger condition was the same, i.e. configuring the VIP on the loopback interface lo for LVS. But strangely, even on the same node, this is the only ossd that does this.
Sorry, "ossd" --> "OSD"