Bug #24676

FreeBSD/Linux integration - monitor map with wrong sa_family

Added by Alexander Haemmerle about 1 year ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 06/27/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:

Description

We are using a Ceph cluster in a mixed FreeBSD/Linux environment. The cluster itself runs on FreeBSD, and Linux clients cannot connect to it. For example, ceph -s results in:

NetHandler create_socket couldn't create socket (97) Address family not supported by protocol

Printing the monmap on a Linux client shows:

monmaptool: monmap file monmap_freebsd
epoch 0
fsid 7e15ef20-73c2-11e8-98ed-95470d800f64
last_changed 2018-06-19 18:21:16.099386
created 2018-06-19 18:21:16.099386
0: :/0 mon.freebsd03
1: :/0 mon.freebsd01
2: :/0 mon.freebsd02

Printing the same map on FreeBSD:

epoch 0
fsid 7e15ef20-73c2-11e8-98ed-95470d800f64
last_changed 2018-06-19 18:21:16.099386
created 2018-06-19 18:21:16.099386
0: 10.135.28.158:6789/0 mon.freebsd03
1: 10.135.69.231:6789/0 mon.freebsd01
2: 10.135.93.250:6789/0 mon.freebsd02

I used gdb on monmaptool on Linux to show me the mon_info map. It shows 512 for the sa_family.

$1 = std::map with 3 elements = {["freebsd01"] = {name = "freebsd01", public_addr = {static TYPE_DEFAULT = entity_addr_t::TYPE_LEGACY, type = 1, nonce = 0, u = {sa = {sa_family = 512,
sa_data = "\032\205\n\207E\347\000\000\000\000\000\000\000"}, sin = {sin_family = 512, sin_port = 34074, sin_addr = {s_addr = 3880093450}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {
sin6_family = 512, sin6_port = 34074, sin6_flowinfo = 3880093450, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}},
sin6_scope_id = 0}}}, priority = 0}, ["freebsd02"] = {name = "freebsd02", public_addr = {static TYPE_DEFAULT = entity_addr_t::TYPE_LEGACY, type = 1, nonce = 0, u = {sa = {sa_family = 512,
sa_data = "\032\205\n\207]\372\000\000\000\000\000\000\000"}, sin = {sin_family = 512, sin_port = 34074, sin_addr = {s_addr = 4200433418}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {
sin6_family = 512, sin6_port = 34074, sin6_flowinfo = 4200433418, sin6_addr = {
__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}},
sin6_scope_id = 0}}}, priority = 0}, ["freebsd03"] = {name = "freebsd03", public_addr = {static TYPE_DEFAULT = entity_addr_t::TYPE_LEGACY, type = 1, nonce = 0, u = {sa = {sa_family = 512,
sa_data = "\032\205\n\207\034\236\000\000\000\000\000\000\000"}, sin = {sin_family = 512, sin_port = 34074, sin_addr = {s_addr = 2652669706}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {
sin6_family = 512, sin6_port = 34074, sin6_flowinfo = 2652669706, sin6_addr = {
__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}},
sin6_scope_id = 0}}}, priority = 0}}

On a working monmap, sa_family equals 2 (AF_INET).
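One plausible reading of the value 512 can be sketched as follows. It assumes the map was written with a BSD-style struct sockaddr (a one-byte sa_len, here 0, followed by a one-byte sa_family) and read back with the Linux layout (a two-byte sa_family on a little-endian host). The byte values are taken from the gdb dump for freebsd01; the names kWireBytes, family_as_linux, family_as_bsd and port_host_order are made up for this illustration and do not exist in Ceph.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// First bytes of the freebsd01 address as they appear in the gdb dump:
// two header bytes followed by the start of sa_data.
constexpr std::array<std::uint8_t, 8> kWireBytes = {
    0x00, 0x02,         // BSD layout: sa_len = 0, sa_family = 2 (AF_INET)
    0x1A, 0x85,         // port 6789 in network byte order
    10, 135, 69, 231};  // IPv4 address 10.135.69.231

// Linux layout: sa_family is a 16-bit field covering bytes 0 and 1,
// read here as little-endian (as on x86).
constexpr std::uint16_t family_as_linux(const std::array<std::uint8_t, 8>& b) {
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}

// BSD layout: byte 0 is sa_len, byte 1 is an 8-bit sa_family.
constexpr std::uint8_t family_as_bsd(const std::array<std::uint8_t, 8>& b) {
    return b[1];
}

// The port sits in bytes 2 and 3 in network (big-endian) order.
constexpr std::uint16_t port_host_order(const std::array<std::uint8_t, 8>& b) {
    return static_cast<std::uint16_t>((b[2] << 8) | b[3]);
}

// Illustrative self-check: the same bytes yield 512 or 2 depending on layout.
inline void check_layouts() {
    assert(family_as_linux(kWireBytes) == 512);
    assert(family_as_bsd(kWireBytes) == 2);
    assert(port_host_order(kWireBytes) == 6789);
}
```

Under these assumptions the address and port bytes are intact and only the first two bytes are interpreted differently, which matches the sa_data contents shown by gdb.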

I also set up a Ceph test cluster on Linux with FreeBSD clients. The problem is symmetrical; the FreeBSD client shows:

NetHandler create_socket couldn't create socket (47) Address family not supported by protocol family

Could this be an encoding/decoding problem?
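The two error numbers (97 on Linux, 47 on FreeBSD) are both EAFNOSUPPORT on their respective systems. A minimal sketch of how a bogus address family leads to that error is shown here; try_socket and expect_unsupported are hypothetical helper names, and this is not the NetHandler code itself.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <unistd.h>

// Try to create a TCP socket for the given address family.
// Returns 0 on success, otherwise the errno value set by socket().
int try_socket(int family) {
    int fd = ::socket(family, SOCK_STREAM, 0);
    if (fd < 0) {
        return errno;
    }
    ::close(fd);
    return 0;
}

// Illustrative check: an out-of-range family such as 512 is rejected
// by the kernel with EAFNOSUPPORT.
inline void expect_unsupported(int family) {
    assert(try_socket(family) == EAFNOSUPPORT);
}
```

Calling expect_unsupported(512) exercises the same failure the client hits once it has decoded sa_family as 512.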

Versions used:

FreeBSD 11.1: ceph version 12.2.4 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
Debian 9.4: ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
Ubuntu 16.04: ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)


Related issues

Copied to RADOS - Backport #37972: luminous: FreeBSD/Linux integration - monitor map with wrong sa_family Resolved

History

#1 Updated by Alexander Haemmerle about 1 year ago

I investigated further with gdb. Lines 478-501 of msg/msg_types.h seem to be the culprit. Here sa_family is decoded incorrectly when the client and server run on different operating systems:

#if defined(__FreeBSD__) || defined(__APPLE__)
  u.sa.sa_len = 0;
  __le16 ss_family;
  if (elen < sizeof(ss_family)) {
    throw buffer::malformed_input("elen smaller than family len");
  }
  decode(ss_family, bl);
  u.sa.sa_family = ss_family;
  elen -= sizeof(ss_family);
  if (elen > get_sockaddr_len() - sizeof(u.sa.sa_family)) {
    throw buffer::malformed_input("elen exceeds sockaddr len");
  }
  bl.copy(elen, u.sa.sa_data);
#else
  if (elen < sizeof(u.sa.sa_family)) {
    throw buffer::malformed_input("elen smaller than family len");
  }
  bl.copy(sizeof(u.sa.sa_family), (char*)&u.sa.sa_family);
  if (elen > get_sockaddr_len()) {
    throw buffer::malformed_input("elen exceeds sockaddr len");
  }
  elen -= sizeof(u.sa.sa_family);
  bl.copy(elen, u.sa.sa_data);
#endif

Setting sa_family to 2 with gdb at runtime gives a correct decoding of mon_info on Linux when using a monmap created on FreeBSD.
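One way to avoid this class of problem is to give the address a fixed wire layout that does not depend on the host's struct sockaddr. The sketch below writes a 16-bit little-endian family followed by the sa_data bytes and reads it back the same way. It is a simplified illustration, not the actual Ceph code or the upstream fix; WireAddr, encode_addr and decode_addr are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for the sockaddr part of an address.
struct WireAddr {
    std::uint16_t family = 0;        // e.g. 2 for AF_INET
    std::vector<std::uint8_t> data;  // sa_data: port and address bytes
};

// Encode with a fixed layout: 16-bit little-endian family, then sa_data.
std::vector<std::uint8_t> encode_addr(const WireAddr& a) {
    std::vector<std::uint8_t> out;
    out.reserve(2 + a.data.size());
    out.push_back(static_cast<std::uint8_t>(a.family & 0xFF));
    out.push_back(static_cast<std::uint8_t>(a.family >> 8));
    out.insert(out.end(), a.data.begin(), a.data.end());
    return out;
}

// Decode the same layout on any host, whatever its native sockaddr looks like.
WireAddr decode_addr(const std::vector<std::uint8_t>& in) {
    if (in.size() < 2) {
        throw std::invalid_argument("elen smaller than family len");
    }
    WireAddr a;
    a.family = static_cast<std::uint16_t>(in[0] | (in[1] << 8));
    a.data.assign(in.begin() + 2, in.end());
    return a;
}

// Illustrative round trip using the freebsd01 port and address bytes.
inline void check_round_trip() {
    WireAddr a;
    a.family = 2;
    a.data = {0x1A, 0x85, 10, 135, 69, 231};
    WireAddr b = decode_addr(encode_addr(a));
    assert(b.family == 2);
    assert(b.data == a.data);
}
```

Because both sides agree on the byte order and width of the family field, a reader with a two-byte sa_family and a reader with sa_len plus a one-byte sa_family recover the same value.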

#2 Updated by Alexander Haemmerle about 1 year ago

I discovered that commit 9099ca5 ("fix the dencoder of entity_addr_t") introduced this kind of interoperability; it is tagged for v13.0.1 at the earliest. I backported the commit to 12.2.4, and it now seems to work. The issue can be closed.

#3 Updated by Patrick Donnelly about 1 year ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) Monitor added

#4 Updated by Josh Durgin about 1 year ago

  • Status changed from New to Resolved

#5 Updated by Richard Gallamore 8 months ago

Hello,

I just tested this and received the same "NetHandler create_socket couldn't create socket (97) Address family not supported by protocol" error that this bug was supposed to fix. Was there a regression somewhere between versions?

Linux version: ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
FreeBSD version: ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)

#6 Updated by Kefu Chai 6 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to luminous

Richard, I don't think 9099ca5 was ever backported to luminous. If you want to get it fixed sooner in luminous, you could probably help backport https://github.com/ceph/ceph/pull/17615/commits/9099ca599de5238cde917f1e1f933247392de03e .

#7 Updated by Mykola Golub 6 months ago

  • Copied to Backport #37972: luminous: FreeBSD/Linux integration - monitor map with wrong sa_family added

#8 Updated by Nathan Cutler 5 months ago

  • Status changed from Pending Backport to Resolved
