Bug #47951

closed

MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'

Added by Wido den Hollander over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Category:
-
Target version:
% Done:

0%

Source:
Community (user)
Tags:
ipv6,dns,round robin,mon_host,client
Backport:
octopus,nautilus
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
MonClient
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I performed a test upgrade to 14.2.12 today on a cluster using IPv6 with Round Robin DNS for mon_host:

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = 0d56dd8f-7ae0-4447-b51b-f8b818749307
mon_host = mon.objects.xxxx
ms_bind_ipv6 = true

Running 'ceph -s' now fails:

root@wido-standard-benchmark:~# ceph -s
unable to parse addrs in 'mon.objects.xxx.xxxx.xxxx'
[errno 22] error connecting to the cluster
root@wido-standard-benchmark:~#

The hostname is a Round Robin DNS entry pointing to IPv6 addresses:

root@wido-standard-benchmark:~# host mon.objects.ams02.cldin.net
mon.objects.xx.xx.net has IPv6 address 2a05:yy:xx:d:84b5:85ff:zzzz:33bf
mon.objects.xx.xx.net has IPv6 address 2a05:yy:xx:d:645f:97ff:zzzz:2b2a
mon.objects.xx.xx.net has IPv6 address 2a05:yy:xx:d:3416:d5ff:zzzz:18db
root@wido-standard-benchmark:~# 

I took a look with strace and I found this:

14980 socket(AF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
14980 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a05:xxx:xxx:d:84b5:85ff:fe40:33bf", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
14980 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(52258), inet_pton(AF_INET6, "2a05:xxx:xxx:0:1c00:16ff:fe00:60", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
14980 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
14980 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a05:xxx:xxx:d:645f:97ff:fe7f:2b2a", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
14980 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(52850), inet_pton(AF_INET6, "2a05:xxx:xxx:0:1c00:16ff:fe00:60", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
14980 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
14980 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a05:xxx:xxxx:d:3416:d5ff:fe92:18db", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
14980 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(35119), inet_pton(AF_INET6, "2a05:xxx:702:0:1c00:16ff:fe00:60", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
14980 close(3)                          = 0
14980 write(2, "unable to parse addrs in '", 26) = 26
14980 write(2, "mon.objects.xxx.xxx.net", 27) = 27
14980 write(2, "'", 1)                  = 1
14980 write(2, "\n", 1)   

It performs the DNS lookup, but then it doesn't seem to know what to do with the result.
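
For context, the connect()/getsockname() pairs in the strace above look like glibc's getaddrinfo() source-address selection probes, so the name resolution itself appears to succeed; the error is raised afterwards, when the resolved addresses are handed back to the Ceph client code. Below is a minimal standalone sketch (not Ceph code; the hostname is a placeholder for the redacted round-robin entry) that exercises the same resolution step:

// Sketch only (not Ceph code): resolve the mon_host name via getaddrinfo(),
// mirroring what the client does through glibc.
#include <cstdio>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main() {
  struct addrinfo hints{};
  hints.ai_family = AF_INET6;      // only interested in the AAAA records here
  hints.ai_socktype = SOCK_STREAM;

  struct addrinfo *res = nullptr;
  int r = getaddrinfo("mon.objects.example.net", "6789", &hints, &res);
  if (r != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(r));
    return 1;
  }
  for (struct addrinfo *p = res; p != nullptr; p = p->ai_next) {
    char buf[INET6_ADDRSTRLEN];
    auto *sin6 = reinterpret_cast<struct sockaddr_in6 *>(p->ai_addr);
    inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf));
    printf("resolved: [%s]:6789\n", buf);  // one line per monitor address
  }
  freeaddrinfo(res);
  return 0;
}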

Setting this one to Urgent as it breaks existing clusters.
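
A possible interim workaround, sketched here with placeholder addresses (and assuming Ceph accepts bracketed IPv6 literals in mon_host), is to list the monitor addresses literally instead of the DNS name, which bypasses the hostname-resolution path that fails above:

[global]
# Sketch only: placeholder addresses; substitute the real monitor addresses.
mon_host = [2001:db8::11]:6789,[2001:db8::12]:6789,[2001:db8::13]:6789
ms_bind_ipv6 = true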


Related issues: 3 (0 open, 3 closed)

Related to CephFS - Backport #47013: nautilus: librados|libcephfs: use latest MonMap when creating from CephContext (Resolved, Shyamsundar Ranganathan)
Copied to RADOS - Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs' (Resolved, Nathan Cutler)
Copied to RADOS - Backport #47987: octopus: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs' (Resolved, Nathan Cutler)
Actions #1

Updated by Jason Dillaman over 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (MonClient)
Actions #2

Updated by Jason Dillaman over 3 years ago

  • Component(RADOS) MonClient added
Actions #3

Updated by Patrick Donnelly over 3 years ago

  • Subject changed from nautilus: mon_host with DNS Round Robin results in 'unable to parse addrs' to MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
  • Status changed from New to In Progress
  • Assignee set to Patrick Donnelly
  • Target version set to v16.0.0
  • Source set to Community (user)
  • Backport set to octopus,nautilus
Actions #4

Updated by Patrick Donnelly over 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 37758
Actions #5

Updated by Patrick Donnelly over 3 years ago

  • Related to Backport #47013: nautilus: librados|libcephfs: use latest MonMap when creating from CephContext added
Actions #6

Updated by Kefu Chai over 3 years ago

  • Regression changed from No to Yes
Actions #8

Updated by Jonas Jelten over 3 years ago

The fix is probably:

diff --git a/src/mon/MonMap.cc b/src/mon/MonMap.cc
index 19092d5326..05c1cfff31 100644
--- a/src/mon/MonMap.cc
+++ b/src/mon/MonMap.cc
@@ -502,7 +502,7 @@ int MonMap::init_with_hosts(const std::string& hostlist,
     return -EINVAL;
   if (addrs.empty())
     return -ENOENT;
-  if (!init_with_addrs(addrs, for_mkfs, prefix)) {
+  if (init_with_addrs(addrs, for_mkfs, prefix)) {
     return -EINVAL;
   }
   calc_legacy_ranks();
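
If init_with_addrs() follows the usual 0-on-success / negative-errno convention (which the one-character change above implies), the negated check turns a successful resolution into -EINVAL, matching the symptom: the lookup succeeds but the client still reports 'unable to parse addrs'. An illustration of the inverted check, using stand-in functions rather than the real MonMap code:

// Illustration only (stand-ins, not the real MonMap code): with a
// 0-on-success return convention, negating the result inverts the check.
#include <cassert>
#include <cerrno>

// Stand-in for init_with_addrs(): 0 on success, negative errno on failure.
static int init_with_addrs_stub(bool ok) { return ok ? 0 : -EINVAL; }

static int init_with_hosts_buggy(bool ok) {
  if (!init_with_addrs_stub(ok))   // !0 == true: success treated as failure
    return -EINVAL;
  return 0;
}

static int init_with_hosts_fixed(bool ok) {
  if (init_with_addrs_stub(ok))    // non-zero return means failure
    return -EINVAL;
  return 0;
}

int main() {
  assert(init_with_hosts_buggy(true) == -EINVAL);  // resolved addrs rejected
  assert(init_with_hosts_fixed(true) == 0);        // resolved addrs accepted
  assert(init_with_hosts_fixed(false) == -EINVAL); // real failures still caught
  return 0;
}
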
Actions #9

Updated by Troy Ablan over 3 years ago

This appears to break any sort of resolution of IPv6 addresses from hostnames. This affects qemu's usage of rbd, in this case via libvirt, when hostnames pointing to IPv6 addresses are specified as monitors, round-robin or not. Substituting IP addresses here instead works around the problem.

      <source protocol='rbd' name='vm-pool/gcompute1.las-sda'>
        <host name='mon1.example.com' port='6789'/>
        <host name='mon2.example.com' port='6789'/>
      </source>
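
For reference, a sketch of that substitution with placeholder (documentation-range) addresses in place of the hostnames; whether IPv6 literals need any additional quoting in the name attribute may depend on the libvirt/qemu versions in use:

      <source protocol='rbd' name='vm-pool/gcompute1.las-sda'>
        <!-- placeholder addresses; substitute the real monitor addresses -->
        <host name='2001:db8::11' port='6789'/>
        <host name='2001:db8::12' port='6789'/>
      </source>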

BTW, it's also unfortunate and disappointing that this release is still completely unmentioned on https://docs.ceph.com/en/latest/releases/nautilus/. Is this not the authoritative reference for releases?

Actions #10

Updated by Alex Litvak over 3 years ago

Will the fix for it be posted soon? I am building Ceph in containers from existing releases. Is there a tag I can use to either revert the commit that broke this feature, or a build that has the fix implemented?

Actions #11

Updated by Alex Litvak over 3 years ago

Alex Litvak wrote:

Will the fix for it be posted soon? I am building Ceph in containers from existing releases. Is there a tag I can use to either revert the commit that broke this feature, or a build that has the fix implemented?

Actions #12

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #13

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs' added
Actions #14

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47987: octopus: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs' added
Actions #15

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
