Bug #23078

SRV resolution fails to lookup AAAA records

Added by Simon Leinen 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Category: Administration/Usability
Target version: -
Start date: 02/21/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:

Description

We have some IPv6 Rados clusters. So far we have been specifying the addresses of each cluster's three mons using literal IPv6 addresses. This is suboptimal because it makes it hard to renumber mons.

Since we recently upgraded to Luminous, we thought we could use the DNS SRV lookup feature described in the documentation. The documentation even mentions AAAA records and IPv6 mons. Cool!

Unfortunately, when we actually try this and add SRV records:

_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0003.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0004.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0001.s1.scloud.switch.ch.

in addition to the already existing AAAA records:

s0001.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1001
s0003.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1003
s0004.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1004
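For readers less familiar with the record format: each SRV line above carries priority, weight, port and target host, in that order (RFC 2782), so here every mon is reached on port 6789 at the host named on the right. The minimal C++17 sketch below splits one such zone-file line into those fields. It assumes simple lines without a TTL field or comments, and SrvRecord and parse_srv_line are illustrative names, not Ceph code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Fields of one SRV resource record (RFC 2782), as written in a zone file:
//   <owner> <class> SRV <priority> <weight> <port> <target>
struct SrvRecord {
    std::string owner;
    std::uint16_t priority = 0;
    std::uint16_t weight = 0;
    std::uint16_t port = 0;
    std::string target;
};

// Parses one zone-file line; returns std::nullopt if it is not an IN SRV record.
// Simplification (assumption): no TTL column, no comments, no $ORIGIN handling.
std::optional<SrvRecord> parse_srv_line(const std::string& line) {
    std::istringstream in(line);
    SrvRecord rec;
    std::string rr_class, rr_type;
    unsigned long prio = 0, weight = 0, port = 0;
    if (!(in >> rec.owner >> rr_class >> rr_type >> prio >> weight >> port >> rec.target))
        return std::nullopt;  // too few fields or a non-numeric value
    if (rr_class != "IN" || rr_type != "SRV")
        return std::nullopt;  // some other record type, e.g. AAAA
    if (prio > 0xFFFF || weight > 0xFFFF || port > 0xFFFF)
        return std::nullopt;  // these are 16-bit fields on the wire
    rec.priority = static_cast<std::uint16_t>(prio);
    rec.weight = static_cast<std::uint16_t>(weight);
    rec.port = static_cast<std::uint16_t>(port);
    return rec;
}
```

Feeding it the first record above yields priority 10, weight 60, port 6789 and target s0003.s1.scloud.switch.ch.; the target still has to be resolved to an address in a second lookup.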

and remove the mon definitions from /etc/ceph/ceph.conf, then commands such as ceph -s fail with the following errors:

$ ceph -s
2018-02-21 23:20:51.754012 7f9c0abcc700 -1 res_query() failed
2018-02-21 23:20:51.755398 7f9c0abcc700 -1 res_query() failed
no monitors specified to connect to.
2018-02-21 23:20:51.756150 7f9c0abcc700 -1 res_query() failed
[errno 2] error connecting to the cluster

Observing DNS traffic, we see that an SRV query goes out and a good response comes back, but the client then asks only for A records for the target hostnames on the right-hand side of the SRV records. These lookups naturally fail because we only publish the hosts' IPv6 addresses (as AAAA records).
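The two-step lookup observed in the traffic can be modelled with a toy in-memory table. This is a hypothetical sketch, not Ceph's resolver: FakeDns, example_zone and resolve_mons are invented for illustration, and the data mirrors the records from this report. Asking only for A records finds nothing for the IPv6-only hosts, while asking for AAAA finds all three mons.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for DNS: (name, record type) -> answers.
using FakeDns = std::map<std::pair<std::string, std::string>, std::vector<std::string>>;

// Records mirroring the ones in this report: SRV targets with AAAA records only.
FakeDns example_zone() {
    const std::string srv = "_ceph-mon._tcp.s1.scloud.switch.ch.";
    return {
        {{srv, "SRV"}, {"s0003.s1.scloud.switch.ch.", "s0004.s1.scloud.switch.ch.",
                        "s0001.s1.scloud.switch.ch."}},
        {{"s0001.s1.scloud.switch.ch.", "AAAA"}, {"2001:620:5ca1:8001::1001"}},
        {{"s0003.s1.scloud.switch.ch.", "AAAA"}, {"2001:620:5ca1:8001::1003"}},
        {{"s0004.s1.scloud.switch.ch.", "AAAA"}, {"2001:620:5ca1:8001::1004"}},
    };
}

// Step 1: the SRV query yields target host names.
// Step 2: each target is queried for a single address record type ("A" or "AAAA").
std::vector<std::string> resolve_mons(const FakeDns& dns, const std::string& srv_name,
                                      const std::string& addr_type) {
    std::vector<std::string> addrs;
    auto srv = dns.find({srv_name, "SRV"});
    if (srv == dns.end()) return addrs;
    for (const auto& target : srv->second) {
        auto hit = dns.find({target, addr_type});
        if (hit == dns.end()) continue;  // no record of this type, e.g. A for an IPv6-only host
        addrs.insert(addrs.end(), hit->second.begin(), hit->second.end());
    }
    return addrs;
}
```

With addr_type "A" the result is empty, which matches the "no monitors specified to connect to" symptom; with "AAAA" it contains the three mon addresses in SRV answer order.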

Looking at line 217 of src/common/resolve.cc, there is a lookup for ns_t_a records but none for ns_t_aaaa records. That is probably the underlying problem here.
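Part of why AAAA support needs its own code path is that the record payloads differ: an A record carries a 4-byte IPv4 address, an AAAA record a 16-byte IPv6 address. The sketch below converts either payload to text with inet_ntop(). It assumes the ns_t_a and ns_t_aaaa constants from glibc's <arpa/nameser.h>, and rdata_to_string is a hypothetical helper, not a function taken from resolve.cc.

```cpp
#include <arpa/inet.h>
#include <arpa/nameser.h>
#include <cassert>
#include <cstddef>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// Converts the RDATA of an address record to its textual form.
// A records hold 4 bytes (IPv4), AAAA records hold 16 bytes (IPv6).
// Returns an empty string for any other type or a length mismatch.
std::string rdata_to_string(int type, const unsigned char* rdata, std::size_t len) {
    int family;
    if (type == ns_t_a && len == 4)
        family = AF_INET;
    else if (type == ns_t_aaaa && len == 16)
        family = AF_INET6;
    else
        return "";
    char buf[INET6_ADDRSTRLEN];  // large enough for either family
    if (inet_ntop(family, rdata, buf, sizeof(buf)) == nullptr) return "";
    return buf;
}
```

A resolver that only ever handles the ns_t_a branch has no way to produce the 2001:620:5ca1:8001::/64 addresses used here.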


Related issues

Copied to RADOS - Backport #23174: luminous: SRV resolution fails to lookup AAAA records Resolved

History

#1 Updated by Simon Leinen 11 months ago

WANG Guoqin actually noted the lack of IPv6 support in a comment on issue #14527.

He also had a suggestion for a fix that looks excellent to me:

Maybe we can choose among ns_t_a and ns_t_aaaa according to conf->ms_bind_ipv6 in ceph/dns_resolve.cc. I'll be working on this in the following days, hopefully, but if someone's more familiar with this there could be less pain :)

What do people think?
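A minimal sketch of the suggested selection, assuming glibc's <arpa/nameser.h>. The function name is hypothetical and this is not necessarily how the eventual fix was written: it takes the value of ms_bind_ipv6 and returns the record type to query for each SRV target.

```cpp
#include <arpa/nameser.h>
#include <cassert>

// Choose which address record to request for each SRV target, following the
// suggestion above: AAAA when the messenger binds to IPv6, A otherwise.
// The bool parameter stands in for conf->ms_bind_ipv6 (illustrative only).
ns_type address_record_type(bool ms_bind_ipv6) {
    return ms_bind_ipv6 ? ns_t_aaaa : ns_t_a;
}
```

Keying the choice off ms_bind_ipv6 keeps the resolver consistent with the address family the client will actually use to connect. Querying both types and merging the results would be an alternative for dual-stack setups; this sketch follows the single-choice suggestion.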

#2 Updated by Wido den Hollander 11 months ago

Seems like a good solution!

I wrote a PR: https://github.com/ceph/ceph/pull/20530

#3 Updated by Wido den Hollander 11 months ago

In the meantime, by the way, a round-robin IPv6 DNS name works just fine, for example:

mon.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1001
mon.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1003
mon.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1004
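With such a name in place (assuming the client's mon_host option points at it), the client resolves the name to all of its addresses. The C++ sketch below shows the equivalent getaddrinfo() call restricted to IPv6. ipv6_addresses is a hypothetical helper, and the order of the returned addresses depends on the system resolver's sorting rules.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Returns every IPv6 address that getaddrinfo() reports for `host`.
// For a round-robin name this is one entry per AAAA record.
std::vector<std::string> ipv6_addresses(const char* host, int extra_flags = 0) {
    addrinfo hints{};
    hints.ai_family = AF_INET6;       // AAAA results only
    hints.ai_socktype = SOCK_STREAM;  // one result per address, not per socket type
    hints.ai_flags = extra_flags;
    addrinfo* res = nullptr;
    std::vector<std::string> out;
    if (getaddrinfo(host, nullptr, &hints, &res) != 0) return out;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        const auto* sa = reinterpret_cast<const sockaddr_in6*>(p->ai_addr);
        char buf[INET6_ADDRSTRLEN];
        if (inet_ntop(AF_INET6, &sa->sin6_addr, buf, sizeof(buf)) != nullptr)
            out.emplace_back(buf);
    }
    freeaddrinfo(res);
    return out;
}
```

Called as ipv6_addresses("mon.s1.scloud.switch.ch"), it would be expected to return the three addresses above, assuming the zone is published as shown and reachable from the client.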

#4 Updated by Kefu Chai 11 months ago

  • Status changed from New to Need Review
  • Assignee set to Wido den Hollander
  • Component(RADOS) Monitor added
  • Component(RADOS) deleted (librados)

#5 Updated by Kefu Chai 11 months ago

  • Status changed from Need Review to Testing

#6 Updated by Kefu Chai 11 months ago

  • Status changed from Testing to Pending Backport
  • Backport set to luminous

#7 Updated by Nathan Cutler 11 months ago

  • Copied to Backport #23174: luminous: SRV resolution fails to lookup AAAA records added

#8 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved
