Bug #23078
Closed
SRV resolution fails to lookup AAAA records
Description
We have some IPv6 Rados clusters. So far we have been specifying the addresses of each cluster's three mons using literal IPv6 addresses. This is suboptimal because it makes it hard to renumber mons.
Since we recently upgraded to Luminous, we thought we could use the SRV feature documented here. The documentation even mentions AAAA records and IPv6 mons. Cool!
Unfortunately when we actually try this and add SRV records:
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0003.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0004.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0001.s1.scloud.switch.ch.
in addition to the already existing AAAA records:
s0001.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1001
s0003.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1003
s0004.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1004
and remove the ceph mon definition from /etc/ceph/ceph.conf, then commands such as ceph -s fail with an error message:
$ ceph -s
2018-02-21 23:20:51.754012 7f9c0abcc700 -1 res_query() failed
2018-02-21 23:20:51.755398 7f9c0abcc700 -1 res_query() failed
no monitors specified to connect to.
2018-02-21 23:20:51.756150 7f9c0abcc700 -1 res_query() failed
[errno 2] error connecting to the cluster
Observing DNS traffic, we see that an SRV query goes out, a good response comes in, but then the client only asks for A records for the hostnames on the right-hand side, and these naturally fail because we only publish the hosts' IPv6 addresses (as AAAA records).
If we look at L217 in src/common/resolve.cc, there's a search for ns_t_a records, but no such search for ns_t_aaaa records. That is probably the underlying problem here.
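To make the failure mode concrete, here is a minimal Python sketch of the two-step lookup described above. It does not touch real DNS and is not the Ceph code: ZONE is a hypothetical in-memory stand-in built from the records in this report, and find_mon_addresses is an illustrative helper name. It only shows that asking for A records alone yields nothing for these hosts, while also asking for AAAA finds all three mons.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the DNS zone described in this report.
# Keys are (name, record type); values are the record data.
ZONE: Dict[Tuple[str, str], List[str]] = {
    ("_ceph-mon._tcp.s1.scloud.switch.ch.", "SRV"): [
        "s0003.s1.scloud.switch.ch.",
        "s0004.s1.scloud.switch.ch.",
        "s0001.s1.scloud.switch.ch.",
    ],
    ("s0001.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1001"],
    ("s0003.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1003"],
    ("s0004.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1004"],
}


def find_mon_addresses(srv_name: str, address_types: Tuple[str, ...]) -> List[str]:
    """Resolve the SRV targets, then look up each target for the given record types."""
    addresses: List[str] = []
    for target in ZONE.get((srv_name, "SRV"), []):
        for rtype in address_types:
            # A missing (target, rtype) entry models an empty DNS answer.
            addresses.extend(ZONE.get((target, rtype), []))
    return addresses


SRV_NAME = "_ceph-mon._tcp.s1.scloud.switch.ch."

# What the client effectively does today: A records only.
a_only = find_mon_addresses(SRV_NAME, ("A",))

# What an IPv6-only cluster needs: AAAA records as well.
a_and_aaaa = find_mon_addresses(SRV_NAME, ("A", "AAAA"))

print(a_only)
print(a_and_aaaa)
```

With the A-only lookup the address list comes back empty, which matches the "no monitors specified to connect to" error above.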
Updated by Simon Leinen about 6 years ago
WANG Guoqin actually noted the lack of IPv6 support in a comment on issue #14527.
He also had a suggestion for a fix that looks excellent to me:
Maybe we can choose among ns_t_a and ns_t_aaaa according to conf->ms_bind_ipv6 in ceph/dns_resolve.cc. I'll be working on this in the following days, hopefully, but if someone's more familiar with this there could be less pain :)
What do people think?
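As a rough illustration of that suggestion (not the actual patch, and in Python rather than the C++ of the Ceph source), the choice of record type could hang off a single boolean. The function names below are hypothetical; only the ms_bind_ipv6 option name and the A/AAAA record types come from the comment quoted above.

```python
import ipaddress


def wanted_record_type(ms_bind_ipv6: bool) -> str:
    """Pick the DNS record type to query for a mon hostname."""
    return "AAAA" if ms_bind_ipv6 else "A"


def matches_family(address: str, ms_bind_ipv6: bool) -> bool:
    """Check that a returned address belongs to the family the flag asks for."""
    wanted_version = 6 if ms_bind_ipv6 else 4
    return ipaddress.ip_address(address).version == wanted_version


# With ms_bind_ipv6 enabled, the mon addresses from this report qualify.
print(wanted_record_type(True))
print(matches_family("2001:620:5ca1:8001::1001", True))
```

One consequence of keying on a single flag is that a client queries only one address family per lookup, so a mixed IPv4/IPv6 setup would need a different approach.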
Updated by Wido den Hollander about 6 years ago
Simon Leinen wrote:
WANG Guoqin actually noted the lack of IPv6 support in a comment on issue #14527.
He also had a suggestion for a fix that looks excellent to me:
Maybe we can choose among ns_t_a and ns_t_aaaa according to conf->ms_bind_ipv6 in ceph/dns_resolve.cc. I'll be working on this in the following days, hopefully, but if someone's more familiar with this there could be less pain :)
What do people think?
Seems like a good solution!
I wrote a PR: https://github.com/ceph/ceph/pull/20530
Updated by Wido den Hollander about 6 years ago
In the meantime btw, a Round Robin IPv6 DNS record works just fine, something like:
mon.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1001
mon.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1003
mon.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1004
Updated by Kefu Chai about 6 years ago
- Status changed from New to Fix Under Review
- Assignee set to Wido den Hollander
- Component(RADOS) Monitor added
- Component(RADOS) deleted (librados)
Updated by Kefu Chai about 6 years ago
- Status changed from Fix Under Review to 7
Updated by Kefu Chai about 6 years ago
- Status changed from 7 to Pending Backport
- Backport set to luminous
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23174: luminous: SRV resolution fails to lookup AAAA records added
Updated by Nathan Cutler about 6 years ago
- Status changed from Pending Backport to Resolved