Bug #23078
SRV resolution fails to look up AAAA records
Status: closed
Description
We have some IPv6 Rados clusters. So far we have been specifying the addresses of each cluster's three mons using literal IPv6 addresses. This is suboptimal because it makes it hard to renumber mons.
Since we recently upgraded to Luminous, we thought we could use the SRV feature documented here. The documentation even mentions AAAA records and IPv6 mons. Cool!
Unfortunately when we actually try this and add SRV records:
    _ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0003.s1.scloud.switch.ch.
    _ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0004.s1.scloud.switch.ch.
    _ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0001.s1.scloud.switch.ch.
in addition to the already existing AAAA records:
    s0001.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1001
    s0003.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1003
    s0004.s1.scloud.switch.ch. IN AAAA 2001:620:5ca1:8001::1004
and remove the ceph mon definition from /etc/ceph/ceph.conf, then commands such as ceph -s fail with an error message:
    $ ceph -s
    2018-02-21 23:20:51.754012 7f9c0abcc700 -1 res_query() failed
    2018-02-21 23:20:51.755398 7f9c0abcc700 -1 res_query() failed
    no monitors specified to connect to.
    2018-02-21 23:20:51.756150 7f9c0abcc700 -1 res_query() failed
    [errno 2] error connecting to the cluster
Observing DNS traffic, we see that an SRV query goes out and a good response comes in, but then the client only asks for A records for the hostnames on the right-hand side. These queries naturally fail because we only publish the hosts' IPv6 addresses (as AAAA records).
If we look at L217 in src/common/resolve.cc, there's a search for ns_t_a records, but no such search for ns_t_aaaa records. That is probably the underlying problem here.
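
To illustrate what we would expect instead, here is a minimal sketch of resolving an SRV target by asking for both record types. It is not the code from resolve.cc: RecordType, Lookup, resolve_target and fake_dns are hypothetical names made up for this example, and fake_dns stands in for real DNS queries so the sketch runs without network access. The real resolver would use ns_t_a and ns_t_aaaa from <arpa/nameser.h>.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical record types for this sketch; the real resolver uses
// ns_t_a and ns_t_aaaa from <arpa/nameser.h>.
enum class RecordType { A, AAAA };

// Hypothetical lookup callback: returns the addresses published for
// (hostname, type), or an empty vector if there are none.
using Lookup =
    std::function<std::vector<std::string>(const std::string&, RecordType)>;

// Collect the addresses of one SRV target by querying every requested
// record type and concatenating the results.
std::vector<std::string> resolve_target(const std::string& host,
                                        const std::vector<RecordType>& types,
                                        const Lookup& lookup) {
  std::vector<std::string> addrs;
  for (RecordType type : types) {
    std::vector<std::string> found = lookup(host, type);
    addrs.insert(addrs.end(), found.begin(), found.end());
  }
  return addrs;
}

// Stand-in for DNS (an assumption of this sketch): the host publishes
// only an AAAA record, as in our setup.
std::vector<std::string> fake_dns(const std::string& host, RecordType type) {
  if (host == "s0001.s1.scloud.switch.ch." && type == RecordType::AAAA) {
    return {"2001:620:5ca1:8001::1001"};
  }
  return {};
}
```

Calling resolve_target with only RecordType::A against fake_dns returns nothing for s0001, which matches the behaviour we observe. Passing both RecordType::A and RecordType::AAAA returns the IPv6 address.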