Bug #23078

closed

SRV resolution fails to lookup AAAA records

Added by Simon Leinen about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We run several IPv6 RADOS clusters. So far we have specified the addresses of each cluster's three mons as literal IPv6 addresses. This is suboptimal because it makes it hard to renumber mons.

Since we recently upgraded to Luminous, we thought we could use the SRV feature described in the Ceph documentation. The documentation even mentions AAAA records and IPv6 mons. Cool!

Unfortunately when we actually try this and add SRV records:

_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0003.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0004.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0001.s1.scloud.switch.ch.

in addition to the already existing AAAA records:

s0001.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1001
s0003.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1003
s0004.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1004

and remove the ceph mon definition from /etc/ceph/ceph.conf, then commands such as ceph -s fail with an error message:

$ ceph -s
2018-02-21 23:20:51.754012 7f9c0abcc700 -1 res_query() failed
2018-02-21 23:20:51.755398 7f9c0abcc700 -1 res_query() failed
no monitors specified to connect to.
2018-02-21 23:20:51.756150 7f9c0abcc700 -1 res_query() failed
[errno 2] error connecting to the cluster

Observing DNS traffic, we see that an SRV query goes out, a good response comes in, but then the client only asks for A records for the hostnames on the right-hand side, and these naturally fail because we only publish the hosts' IPv6 addresses (as AAAA records).

If we look at L217 in src/common/resolve.cc, there's a search for ns_t_a records, but no such search for ns_t_aaaa records. That is probably the underlying problem here.
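
To make the suspected gap concrete, here is a minimal sketch of the lookup sequence described above. It is not Ceph's resolver code: the in-memory ZONE table, zone_lookup and resolve_mons are hypothetical names introduced only for illustration, and the records are the ones quoted in this report. It compares an A-only follow-up lookup with one that also asks for AAAA records.

```python
# Illustrative sketch only; this is NOT the code in src/common/resolve.cc.
# ZONE, zone_lookup and resolve_mons are hypothetical names.
from typing import Callable, Dict, List, Sequence, Tuple

SRV_NAME = "_ceph-mon._tcp.s1.scloud.switch.ch."

# Toy "DNS zone" holding exactly the records quoted in this report.
ZONE: Dict[Tuple[str, str], List[str]] = {
    (SRV_NAME, "SRV"): [
        "10 60 6789 s0003.s1.scloud.switch.ch.",
        "10 60 6789 s0004.s1.scloud.switch.ch.",
        "10 60 6789 s0001.s1.scloud.switch.ch.",
    ],
    ("s0001.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1001"],
    ("s0003.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1003"],
    ("s0004.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1004"],
}

Lookup = Callable[[str, str], List[str]]


def zone_lookup(name: str, rtype: str) -> List[str]:
    """Stand-in for a DNS query: return the rdata for (name, rtype), or []."""
    return ZONE.get((name, rtype), [])


def resolve_mons(
    srv_name: str,
    lookup: Lookup,
    rtypes: Sequence[str] = ("A", "AAAA"),
) -> List[Tuple[str, int]]:
    """Resolve SRV targets to (address, port) pairs for the given record types."""
    mons: List[Tuple[str, int]] = []
    for rdata in lookup(srv_name, "SRV"):
        # SRV rdata layout: priority, weight, port, target
        _priority, _weight, port, target = rdata.split()
        for rtype in rtypes:
            for addr in lookup(target, rtype):
                mons.append((addr, int(port)))
    return mons


# A-only follow-up lookups, as observed in the DNS traffic: nothing is found.
a_only = resolve_mons(SRV_NAME, zone_lookup, rtypes=("A",))
# Asking for both A and AAAA finds all three mons.
both = resolve_mons(SRV_NAME, zone_lookup)

print("A only:   ", a_only)
print("A + AAAA: ", both)
```

With only AAAA records published, the A-only variant ends up with an empty monitor list, which matches the "no monitors specified to connect to." error, while the variant that also queries AAAA yields all three mons on port 6789.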


Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #23174: luminous: SRV resolution fails to lookup AAAA records (Resolved, Prashant D)