Bug #23078

closed

SRV resolution fails to lookup AAAA records

Added by Simon Leinen about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have some IPv6 RADOS clusters. So far we have specified each cluster's three mons by literal IPv6 address. This is suboptimal because it makes it hard to renumber mons.

Since we recently upgraded to Luminous, we thought we could use the SRV feature documented here. The documentation even mentions AAAA records and IPv6 mons. Cool!

Unfortunately when we actually try this and add SRV records:

_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0003.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0004.s1.scloud.switch.ch.
_ceph-mon._tcp.s1.scloud.switch.ch. IN SRV 10 60 6789 s0001.s1.scloud.switch.ch.

in addition to the already existing AAAA records:

s0001.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1001
s0003.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1003
s0004.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1004

and remove the ceph mon definition from /etc/ceph/ceph.conf, then commands such as ceph -s fail with an error message:

$ ceph -s
2018-02-21 23:20:51.754012 7f9c0abcc700 -1 res_query() failed
2018-02-21 23:20:51.755398 7f9c0abcc700 -1 res_query() failed
no monitors specified to connect to.
2018-02-21 23:20:51.756150 7f9c0abcc700 -1 res_query() failed
[errno 2] error connecting to the cluster

Observing DNS traffic, we see that an SRV query goes out, a good response comes in, but then the client only asks for A records for the hostnames on the right-hand side, and these naturally fail because we only publish the hosts' IPv6 addresses (as AAAA records).

If we look at L217 in src/common/resolve.cc, there's a search for ns_t_a records, but no such search for ns_t_aaaa records. That is probably the underlying problem here.
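
The behaviour described above can be reproduced without a DNS server. The following is a minimal Python sketch, not the code in Ceph: it keeps the SRV and AAAA records from this report in memory and resolves the SRV targets once with A records only and once with both A and AAAA. The names `SRV_RECORDS`, `ADDRESS_RECORDS` and `resolve_mons` are made up for illustration.

```python
SERVICE = "_ceph-mon._tcp.s1.scloud.switch.ch."

# (owner name, priority, weight, port, target), copied from the report.
SRV_RECORDS = [
    (SERVICE, 10, 60, 6789, "s0003.s1.scloud.switch.ch."),
    (SERVICE, 10, 60, 6789, "s0004.s1.scloud.switch.ch."),
    (SERVICE, 10, 60, 6789, "s0001.s1.scloud.switch.ch."),
]

# (host name, record type) -> addresses. Only AAAA records are published.
ADDRESS_RECORDS = {
    ("s0001.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1001"],
    ("s0003.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1003"],
    ("s0004.s1.scloud.switch.ch.", "AAAA"): ["2001:620:5ca1:8001::1004"],
}


def resolve_mons(service, record_types):
    """Return (address, port) pairs for every SRV target of `service`,
    looking up only the address record types listed in `record_types`."""
    mons = []
    for name, _priority, _weight, port, target in SRV_RECORDS:
        if name != service:
            continue
        for record_type in record_types:
            for address in ADDRESS_RECORDS.get((target, record_type), []):
                mons.append((address, port))
    return mons


# What the client effectively does today: A lookups only.
a_only = resolve_mons(SERVICE, ["A"])

# What an IPv6 cluster needs: AAAA lookups as well.
a_and_aaaa = resolve_mons(SERVICE, ["A", "AAAA"])
```

With A lookups only, the monitor list stays empty, which matches the "no monitors specified to connect to." error. Once AAAA lookups are included, all three mons are found.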


Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #23174: luminous: SRV resolution fails to lookup AAAA records - Resolved - Prashant D

Actions #1

Updated by Simon Leinen about 6 years ago

WANG Guoqin actually noted the lack of IPv6 support in a comment on issue #14527.

He also had a suggestion for a fix that looks excellent to me:

Maybe we can choose among ns_t_a and ns_t_aaaa according to conf->ms_bind_ipv6 in ceph/dns_resolve.cc. I'll be working on this in the following days, hopefully, but if someone's more familiar with this there could be less pain :)

What do people think?
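
As a rough illustration of the suggestion quoted above, the record type could be selected from the `ms_bind_ipv6` setting. This is a hypothetical Python sketch, not the patch itself: `query_type_for` is an invented helper, and only the numeric DNS type codes (1 for A, 28 for AAAA) are standard.

```python
# Standard DNS resource record type codes.
NS_T_A = 1      # IPv4 address record
NS_T_AAAA = 28  # IPv6 address record


def query_type_for(ms_bind_ipv6: bool) -> int:
    """Pick the address record type to query for each SRV target.

    Hypothetical helper: mirrors the idea of switching on ms_bind_ipv6.
    """
    return NS_T_AAAA if ms_bind_ipv6 else NS_T_A
```

One consequence of this approach is that a client queries a single address family, so `ms_bind_ipv6` has to match the records that are actually published for the mons.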

Actions #2

Updated by Wido den Hollander about 6 years ago

Simon Leinen wrote:

WANG Guoqin actually noted the lack of IPv6 support in a comment on issue #14527.

He also had a suggestion for a fix that looks excellent to me:

Maybe we can choose among ns_t_a and ns_t_aaaa according to conf->ms_bind_ipv6 in ceph/dns_resolve.cc. I'll be working on this in the following days, hopefully, but if someone's more familiar with this there could be less pain :)

What do people think?

Seems like a good solution!

I wrote a PR: https://github.com/ceph/ceph/pull/20530

Actions #3

Updated by Wido den Hollander about 6 years ago

In the meantime btw, a Round Robin IPv6 DNS record works just fine, something like:

mon.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1001
mon.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1003
mon.s1.scloud.switch.ch. IN    AAAA    2001:620:5ca1:8001::1004
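
To check from a client what such a round-robin name resolves to, a small Python sketch along these lines could be used. It relies only on the standard `socket.getaddrinfo` call; the helper name `mon_addresses` and the default port 6789 (taken from the SRV records in this report) are assumptions for illustration.

```python
import socket


def mon_addresses(name, port=6789):
    """Return the distinct IPv6 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET6, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # sockaddr is (address, port, flowinfo, scope_id)
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Calling `mon_addresses("mon.s1.scloud.switch.ch")` on a host that can resolve the name should list whichever AAAA records are published for it at that time.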
Actions #4

Updated by Kefu Chai about 6 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Wido den Hollander
  • Component(RADOS) Monitor added
  • Component(RADOS) deleted (librados)
Actions #5

Updated by Kefu Chai about 6 years ago

  • Status changed from Fix Under Review to 7
Actions #6

Updated by Kefu Chai about 6 years ago

  • Status changed from 7 to Pending Backport
  • Backport set to luminous
Actions #7

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23174: luminous: SRV resolution fails to lookup AAAA records added
Actions #8

Updated by Nathan Cutler about 6 years ago

  • Status changed from Pending Backport to Resolved