Bug #47300

closed

mount.ceph fails to understand AAAA records from SRV record

Added by Daniël Vos over 3 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Unsure if this belongs to CephFS or RADOS :-). I have seen numerous issues here regarding IPv6/AAAA records and SRV records that were all fixed in older versions, such as https://tracker.ceph.com/issues/23078 and https://tracker.ceph.com/issues/23174

I am trying to get my clients to mount a CephFS on my IPv6-only Ceph cluster, using an SRV record to obtain monitor information. This does not work.

Client kernel: 5.4.0
Client ceph version: 15.2.3-0ubuntu0.20.04.1

SRV record:

;; ANSWER SECTION:
_ceph-mon._tcp.example.dev. 78      IN      SRV     10 60 3300 node1.example.dev.
_ceph-mon._tcp.example.dev. 78      IN      SRV     10 60 6789 node3.example.dev.
_ceph-mon._tcp.example.dev. 78      IN      SRV     10 60 6789 node2.example.dev.

node[1-3] only have an AAAA record, no A record.
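
For readers less familiar with SRV records: each answer line carries a priority, weight, port and target host name, and nothing about the address family. Whether the client ends up with an IPv4 or IPv6 address depends on how the target is resolved afterwards. A minimal Python sketch of reading the answer section above (the `SrvRecord` type and `parse_srv_line` helper are hypothetical names for illustration, not part of Ceph):

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one dig-style SRV answer line into its fields."""
    fields = line.split()
    if len(fields) != 8 or fields[2] != "IN" or fields[3] != "SRV":
        raise ValueError(f"not an SRV answer line: {line!r}")
    name, ttl, _cls, _rtype, priority, weight, port, target = fields
    # Drop the trailing dot of the fully qualified target name.
    return SrvRecord(name, int(ttl), int(priority), int(weight),
                     int(port), target.rstrip("."))


# The answer section quoted in this report.
answer = """\
_ceph-mon._tcp.example.dev. 78      IN      SRV     10 60 3300 node1.example.dev.
_ceph-mon._tcp.example.dev. 78      IN      SRV     10 60 6789 node3.example.dev.
_ceph-mon._tcp.example.dev. 78      IN      SRV     10 60 6789 node2.example.dev.
"""

records = [parse_srv_line(line) for line in answer.splitlines() if line.strip()]
for rec in records:
    print(rec.target, rec.port)
```

Each target then still needs a separate A or AAAA lookup; here only the AAAA lookup can succeed.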

My fstab:

:/backups /data/cephfs ceph    name=backups,fs=backups,noatime,_netdev   0   2

Attempting to mount without a /etc/ceph/ceph.conf:

mount /data/cephfs
did not load config file, using default settings.
2020-09-04T09:21:41.173+0200 ffffad442010 -1 Errors while parsing config file!
2020-09-04T09:21:41.173+0200 ffffad442010 -1 parse_file: filesystem error: cannot get file size: No such file or directory [ceph.conf]
2020-09-04T09:21:41.173+0200 ffffad442010 -1 Errors while parsing config file!
2020-09-04T09:21:41.173+0200 ffffad442010 -1 parse_file: filesystem error: cannot get file size: No such file or directory [ceph.conf]
2020-09-04T09:21:41.189+0200 ffffad442010 -1 res_query() failed
2020-09-04T09:21:41.193+0200 ffffad442010 -1 res_query() failed
no monitors specified to connect to.2020-09-04T09:21:41.193+0200 ffffad442010 -1 res_query() failed

unable to determine mon addresses

Attempting to mount with the following ceph.conf:

[global]
ms bind ipv6 = true

mount /data/cephfs
server name not found: 2001:db8:2000:8a:ec4:7aff:fe31:d638:6789 (Name or service not known)
failed to resolve source
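
The string in the error above looks like an IPv6 address with `:6789` appended without brackets, which cannot be parsed unambiguously as address plus port. That is a reading of the message, not a confirmed diagnosis of mount.ceph. A small Python sketch of the usual bracketed convention (`format_host_port` is a hypothetical helper, not Ceph code):

```python
import ipaddress


def format_host_port(host: str, port: int) -> str:
    """Join host and port, bracketing the host when it is an IPv6 literal."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a host name.
        return f"{host}:{port}"
    if addr.version == 6:
        return f"[{host}]:{port}"
    return f"{host}:{port}"


def is_ip_literal(text: str) -> bool:
    """Return True if text parses as a bare IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


print(format_host_port("2001:db8:2000:8a:ec4:7aff:fe31:d638", 6789))
# The unbracketed form from the error message has nine groups and is
# therefore not a valid IPv6 literal on its own.
print(is_ip_literal("2001:db8:2000:8a:ec4:7aff:fe31:d638:6789"))
```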

If I change my fstab and add the monitor nodes like this, it works fine, but that defeats the purpose of the SRV record.

node1,node2,node3:/backups /data/cephfs ceph    name=backups,fs=backups,noatime,_netdev   0   2

mount | grep ceph
[2001:db8:2000:8a:ae1f:6bff:fe05:9e70],[2001:db8:2000:8a:ec4:7aff:fede:d796],[2001:db8:2000:8a:ec4:7aff:fe31:d638]:/backups on /data/cephfs type ceph (rw,noatime,name=backups,secret=<hidden>,acl,mds_namespace=backups)

Side note: this only works with `ms bind ipv6 = true` in ceph.conf. Without that option, you get the following error, probably because it only searches for an A record:

mount /data/cephfs
2020-09-04T09:34:25.965+0200 ffff92502010 -1 res_query() failed
2020-09-04T09:34:25.969+0200 ffff92502010 -1 res_query() failed
no monitors specified to connect to.
2020-09-04T09:34:25.973+0200 ffff92502010 -1 res_query() failed

So far I can't find a combination of configuration options that lets me obtain my monitor addresses from an SRV record and actually gives me a working CephFS mount.

Documentation: https://docs.ceph.com/docs/master/rados/configuration/mon-lookup-dns/


Related issues 2 (0 open, 2 closed)

Copied to RADOS - Backport #55513: quincy: mount.ceph fails to understand AAAA records from SRV record (Resolved, Matan Breizman)
Copied to RADOS - Backport #55514: pacific: mount.ceph fails to understand AAAA records from SRV record (Resolved, Matan Breizman)
Actions #1

Updated by Josh Durgin over 3 years ago

Thanks for the detailed description. The earlier fix clearly depends on ms_bind_ipv6: https://github.com/ceph/ceph/pull/20530

The mon client should try both IPv4 and IPv6 resolution to be more user friendly and not require the ms_bind_ipv6 option, which doesn't make sense for clients.
The other part of the bug, where the resolution fails even with that setting, needs further investigation.
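
To illustrate what "try both" could look like in general terms, here is a minimal Python sketch that asks the resolver for any address family and keeps whatever comes back. It is not the Ceph mon client, which is written in C++; `resolve_any_family` is a hypothetical name, and the injectable `getaddrinfo` parameter exists only so the function can be exercised without a network.

```python
import socket
from typing import Callable, List, Tuple


def resolve_any_family(
    host: str,
    port: int,
    getaddrinfo: Callable = socket.getaddrinfo,
) -> List[Tuple[int, str, int]]:
    """Resolve host for both IPv4 and IPv6 and return (family, address, port)."""
    # AF_UNSPEC lets the resolver return A and AAAA results alike, so no
    # client-side option is needed to choose a family up front.
    infos = getaddrinfo(host, port, family=socket.AF_UNSPEC,
                        type=socket.SOCK_STREAM)
    results = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        if family in (socket.AF_INET, socket.AF_INET6):
            results.append((family, sockaddr[0], sockaddr[1]))
    return results


# Usage against a real resolver (host name taken from this report):
#   resolve_any_family("node1.example.dev", 6789)
```

With a host that only has an AAAA record, the result would contain only `AF_INET6` entries, and the caller does not have to know that in advance.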

Actions #2

Updated by Daniël Vos over 2 years ago

Issue still present on 16.2.6 (ceph packages 16.2.6-1focal, kernel 5.11.0-38-generic)

`ms bind ipv6 = true` is no longer required, but the following error remains:
# mount /data/cephfs
server name not found: 2001:db8:2000:8a:ec4:7aff:fe31:d638:6789 (Name or service not known)
failed to resolve source
Actions #3

Updated by Neha Ojha over 2 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Radoslaw Zarzynski about 2 years ago

  • Assignee set to Matan Breizman
  • Priority changed from High to Normal

Lowering the priority as there are no recent reports about the issue and assigning as it's a good way to learn about networking / msgr. Matan, please reach out to me if you need help. This isn't urgent.

Actions #5

Updated by Matan Breizman about 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 46051
Actions #6

Updated by Kefu Chai almost 2 years ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Matan Breizman almost 2 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to pacific,quincy
Actions #8

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55513: quincy: mount.ceph fails to understand AAAA records from SRV record added
Actions #9

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55514: pacific: mount.ceph fails to understand AAAA records from SRV record added
Actions #10

Updated by Matan Breizman almost 2 years ago

  • Status changed from Pending Backport to Resolved