
Bug #37871

Ceph cannot connect to any monitors if one of them has a DNS resolution problem

Added by Jairo Llopis 8 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Monitor
Target version:
Start date: 01/11/2019
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

My Ceph cluster is configured with:

mon host = mon1,mon2,mon3

If I remove the DNS entry for mon2 and then request the cluster status from mon1, the command fails:

$ ceph -s
server name not found: mon2 (Name or service not known)
unable to parse addrs in 'mon1,mon2,mon3'
2019-01-11 11:31:10.269 7f9ec32dc700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 22] error connecting to the cluster
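The error suggests the client treats the whole mon host string as a single unit, so one unresolvable name makes the entire list unusable. The Python sketch below models that all-or-nothing behavior. It is an illustration, not Ceph's actual code: parse_mon_host_strict() and its injectable resolve parameter are hypothetical, entries are assumed to be comma-separated bare hostnames or IPs (no ports or v1/v2 prefixes), and the addresses are placeholders from the 192.0.2.0/24 documentation range.

```python
import socket
from typing import Callable, List


def parse_mon_host_strict(
    mon_host: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Hypothetical model: resolve every entry, abort if any single one fails."""
    names = [n.strip() for n in mon_host.split(",") if n.strip()]
    addrs: List[str] = []
    for name in names:
        try:
            addrs.append(resolve(name))
        except OSError as exc:
            # One failure discards the results for every other monitor too.
            raise ValueError(f"unable to parse addrs in '{mon_host}'") from exc
    return addrs


if __name__ == "__main__":
    # Stub resolver standing in for DNS after mon2's record was removed.
    table = {"mon1": "192.0.2.1", "mon3": "192.0.2.3"}

    def fake_resolve(name: str) -> str:
        if name not in table:
            raise socket.gaierror(f"server name not found: {name}")
        return table[name]

    try:
        parse_mon_host_strict("mon1,mon2,mon3", fake_resolve)
    except ValueError as err:
        print(err)  # unable to parse addrs in 'mon1,mon2,mon3'
```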

According to the docs:

the mon host configuration option only needs to be sufficiently up to date such that a client can reach one monitor that is currently online.

The configuration above meets that requirement, since mon1 and mon3 can still be resolved.
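The documented requirement implies a more tolerant strategy: resolve each entry independently, skip names that fail, and give up only if none resolve. A minimal Python sketch of that idea follows, assuming comma-separated bare hostnames or IPs and an injectable resolver. parse_mon_host_tolerant() is a hypothetical name, not part of Ceph, and the addresses are placeholders from the 192.0.2.0/24 documentation range.

```python
import socket
from typing import Callable, List, Tuple


def parse_mon_host_tolerant(
    mon_host: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Tuple[List[str], List[str]]:
    """Return (resolved addresses, names that failed to resolve).

    Raises ValueError only when no entry resolves at all.
    """
    names = [n.strip() for n in mon_host.split(",") if n.strip()]
    addrs: List[str] = []
    failed: List[str] = []
    for name in names:
        try:
            addrs.append(resolve(name))
        except OSError:
            # Record the failure and keep going with the remaining monitors.
            failed.append(name)
    if not addrs:
        raise ValueError(f"no monitor in '{mon_host}' could be resolved")
    return addrs, failed


if __name__ == "__main__":
    table = {"mon1": "192.0.2.1", "mon3": "192.0.2.3"}  # mon2 has no DNS entry

    def fake_resolve(name: str) -> str:
        if name not in table:
            raise socket.gaierror(f"server name not found: {name}")
        return table[name]

    addrs, failed = parse_mon_host_tolerant("mon1,mon2,mon3", fake_resolve)
    print(addrs)   # ['192.0.2.1', '192.0.2.3']
    print(failed)  # ['mon2']
```

Returning the failed names lets the caller log them as warnings instead of aborting; whether a real client should also retry them later is a separate design choice.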

Additionally, if I replace that line with the monitors' IP addresses, the client connects to a monitor and returns the status:

mon host = 172.21.0.3,172.21.0.5,172.21.0.7

$ ceph -s
  cluster:
    id:     7060741a-8aad-5f55-b64e-c3f527e322f8
    health: HEALTH_WARN
            1/3 mons down, quorum mon1,mon3

  services:
    mon: 3 daemons, quorum mon1,mon3, out of quorum: mon2
    mgr: mon3(active), standbys: mon1
    osd: 4 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     

As a workaround, I'll list the raw IP addresses there instead. Still, I think this is a bug: the client should fail to contact the cluster only if every DNS entry fails to resolve, not if just one does, in the same way that it fails only when none of the listed IP addresses can be contacted, not when one of them is unreachable. That's the whole point of resilience, isn't it?
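The IP-address workaround can be scripted. The sketch below resolves each monitor name once, at the moment it runs, and produces a mon host line made of IP addresses. Assumptions: render_mon_host_line() is a hypothetical helper, names are bare hostnames, and resolution uses socket.gethostbyname (IPv4 only). It refuses to emit a partial list, so run it while all monitors still resolve, and rerun it whenever a monitor's address changes.

```python
import socket
from typing import Callable, Iterable, List


def render_mon_host_line(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve each monitor name now and return an IP-based 'mon host' line."""
    addrs: List[str] = []
    missing: List[str] = []
    for name in names:
        try:
            addrs.append(resolve(name))
        except OSError:
            missing.append(name)
    if missing:
        # Refuse to write a partial list; fix DNS (or add the IP by hand) first.
        raise ValueError(f"could not resolve: {', '.join(missing)}")
    return "mon host = " + ",".join(addrs)


if __name__ == "__main__":
    # Placeholder addresses from the 192.0.2.0/24 documentation range.
    table = {"mon1": "192.0.2.1", "mon2": "192.0.2.2", "mon3": "192.0.2.3"}

    def fake_resolve(name: str) -> str:
        if name not in table:
            raise socket.gaierror(f"server name not found: {name}")
        return table[name]

    print(render_mon_host_line(["mon1", "mon2", "mon3"], fake_resolve))
    # mon host = 192.0.2.1,192.0.2.2,192.0.2.3
```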
