Bug #19316

closed

chacra nodes that are bounced don't persist the (correct) hostname - breaks rabbit+chacra

Added by Alfredo Deza about 7 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: High
% Done: 0%
Regression: No
Severity: 3 - minor

Description

After a few chacra nodes were turned off by OVH (and back on again) they came up like:

$ ssh 2.chacra.ceph.com
ubuntu@2:~$

The problem (I believe) shows up because the hostname being used is an FQDN, so the short hostname becomes the first label (everything before the first '.').
For 3.chacra.ceph.com the short hostname is therefore '3', as the shell prompt shows, while `hostname` returns the full name:

ubuntu@3:~$ hostname
3.chacra.ceph.com

Which breaks rabbitmq:

Mar 20 11:58:31 3.chacra.ceph.com rabbitmqctl[1930]: attempted to contact: [rabbit@3]
Mar 20 11:58:31 3.chacra.ceph.com rabbitmqctl[1930]: rabbit@3:
Mar 20 11:58:31 3.chacra.ceph.com rabbitmqctl[1930]:   * unable to connect to epmd (port 4369) on 3: badarg (unknown POSIX error)

Because it sees '3' as being the hostname, which is wrong.
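
To make the derivation concrete, here is a minimal sketch of how the short name, and from it the node name seen in the log above, falls out of the FQDN. The function names `short_name` and `rabbit_node` are made up for illustration. The sketch assumes the short name is the full name cut at the first dot (which is what `hostname -s` does) and that the node name is `rabbit@` plus that short name, matching the `rabbit@3` in the log.

```shell
#!/bin/sh
# Hypothetical helper: keep everything before the first '.' of the name,
# mirroring how `hostname -s` cuts the host name at the first dot.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: build the node name as rabbit@<short hostname>,
# which is the form shown in the rabbitmqctl log lines.
rabbit_node() {
    printf 'rabbit@%s\n' "$(short_name "$1")"
}

rabbit_node 3.chacra.ceph.com   # prints rabbit@3
rabbit_node chacra3             # prints rabbit@chacra3
```

A name without any dots, such as `chacra3`, passes through unchanged, which is why a dot-free hostname avoids the truncation.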

The /etc/hostname shows:

ubuntu@3:~$ cat /etc/hostname
3.chacra.ceph.com

And /etc/hosts has:

ubuntu@3:~$ cat /etc/hosts
# Ansible managed: /Users/andrewschoen/dev/chacra/deploy/playbooks/roles/common/templates/hosts.j2 modified on 2016-07-19 09:02:00 by andrewschoen on Andrews-MacBook-Pro-2.local
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 chacra3

158.69.93.173 3.chacra.ceph.com
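
One thing that can be read off this file is which names it actually lists. The sketch below is only a diagnostic aid, not a confirmed root cause: `name_in_hosts` is a hypothetical helper, and the sample file just copies the non-comment entries shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME ($1) appears as a hostname or alias
# in HOSTS_FILE ($2). Comments are stripped and the address field is skipped.
name_in_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Sample file with the entries from the ticket (Ansible comment omitted).
hosts_sample=$(mktemp)
cat > "$hosts_sample" <<'EOF'
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 chacra3

158.69.93.173 3.chacra.ceph.com
EOF

for name in 3 chacra3 3.chacra.ceph.com; do
    if name_in_hosts "$name" "$hosts_sample"; then
        echo "$name: listed"
    else
        echo "$name: not listed"
    fi
done
rm -f "$hosts_sample"
```

With the entries above, `chacra3` and `3.chacra.ceph.com` are listed, while the bare `3` is not.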

After manually setting the hostname to `chacra3` (and 2.chacra.ceph.com to `chacra2`) rabbitmq was able to start.
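
The ticket does not record the exact commands used for the manual fix. As a rough sketch only, the following dry run prints what such a fix might look like on a systemd host. Everything here is an assumption: the helper names `chacra_short_name` and `print_fix` are hypothetical, and the use of `hostnamectl` and a `rabbitmq-server` service unit is a guess about the environment, not something the ticket confirms.

```shell
#!/bin/sh
# Hypothetical helper: map N.chacra.ceph.com to chacraN, the naming used
# in the manual fix. Fails for names outside chacra.ceph.com.
chacra_short_name() {
    case "$1" in
        *.chacra.ceph.com) printf 'chacra%s\n' "${1%%.*}" ;;
        *) return 1 ;;
    esac
}

# Dry run: print the commands instead of executing them.
# Assumes a systemd host and a service unit named rabbitmq-server.
print_fix() {
    new_name=$(chacra_short_name "$1") || return 1
    echo "sudo hostnamectl set-hostname $new_name"
    echo "sudo systemctl restart rabbitmq-server"
}

print_fix 3.chacra.ceph.com
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; the output can be reviewed before anything is changed.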

Actions #1

Updated by David Galloway over 6 years ago

Unfortunately, I discovered while deploying 5.chacra.ceph.com that rabbitmq can't even use '5' as the short hostname.

Actions #2

Updated by David Galloway over 6 years ago

  • Assignee set to David Galloway

Actions #3

Updated by David Galloway over 6 years ago

This still needs to be deployed.

Actions #4

Updated by David Galloway almost 6 years ago

  • Status changed from 12 to Resolved