Bug #45672
Unable to add additional hosts to cluster using cephadm
Description
Nodes 2 and 3 were configured to let the node 1 root user SSH in using Ceph's SSH configuration and key. Despite that, `ceph orch host add node2` fails to connect.
Environment:
Ceph version: 15.2.2 (node 1 installed with cephadm)
OS Version: Ubuntu 18.04
Docker Version: Docker CE 19.03.9, build 9d988398e7
Steps to reproduce:
1. verify `ping node2` is successful
2. `ceph cephadm get-ssh-config > ceph_config`
3. `ceph config-key get mgr/cephadm/ssh_identity_key > ceph_key`
4. `ssh -F ./ceph_config -i ./ceph_key root@node2`
5. Observe manual SSH connection is successful
6. Run `ceph orch host add node2` and observe the following error while running `ceph -W cephadm`
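For anyone retrying the manual check in steps 2 to 4, here is a minimal Python sketch of the same sequence. The helper names (`write_private`, `build_ssh_argv`, `ceph_output`, `manual_ssh_check`) are made up for illustration and are not part of cephadm. The trailing `true` remote command is an addition so the check exits on its own instead of opening a shell. The files are written with mode 0600 because OpenSSH refuses identity files that other users can read.

```python
import os
import subprocess
import tempfile


def write_private(path, data):
    """Write data to path with mode 0600 (owner read/write only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(data)
    # The mode passed to os.open only applies to new files, so tighten
    # a pre-existing file as well.
    os.chmod(path, 0o600)


def build_ssh_argv(config_path, key_path, host, user="root"):
    """Build the ssh command from step 4; 'true' is an added no-op remote command."""
    return ["ssh", "-F", config_path, "-i", key_path, f"{user}@{host}", "true"]


def ceph_output(*args):
    """Run a ceph CLI command and return its stdout (assumes `ceph` is on PATH)."""
    result = subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def manual_ssh_check(host):
    """Repeat steps 2 to 4 and return the ssh exit status (0 means success)."""
    with tempfile.TemporaryDirectory() as workdir:
        config_path = os.path.join(workdir, "ceph_config")
        key_path = os.path.join(workdir, "ceph_key")
        write_private(config_path, ceph_output("cephadm", "get-ssh-config"))
        write_private(
            key_path,
            ceph_output("config-key", "get", "mgr/cephadm/ssh_identity_key"),
        )
        return subprocess.run(build_ssh_argv(config_path, key_path, host)).returncode
```

Calling `manual_ssh_check("node2")` on node 1 should return 0 if the manual connection works as described in step 5.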
Error:
2020-05-23T01:23:46.735873+0000 mgr.node1.pfnxpe [ERR] _Promise failed
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec
    s = io.read(1)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
    raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 1 bytes, got 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1569, in _run_cephadm
    conn, connr = self._get_connection(addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1529, in _get_connection
    ssh_options=self._ssh_options)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
    self.gateway = self._make_gateway(hostname)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
    self._make_connection_string(hostname)
  File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
    gw = gateway_bootstrap.bootstrap(io, spec)
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
    bootstrap_exec(io, spec)
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in bootstrap_exec
    raise HostNotFound(io.remoteaddress)
execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-lq9eq8la -i /tmp/cephadm-identity-k5yb36z7 root@node2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 457, in do_work
    res = self._on_complete_(*args, **kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 525, in <lambda>
    return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1682, in add_host
    error_ok=True, no_fsid=True)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1657, in _run_cephadm
    raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to node2 (node2).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run:
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@node2
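The error output contains three chained exceptions, and the phrases "During handling of the above exception" and "The above exception was the direct cause" mark the links between them. The sketch below reproduces that chain with stand-in classes to show how to walk back to the first failure. It is not the real execnet or cephadm code; the function names `bootstrap_exec`, `run_cephadm`, and `exception_chain` are illustrative only. Here the first failure is the `EOFError`, meaning the one-byte read on the SSH channel returned no data.

```python
class HostNotFound(Exception):
    """Stand-in for execnet.gateway_bootstrap.HostNotFound."""


class OrchestratorError(Exception):
    """Stand-in for orchestrator._interface.OrchestratorError."""


def bootstrap_exec():
    try:
        # The read on the SSH channel returned no data.
        raise EOFError("expected 1 bytes, got 0")
    except EOFError:
        # Raising inside an except block records the EOFError implicitly
        # as __context__ ("During handling of the above exception ...").
        raise HostNotFound("root@node2")


def run_cephadm():
    try:
        bootstrap_exec()
    except HostNotFound as e:
        # "raise ... from e" records e explicitly as __cause__
        # ("The above exception was the direct cause ...").
        raise OrchestratorError("Failed to connect to node2 (node2).") from e


def exception_chain(exc):
    """Return exc followed by each earlier exception, newest first."""
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = exc.__cause__ or exc.__context__
    return chain


try:
    run_cephadm()
except OrchestratorError as err:
    for link in exception_chain(err):
        print(f"{type(link).__name__}: {link}")
```

The last line printed is the earliest failure in the chain, which is usually the most useful starting point when the outer message is as generic as "Failed to connect".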
History
#1 Updated by Sebastian Wagner almost 4 years ago
- Project changed from Ceph to Orchestrator
#2 Updated by Sebastian Wagner almost 4 years ago
execnet is again very helpful with their exceptions this time.
#3 Updated by Sebastian Wagner almost 4 years ago
You might want to run `ceph mgr fail`.
#4 Updated by Dan Skaggs almost 4 years ago
I ended up working around this by using an Ansible role, with which this worked. Feel free to close this if no one else is reporting it.
#5 Updated by Joshua Schmid over 3 years ago
- Status changed from New to Can't reproduce
thanks, closing