Bug #48158 (closed): cephadm bootstrap fails with custom ssh port
Description
Attempting to bootstrap a Ceph cluster using cephadm fails if the host's sshd is listening on a non-standard SSH port (e.g., 2222) instead of port 22, despite providing an ssh_config file that declares the custom port.

ssh_config:
SendEnv LANG LC_* GIT_*

Host *
  Port 2222
cephadm output:
# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
# chmod +x cephadm
# mkdir -p /etc/ceph
# ./cephadm bootstrap --mon-ip 10.147.8.12 --ssh-config ssh_config
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: e1b69b0c-22dd-11eb-a320-e3173a7d80ce
Verifying IP 10.147.8.12 port 3300 ...
Verifying IP 10.147.8.12 port 6789 ...
Mon IP 10.147.8.12 is in CIDR network 10.147.8.0/24
Pulling container image docker.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Using provided ssh config...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Adding host nuc2...
Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=nuc2 -v /var/log/ceph/e1b69b0c-22dd-11eb-a320-e3173a7d80ce:/var/log/ceph:z -v /tmp/ceph-tmpces5opqh:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnzh59b2m:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15 orch host add nuc2
/usr/bin/ceph:stderr Error EINVAL: Failed to connect to nuc2 (nuc2).
/usr/bin/ceph:stderr Check that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph:stderr
/usr/bin/ceph:stderr you may want to run:
/usr/bin/ceph:stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph:stderr > ceph config-key get mgr/cephadm/ssh_identity_key > key
/usr/bin/ceph:stderr > ssh -F ssh_config -i key root@nuc2
ERROR: Failed to add host <nuc2>: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=nuc2 -v /var/log/ceph/e1b69b0c-22dd-11eb-a320-e3173a7d80ce:/var/log/ceph:z -v /tmp/ceph-tmpces5opqh:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnzh59b2m:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15 orch host add nuc2
The same sequence (without the custom ssh_config file) works when the host's sshd is listening on port 22.
Updated by Thilo-Alexander Ginkel over 3 years ago
Manually attempting to apply the troubleshooting instructions does indeed work, so the root cause is unclear to me:
# ceph orch host add nuc2
Error EINVAL: Failed to connect to nuc2 (nuc2).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@nuc2
# ssh -F ssh_config -i key root@nuc2
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-52-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue 10 Nov 2020 01:44:49 PM CET

  System load: 0.11
  Users logged in: 1
  Usage of /home: unknown
  IPv4 address for br-1df92a5d7551: 172.18.0.1
  Memory usage: 3%
  IPv4 address for br-646df251478e: 172.19.0.1
  Swap usage: 0%
  IPv4 address for docker0: 172.17.0.1
  Temperature: 50.0 C
  IPv6 address for docker0: fd00:dead:beef::1
  Processes: 214
  IPv4 address for eno1: 10.147.8.12

 * Introducing self-healing high availability clustering for MicroK8s!
   Super simple, hardened and opinionated Kubernetes for production.
   https://microk8s.io/high-availability

0 updates can be installed immediately.
0 of these updates are security updates.

Last login: Tue Nov 10 13:39:10 2020 from 127.0.0.1
Updated by Michael Fritch over 3 years ago
- Status changed from New to Need More Info
These are set using the generic ceph key/value store (ceph config-key), which would require a restart of the mgr. It appears the current cephadm bootstrap logic sets the ssh_config after the mgr has been restarted.
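To illustrate that hypothesis, here is a hypothetical Python sketch (not actual cephadm code, names are invented): a module that reads a value from a key/value store only once at init will not observe later changes to the store until the module is restarted.

```python
class KVStore:
    """Stands in for the generic ceph config-key store."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class SshModule:
    """Caches its ssh_config at startup, so later store updates
    are invisible until the module is re-created (restarted)."""
    def __init__(self, store):
        self._store = store
        self.ssh_config = store.get("mgr/cephadm/ssh_config",
                                    "Host *\n  Port 22")


store = KVStore()
mod = SshModule(store)

# Writing a new ssh_config after the module started has no effect...
store.set("mgr/cephadm/ssh_config", "Host *\n  Port 2222")
print("2222" in mod.ssh_config)   # False: module still has the cached copy

# ...until the module is restarted and re-reads the store.
mod = SshModule(store)
print("2222" in mod.ssh_config)   # True
```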
The steps documented here could likely be improved:
https://docs.ceph.com/en/latest/cephadm/operations/#default-behavior
Does the following work as a workaround?
# ceph orch restart mgr
# ceph orch host add nuc2
Updated by Thilo-Alexander Ginkel over 3 years ago
Thanks for your reply! Unfortunately, the workaround does not seem to work:
# ceph orch restart mgr
# ceph orch host add nuc2
Error EINVAL: Failed to connect to nuc2 (nuc2).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@nuc2
ceph status shows stray nodes. Could it be that the failed operation already added nuc2, which can't be added a second time?
# ceph status
  cluster:
    id:     beb4604e-2387-11eb-af09-a56995e69c1d
    health: HEALTH_WARN
            2 stray daemons(s) not managed by cephadm
            1 stray host(s) with 2 daemon(s) not managed by cephadm
            Reduced data availability: 1 pg inactive
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum nuc2 (age 71m)
    mgr: nuc2.eiuxdl(active, since 70m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
If you have an experimental patch for cephadm, I could also give it a try...
Updated by Thilo-Alexander Ginkel over 3 years ago
I tried including the restart in cephadm after setting the ssh config, but that also does not seem to have any effect:
--- cephadm	2020-11-10 21:50:30.979492804 +0100
+++ cephadm.patched	2020-11-10 21:49:18.263644675 +0100
@@ -3108,6 +3108,7 @@
                 pathify(args.ssh_config.name): '/tmp/cephadm-ssh-config:z',
             }
             cli(['cephadm', 'set-ssh-config', '-i', '/tmp/cephadm-ssh-config'], extra_mounts=mounts)
+            cli(['orch', 'restart', 'mgr'])
 
         if args.ssh_private_key and args.ssh_public_key:
             logger.info('Using provided ssh keys...')
Updated by Michael Fritch over 3 years ago
Eh, sorry: `ceph orch restart mgr` happens asynchronously, so issuing a `host add` immediately afterward won't work without some kind of poll/sleep.
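A minimal sketch of such a poll/sleep (generic Python, not cephadm code; `mgr_is_up` is an invented stand-in for whatever readiness check one would actually use after the async restart):

```python
import time


def wait_for(check, timeout=30.0, interval=1.0):
    """Call check() every `interval` seconds until it returns True.

    Returns True on success, or False if `timeout` seconds elapse
    before the check ever succeeds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: pretend the mgr only becomes reachable on the third poll.
attempts = {"n": 0}

def mgr_is_up():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_for(mgr_is_up, timeout=5, interval=0.01))  # True
```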
For a quick test, something like this might do better:
# cephadm unit --name mgr.nuc2.eiuxdl restart
I think a patch for this will come down to moving this logic:
.. to somewhere after we have set all the related ssh config that requires a mgr restart ..
Updated by Michael Fritch over 3 years ago
maybe not entirely ideal, but I think this will work:
https://github.com/ceph/ceph/commit/193a1b3fc2faee50ccc87f78e4bd28498f4264ef
mind giving it a quick try?
Updated by Thilo-Alexander Ginkel over 3 years ago
Still no luck... Removed all ceph containers, removed /var/lib/ceph and /etc/ceph, then:
# curl --silent --remote-name --location https://raw.githubusercontent.com/ceph/ceph/193a1b3fc2faee50ccc87f78e4bd28498f4264ef/src/cephadm/cephadm
# chmod +x cephadm
# ./cephadm bootstrap --ssh-config ssh_config --mon-ip 10.147.8.12
This is a development version of cephadm.
For information regarding the latest stable release:
    https://docs.ceph.com/docs/octopus/cephadm/install
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: d32d00f4-243f-11eb-af09-a56995e69c1d
Verifying IP 10.147.8.12 port 3300 ...
Verifying IP 10.147.8.12 port 6789 ...
Mon IP 10.147.8.12 is in CIDR network 10.147.8.0/24
Pulling container image docker.io/ceph/daemon-base:latest-master-devel...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Using provided ssh config...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Restarting cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 10...
mgr epoch 10 is available
Adding host nuc2...
Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=nuc2 -v /var/log/ceph/d32d00f4-243f-11eb-af09-a56995e69c1d:/var/log/ceph:z -v /tmp/ceph-tmpbkjw3wjn:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpp6qcpwxb:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel orch host add nuc2
/usr/bin/ceph:stderr Error EINVAL: Failed to connect to nuc2 (nuc2).
/usr/bin/ceph:stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph:stderr
/usr/bin/ceph:stderr To add the cephadm SSH key to the host:
/usr/bin/ceph:stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph:stderr > ssh-copy-id -f -i ~/ceph.pub root@nuc2
/usr/bin/ceph:stderr
/usr/bin/ceph:stderr To check that the host is reachable:
/usr/bin/ceph:stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph:stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph:stderr > ssh -F ssh_config -i ~/cephadm_private_key root@nuc2
ERROR: Failed to add host <nuc2>: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=nuc2 -v /var/log/ceph/d32d00f4-243f-11eb-af09-a56995e69c1d:/var/log/ceph:z -v /tmp/ceph-tmpbkjw3wjn:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpp6qcpwxb:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel orch host add nuc2
Updated by Michael Fritch over 3 years ago
Finally had a chance to test this, and the errors reported from the underlying remoto lib are less than helpful:
Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.796+0000 7fef60ace700 0 log_channel(audit) log [DBG] : from='client.14272 -' entity='client.admin' cmd=[{"prefix": "orch host add", "hostname": "node1", "target": ["mon-mgr", ""]}]: dispatch
Nov 11 22:26:05 node1 conmon[811925]: [31B blob data]
Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.872+0000 7fef6bf74700 0 [cephadm ERROR orchestrator._interface] _Promise failed
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec
Nov 11 22:26:05 node1 conmon[811925]:     s = io.read(1)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
Nov 11 22:26:05 node1 conmon[811925]:     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
Nov 11 22:26:05 node1 conmon[811925]: EOFError: expected 1 bytes, got 0
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: During handling of the above exception, another exception occurred:
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 998, in _remote_connection
Nov 11 22:26:05 node1 conmon[811925]:     conn, connr = self._get_connection(addr)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 961, in _get_connection
Nov 11 22:26:05 node1 conmon[811925]:     sudo=True if self.ssh_user != 'root' else False)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Nov 11 22:26:05 node1 conmon[811925]:     self.gateway = self._make_gateway(hostname)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
Nov 11 22:26:05 node1 conmon[811925]:     self._make_connection_string(hostname)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
Nov 11 22:26:05 node1 conmon[811925]:     gw = gateway_bootstrap.bootstrap(io, spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
Nov 11 22:26:05 node1 conmon[811925]:     bootstrap_exec(io, spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in bootstrap_exec
Nov 11 22:26:05 node1 conmon[811925]:     raise HostNotFound(io.remoteaddress)
Nov 11 22:26:05 node1 conmon[811925]: execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-lqkx72b8 -i /tmp/cephadm-identity-rx03t2fk root@node1
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: The above exception was the direct cause of the following exception:
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 295, in _finalize
Nov 11 22:26:05 node1 conmon[811925]:     next_result = self._on_complete(self._value)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 108, in <lambda>
Nov 11 22:26:05 node1 conmon[811925]:     return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1190, in add_host
Nov 11 22:26:05 node1 conmon[811925]:     return self._add_host(spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1176, in _add_host
Nov 11 22:26:05 node1 conmon[811925]:     error_ok=True, no_fsid=True)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1079, in _run_cephadm
Nov 11 22:26:05 node1 conmon[811925]:     with self._remote_connection(host, addr) as tpl:
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib64/python3.6/contextlib.py", line 81, in __enter__
Nov 11 22:26:05 node1 conmon[811925]:     return next(self.gen)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1025, in _remote_connection
Nov 11 22:26:05 node1 conmon[811925]:     raise OrchestratorError(msg) from e
Nov 11 22:26:05 node1 conmon[811925]: orchestrator._interface.OrchestratorError: Failed to connect to node1 (node1).
Nov 11 22:26:05 node1 conmon[811925]: Please make sure that the host is reachable and accepts connections using the cephadm SSH key
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: To add the cephadm SSH key to the host:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-pub-key > ~/ceph.pub
Nov 11 22:26:05 node1 conmon[811925]: > ssh-copy-id -f -i ~/ceph.pub root@node1
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: To check that the host is reachable:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-ssh-config > ssh_config
Nov 11 22:26:05 node1 conmon[811925]: > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
Nov 11 22:26:05 node1 conmon[811925]: > ssh -F ssh_config -i ~/cephadm_private_key root@node1
Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.872+0000 7fef6bf74700 -1 log_channel(cephadm) log [ERR] : _Promise failed
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec
Nov 11 22:26:05 node1 conmon[811925]:     s = io.read(1)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
Nov 11 22:26:05 node1 conmon[811925]:     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
Nov 11 22:26:05 node1 conmon[811925]: EOFError: expected 1 bytes, got 0
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: During handling of the above exception, another exception occurred:
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 998, in _remote_connection
Nov 11 22:26:05 node1 conmon[811925]:     conn, connr = self._get_connection(addr)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 961, in _get_connection
Nov 11 22:26:05 node1 conmon[811925]:     sudo=True if self.ssh_user != 'root' else False)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Nov 11 22:26:05 node1 conmon[811925]:     self.gateway = self._make_gateway(hostname)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
Nov 11 22:26:05 node1 conmon[811925]:     self._make_connection_string(hostname)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
Nov 11 22:26:05 node1 conmon[811925]:     gw = gateway_bootstrap.bootstrap(io, spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
Nov 11 22:26:05 node1 conmon[811925]:     bootstrap_exec(io, spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in bootstrap_exec
Nov 11 22:26:05 node1 conmon[811925]:     raise HostNotFound(io.remoteaddress)
Nov 11 22:26:05 node1 conmon[811925]: execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-lqkx72b8 -i /tmp/cephadm-identity-rx03t2fk root@node1
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: The above exception was the direct cause of the following exception:
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 295, in _finalize
Nov 11 22:26:05 node1 conmon[811925]:     next_result = self._on_complete(self._value)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 108, in <lambda>
Nov 11 22:26:05 node1 conmon[811925]:     return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1190, in add_host
Nov 11 22:26:05 node1 conmon[811925]:     return self._add_host(spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1176, in _add_host
Nov 11 22:26:05 node1 conmon[811925]:     error_ok=True, no_fsid=True)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1079, in _run_cephadm
Nov 11 22:26:05 node1 conmon[811925]:     with self._remote_connection(host, addr) as tpl:
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib64/python3.6/contextlib.py", line 81, in __enter__
Nov 11 22:26:05 node1 conmon[811925]:     return next(self.gen)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1025, in _remote_connection
Nov 11 22:26:05 node1 conmon[811925]:     raise OrchestratorError(msg) from e
Nov 11 22:26:05 node1 conmon[811925]: orchestrator._interface.OrchestratorError: Failed to connect to node1 (node1).
Nov 11 22:26:05 node1 conmon[811925]: Please make sure that the host is reachable and accepts connections using the cephadm SSH key
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: To add the cephadm SSH key to the host:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-pub-key > ~/ceph.pub
Nov 11 22:26:05 node1 conmon[811925]: > ssh-copy-id -f -i ~/ceph.pub root@node1
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: To check that the host is reachable:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-ssh-config > ssh_config
Nov 11 22:26:05 node1 conmon[811925]: > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
Nov 11 22:26:05 node1 conmon[811925]: > ssh -F ssh_config -i ~/cephadm_private_key root@node1
Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.876+0000 7fef6bf74700 -1 mgr.server reply reply (22) Invalid argument Failed to connect to node1 (node1).
Nov 11 22:26:05 node1 conmon[811925]: Please make sure that the host is reachable and accepts connections using the cephadm SSH key
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: To add the cephadm SSH key to the host:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-pub-key > ~/ceph.pub
Nov 11 22:26:05 node1 conmon[811925]: > ssh-copy-id -f -i ~/ceph.pub root@node1
Nov 11 22:26:05 node1 conmon[811925]:
Nov 11 22:26:05 node1 conmon[811925]: To check that the host is reachable:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-ssh-config > ssh_config
Nov 11 22:26:05 node1 conmon[811925]: > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
Nov 11 22:26:05 node1 conmon[811925]: > ssh -F ssh_config -i ~/cephadm_private_key root@node1
Nov 11 22:26:07 node1 conmon[811925]: debug
Updated by Michael Fritch over 3 years ago
cephadm will use a default ssh_config as follows:
node1:~ # ceph cephadm get-ssh-config
Host *
  User root
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  ConnectTimeout=30
Adding `StrictHostKeyChecking no` appears to resolve this:
node1:~ # cat ssh_config
SendEnv LANG LC_* GIT_*

Host *
  StrictHostKeyChecking no
  Port 2222
node1:~ # ceph cephadm set-ssh-config -i ssh_config
node1:~ # ceph orch host add node1
Added host 'node1'
Maybe it's sufficient to check for the existence of `StrictHostKeyChecking` in the user-defined ssh_config and raise an error message if it is not defined?
Updated by Thilo-Alexander Ginkel over 3 years ago
Problem solved. Thanks a lot for your support! :-)
Updated by Michael Fritch over 3 years ago
- Status changed from Need More Info to Fix Under Review
- Assignee set to Michael Fritch
- Pull request ID set to 38052
Updated by Michael Fritch over 3 years ago
Thilo-Alexander Ginkel wrote:
Problem solved. Thanks a lot for your support! :-)
Thanks for the bug report! :)
Added this to validate the ssh config input:
https://github.com/ceph/ceph/pull/38052
StrictHostKeyChecking must be provided via the user-defined ssh config:
# cat ssh_config
SendEnv LANG LC_* GIT_*

Host *
  Port 2222
# ceph cephadm set-ssh-config -i ssh_config
Error EINVAL: ssh_config requires StrictHostKeyChecking
StrictHostKeyChecking should be set to either yes or no:

# cat ssh_config
SendEnv LANG LC_* GIT_*

Host *
  StrictHostKeyChecking ask
  Port 2222
# ceph cephadm set-ssh-config -i ssh_config
Error EINVAL: ssh_config cannot contain: 'StrictHostKeyChecking ask'
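In the spirit of that check, a rough Python sketch of validating a user-supplied ssh_config before accepting it (this is an illustration only; the actual validation added in the PR may be implemented differently):

```python
def validate_ssh_config(text):
    """Raise ValueError unless StrictHostKeyChecking is present and
    set to either 'yes' or 'no'.

    Deliberately simplistic: matches any line starting with the
    keyword, ignoring which Host block it appears in.
    """
    options = [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith("StrictHostKeyChecking")
    ]
    if not options:
        raise ValueError("ssh_config requires StrictHostKeyChecking")
    for opt in options:
        parts = opt.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        if value not in ("yes", "no"):
            raise ValueError("ssh_config cannot contain: %r" % opt)
```

For example, a config with `StrictHostKeyChecking no` passes, one with no such option or with `StrictHostKeyChecking ask` raises ValueError, mirroring the two error messages above.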
Updated by Sebastian Wagner over 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sebastian Wagner over 3 years ago
- Status changed from Pending Backport to Resolved