Bug #48158 (closed): cephadm bootstrap fails with custom ssh port

Added by Thilo-Alexander Ginkel over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Attempting to bootstrap a Ceph cluster using cephadm fails when the host's sshd listens on a non-standard port (e.g., 2222) instead of port 22, even though an ssh_config file declaring the custom port is provided.

ssh_config:

SendEnv LANG LC_* GIT_*

Host *
  Port                  2222
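
For reference, that the config actually routes to the custom port can be confirmed with OpenSSH verbose output (a sketch; nuc2 is the host from the logs below, and the debug line shown is illustrative):

# ssh -F ssh_config -v root@nuc2 true 2>&1 | grep 'Connecting to'
debug1: Connecting to nuc2 [10.147.8.12] port 2222.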

cephadm output:

# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm

# chmod +x cephadm

# mkdir -p /etc/ceph

# ./cephadm bootstrap --mon-ip 10.147.8.12 --ssh-config ssh_config 
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: e1b69b0c-22dd-11eb-a320-e3173a7d80ce
Verifying IP 10.147.8.12 port 3300 ...
Verifying IP 10.147.8.12 port 6789 ...
Mon IP 10.147.8.12 is in CIDR network 10.147.8.0/24
Pulling container image docker.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Using provided ssh config...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Adding host nuc2...
Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=nuc2 -v /var/log/ceph/e1b69b0c-22dd-11eb-a320-e3173a7d80ce:/var/log/ceph:z -v /tmp/ceph-tmpces5opqh:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnzh59b2m:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15 orch host add nuc2
/usr/bin/ceph:stderr Error EINVAL: Failed to connect to nuc2 (nuc2).
/usr/bin/ceph:stderr Check that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph:stderr 
/usr/bin/ceph:stderr you may want to run:
/usr/bin/ceph:stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph:stderr > ceph config-key get mgr/cephadm/ssh_identity_key > key
/usr/bin/ceph:stderr > ssh -F ssh_config -i key root@nuc2
ERROR: Failed to add host <nuc2>: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=nuc2 -v /var/log/ceph/e1b69b0c-22dd-11eb-a320-e3173a7d80ce:/var/log/ceph:z -v /tmp/ceph-tmpces5opqh:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpnzh59b2m:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15 orch host add nuc2

The same sequence (without the custom ssh_config file) works when the host's sshd is listening on port 22.

Actions #1

Updated by Thilo-Alexander Ginkel over 3 years ago

Manually applying the troubleshooting instructions does indeed work, so the root cause is unclear to me:

# ceph orch host add nuc2          
Error EINVAL: Failed to connect to nuc2 (nuc2).
Check that the host is reachable and accepts connections using the cephadm SSH key

you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@nuc2

# ssh -F ssh_config -i key root@nuc2

Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-52-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue 10 Nov 2020 01:44:49 PM CET

  System load:    0.11      Users logged in:                  1
  Usage of /home: unknown   IPv4 address for br-1df92a5d7551: 172.18.0.1
  Memory usage:   3%        IPv4 address for br-646df251478e: 172.19.0.1
  Swap usage:     0%        IPv4 address for docker0:         172.17.0.1
  Temperature:    50.0 C    IPv6 address for docker0:         fd00:dead:beef::1
  Processes:      214       IPv4 address for eno1:            10.147.8.12

 * Introducing self-healing high availability clustering for MicroK8s!
   Super simple, hardened and opinionated Kubernetes for production.

     https://microk8s.io/high-availability

0 updates can be installed immediately.
0 of these updates are security updates.

Last login: Tue Nov 10 13:39:10 2020 from 127.0.0.1
Actions #2

Updated by Michael Fritch over 3 years ago

  • Status changed from New to Need More Info

These are set using the generic ceph key/value store (ceph config-key), and a change only takes effect after a restart of the mgr. It appears the current cephadm bootstrap logic sets the ssh_config after the mgr has already been restarted, so the custom config is never picked up.
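
For reference, the stored value can be inspected straight from the key/value store (a sketch; `mgr/cephadm/ssh_config` is assumed to be the key the cephadm module uses, by analogy with the `ssh_identity_key` key shown in the error text):

# ceph config-key get mgr/cephadm/ssh_config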

The steps documented here could likely be improved:
https://docs.ceph.com/en/latest/cephadm/operations/#default-behavior

Does the following work as a workaround?

# ceph orch restart mgr
# ceph orch host add nuc2

Actions #3

Updated by Thilo-Alexander Ginkel over 3 years ago

Thanks for your reply! Unfortunately, the workaround does not seem to work:

# ceph orch restart mgr

# ceph orch host add nuc2
Error EINVAL: Failed to connect to nuc2 (nuc2).
Check that the host is reachable and accepts connections using the cephadm SSH key

you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@nuc2

ceph status shows stray daemons. Could it be that the failed operation already added nuc2, which can't be added a second time?

# ceph status
  cluster:
    id:     beb4604e-2387-11eb-af09-a56995e69c1d
    health: HEALTH_WARN
            2 stray daemons(s) not managed by cephadm
            1 stray host(s) with 2 daemon(s) not managed by cephadm
            Reduced data availability: 1 pg inactive
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum nuc2 (age 71m)
    mgr: nuc2.eiuxdl(active, since 70m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
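
One quick way to check whether the failed bootstrap already registered nuc2 would be to list the hosts known to the orchestrator (a sketch; output omitted here):

# ceph orch host ls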

If you have an experimental patch for cephadm, I could also give it a try...

Actions #4

Updated by Thilo-Alexander Ginkel over 3 years ago

I tried including the restart in cephadm after setting the ssh config, but that also does not seem to have any effect:

--- cephadm     2020-11-10 21:50:30.979492804 +0100
+++ cephadm.patched     2020-11-10 21:49:18.263644675 +0100
@@ -3108,6 +3108,7 @@
                 pathify(args.ssh_config.name): '/tmp/cephadm-ssh-config:z',
             }
             cli(['cephadm', 'set-ssh-config', '-i', '/tmp/cephadm-ssh-config'], extra_mounts=mounts)
+            cli(['orch', 'restart', 'mgr'])

         if args.ssh_private_key and args.ssh_public_key:
             logger.info('Using provided ssh keys...')
Actions #5

Updated by Michael Fritch over 3 years ago

Eh, sorry: `ceph orch restart mgr` happens asynchronously, so issuing a `host add` immediately afterward won't work without some kind of poll/sleep.
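
A crude way to wait out the restart, as a sketch (assumes jq is available and uses the mgr map epoch as the restart signal):

# old=$(ceph mgr dump | jq .epoch)
# ceph orch restart mgr
# while [ "$(ceph mgr dump | jq .epoch)" -le "$old" ]; do sleep 2; done
# ceph orch host add nuc2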

For a quick test, something like this might do better:

# cephadm unit --name mgr.nuc2.eiuxdl restart

I think a patch for this will come down to moving this logic:

https://github.com/ceph/ceph/blob/31d4e76d0d584790763b4b1146b29ea4cfc2e9af/src/cephadm/cephadm#L3096-L3103

.. to somewhere after we have set all the related ssh config that requires a mgr restart ..

Actions #6

Updated by Michael Fritch over 3 years ago

maybe not entirely ideal, but I think this will work:
https://github.com/ceph/ceph/commit/193a1b3fc2faee50ccc87f78e4bd28498f4264ef

mind giving it a quick try?

Actions #7

Updated by Thilo-Alexander Ginkel over 3 years ago

Still no luck... Removed all ceph containers, removed /var/lib/ceph and /etc/ceph, then:

# curl --silent --remote-name --location https://raw.githubusercontent.com/ceph/ceph/193a1b3fc2faee50ccc87f78e4bd28498f4264ef/src/cephadm/cephadm

# chmod +x cephadm

# ./cephadm bootstrap --ssh-config ssh_config --mon-ip 10.147.8.12                                                                               
This is a development version of cephadm.
For information regarding the latest stable release:
    https://docs.ceph.com/docs/octopus/cephadm/install
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: d32d00f4-243f-11eb-af09-a56995e69c1d
Verifying IP 10.147.8.12 port 3300 ...
Verifying IP 10.147.8.12 port 6789 ...
Mon IP 10.147.8.12 is in CIDR network 10.147.8.0/24
Pulling container image docker.io/ceph/daemon-base:latest-master-devel...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Using provided ssh config...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Restarting cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 10...
mgr epoch 10 is available
Adding host nuc2...
Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=nuc2 -v /var/log/ceph/d32d00f4-243f-11eb-af09-a56995e69c1d:/var/log/ceph:z -v /tmp/ceph-tmpbkjw3wjn:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpp6qcpwxb:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel orch host add nuc2
/usr/bin/ceph:stderr Error EINVAL: Failed to connect to nuc2 (nuc2).
/usr/bin/ceph:stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph:stderr 
/usr/bin/ceph:stderr To add the cephadm SSH key to the host:
/usr/bin/ceph:stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph:stderr > ssh-copy-id -f -i ~/ceph.pub root@nuc2
/usr/bin/ceph:stderr 
/usr/bin/ceph:stderr To check that the host is reachable:
/usr/bin/ceph:stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph:stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph:stderr > ssh -F ssh_config -i ~/cephadm_private_key root@nuc2
ERROR: Failed to add host <nuc2>: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=nuc2 -v /var/log/ceph/d32d00f4-243f-11eb-af09-a56995e69c1d:/var/log/ceph:z -v /tmp/ceph-tmpbkjw3wjn:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpp6qcpwxb:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel orch host add nuc2
Actions #8

Updated by Michael Fritch over 3 years ago

Finally had a chance to test this, and the errors reported from the underlying remoto lib are less than helpful:

Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.796+0000 7fef60ace700  0 log_channel(audit) log [DBG] : from='client.14272 -' entity='client.admin' cmd=[{"prefix": "orch host add", "hostname": "node1", "target": ["mon-mgr", ""]}]: dispatch
Nov 11 22:26:05 node1 conmon[811925]: [31B blob data]
Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.872+0000 7fef6bf74700  0 [cephadm ERROR orchestrator._interface] _Promise failed
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec
Nov 11 22:26:05 node1 conmon[811925]:     s = io.read(1)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
Nov 11 22:26:05 node1 conmon[811925]:     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
Nov 11 22:26:05 node1 conmon[811925]: EOFError: expected 1 bytes, got 0
Nov 11 22:26:05 node1 conmon[811925]: 
Nov 11 22:26:05 node1 conmon[811925]: During handling of the above exception, another exception occurred:
Nov 11 22:26:05 node1 conmon[811925]: 
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 998, in _remote_connection
Nov 11 22:26:05 node1 conmon[811925]:     conn, connr = self._get_connection(addr)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 961, in _get_connection
Nov 11 22:26:05 node1 conmon[811925]:     sudo=True if self.ssh_user != 'root' else False)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Nov 11 22:26:05 node1 conmon[811925]:     self.gateway = self._make_gateway(hostname)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
Nov 11 22:26:05 node1 conmon[811925]:     self._make_connection_string(hostname)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
Nov 11 22:26:05 node1 conmon[811925]:     gw = gateway_bootstrap.bootstrap(io, spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
Nov 11 22:26:05 node1 conmon[811925]:     bootstrap_exec(io, spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in bootstrap_exec
Nov 11 22:26:05 node1 conmon[811925]:     raise HostNotFound(io.remoteaddress)
Nov 11 22:26:05 node1 conmon[811925]: execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-lqkx72b8 -i /tmp/cephadm-identity-rx03t2fk root@node1
Nov 11 22:26:05 node1 conmon[811925]: 
Nov 11 22:26:05 node1 conmon[811925]: The above exception was the direct cause of the following exception:
Nov 11 22:26:05 node1 conmon[811925]: 
Nov 11 22:26:05 node1 conmon[811925]: Traceback (most recent call last):
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 295, in _finalize
Nov 11 22:26:05 node1 conmon[811925]:     next_result = self._on_complete(self._value)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 108, in <lambda>
Nov 11 22:26:05 node1 conmon[811925]:     return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1190, in add_host
Nov 11 22:26:05 node1 conmon[811925]:     return self._add_host(spec)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1176, in _add_host
Nov 11 22:26:05 node1 conmon[811925]:     error_ok=True, no_fsid=True)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1079, in _run_cephadm
Nov 11 22:26:05 node1 conmon[811925]:     with self._remote_connection(host, addr) as tpl:
Nov 11 22:26:05 node1 conmon[811925]:   File "/lib64/python3.6/contextlib.py", line 81, in __enter__
Nov 11 22:26:05 node1 conmon[811925]:     return next(self.gen)
Nov 11 22:26:05 node1 conmon[811925]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1025, in _remote_connection
Nov 11 22:26:05 node1 conmon[811925]:     raise OrchestratorError(msg) from e
Nov 11 22:26:05 node1 conmon[811925]: orchestrator._interface.OrchestratorError: Failed to connect to node1 (node1).
Nov 11 22:26:05 node1 conmon[811925]: Please make sure that the host is reachable and accepts connections using the cephadm SSH key
Nov 11 22:26:05 node1 conmon[811925]: 
Nov 11 22:26:05 node1 conmon[811925]: To add the cephadm SSH key to the host:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-pub-key > ~/ceph.pub
Nov 11 22:26:05 node1 conmon[811925]: > ssh-copy-id -f -i ~/ceph.pub root@node1
Nov 11 22:26:05 node1 conmon[811925]: 
Nov 11 22:26:05 node1 conmon[811925]: To check that the host is reachable:
Nov 11 22:26:05 node1 conmon[811925]: > ceph cephadm get-ssh-config > ssh_config
Nov 11 22:26:05 node1 conmon[811925]: > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
Nov 11 22:26:05 node1 conmon[811925]: > ssh -F ssh_config -i ~/cephadm_private_key root@node1
Nov 11 22:26:05 node1 conmon[811925]: debug 2020-11-11T21:26:05.872+0000 7fef6bf74700 -1 log_channel(cephadm) log [ERR] : _Promise failed
[... identical traceback and help text repeated via log_channel(cephadm) and the mgr.server reply ...]

Actions #9

Updated by Michael Fritch over 3 years ago

cephadm will use a default ssh_config as follows:

node1:~ # ceph cephadm get-ssh-config
Host *
  User root
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  ConnectTimeout=30

Adding `StrictHostKeyChecking no` appears to resolve this:

node1:~ # cat ssh_config
SendEnv LANG LC_* GIT_*
Host *
  StrictHostKeyChecking no
  Port                  2222

node1:~ # ceph cephadm set-ssh-config -i ssh_config

node1:~ # ceph orch host add node1
Added host 'node1'

Maybe it's sufficient to check for the existence of `StrictHostKeyChecking` in the user-defined ssh_config and raise an error msg if it is not defined?
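
As a rough shell sketch of that check (the real fix would live in the Python mgr module; ssh_config here is the user-supplied file):

# grep -q StrictHostKeyChecking ssh_config || echo "ssh_config requires StrictHostKeyChecking"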

Actions #10

Updated by Thilo-Alexander Ginkel over 3 years ago

Problem solved. Thanks a lot for your support! :-)

Actions #11

Updated by Michael Fritch over 3 years ago

  • Status changed from Need More Info to Fix Under Review
  • Assignee set to Michael Fritch
  • Pull request ID set to 38052
Actions #12

Updated by Michael Fritch over 3 years ago

Thilo-Alexander Ginkel wrote:

Problem solved. Thanks a lot for your support! :-)

Thanks for the bug report! :)

Added this to validate the ssh config input:
https://github.com/ceph/ceph/pull/38052

StrictHostKeyChecking must be provided via the user-defined ssh config:

# cat ssh_config
SendEnv LANG LC_* GIT_*
Host *
  Port                  2222

# ceph cephadm set-ssh-config -i ssh_config
Error EINVAL: ssh_config requires StrictHostKeyChecking

StrictHostKeyChecking should be set to either yes or no:

# cat ssh_config
SendEnv LANG LC_* GIT_*
Host *
  StrictHostKeyChecking ask
  Port                  2222

# ceph cephadm set-ssh-config -i ssh_config
Error EINVAL: ssh_config cannot contain: 'StrictHostKeyChecking ask'
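
For completeness, a config that passes both checks (mirroring the working example from comment #9):

# cat ssh_config
SendEnv LANG LC_* GIT_*
Host *
  StrictHostKeyChecking no
  Port                  2222

# ceph cephadm set-ssh-config -i ssh_config
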
Actions #13

Updated by Sebastian Wagner over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #14

Updated by Sebastian Wagner over 3 years ago

  • Status changed from Pending Backport to Resolved