Bug #51665
Status: Closed
document unfortunate interactions between cephadm and restrictive sshd_config?
Description
This one is a little obscure, so please bear with me.
If you deploy ceph using ceph-salt, it will invoke cephadm bootstrap [...] --ssh-user cephadm, i.e. it sets a non-root user for ssh access. That's fine, unless you also happen to have a restrictive /etc/ssh/sshd_config, e.g. AllowUsers or AllowGroups is specified and doesn't mention that user/group, in which case ssh access won't work, and it's not immediately obvious what the problem is.
I don't expect the general ceph docs to cover ceph-salt and its choice of user, but I went looking through the docs for mention of cephadm's --ssh-user option, and only found it in the manpage, and also on https://docs.ceph.com/en/latest/cephadm/install/, which says "The --ssh-user <user> option makes it possible to choose which ssh user cephadm will use to connect to hosts. The associated ssh key will be added to /home/<user>/.ssh/authorized_keys. The user that you designate with this option must have passwordless sudo access."
Should we elaborate on this further? Add a tip along the lines of "if you're using a non-root user, make sure your ssh config allows them access"? Or is the blanket "The user that you designate with this option must have passwordless sudo access" sufficient?
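To make the failure mode concrete, here is an illustrative sshd_config fragment (a sketch, not taken from an actual deployment) that would silently break a non-root --ssh-user:

```
# /etc/ssh/sshd_config (illustrative fragment)
# Only members of the listed groups may log in over ssh.
# A bootstrap user created in group "cephadm" is rejected here,
# because its group is not listed.
AllowGroups root
```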
Updated by Deepika Upadhyay almost 3 years ago
- Translation missing: en.field_tag_list set to good-first-issue, low-hanging-fruit
Updated by Tim Serong almost 3 years ago
Let's set aside my earlier mention of ceph-salt for a moment, and focus only on the use of cephadm bootstrap --ssh-user SSH_USER (which is what ceph-salt uses internally anyway).
The key thing here is that the user specified "must have passwordless sudo access". If that access isn't available, we can see a couple of different types of failure.
1) The user can't log in via ssh, due to restrictive sshd_config. For example, say we have a user named cephadm in group cephadm (which is what the cephadm package itself will create when it's installed), but /etc/ssh/sshd_config has been set to specify AllowGroups root. In this case, the cephadm user can't log in via ssh, because it's not in one of the allowed groups. Running cephadm bootstrap will fail with something like the following output:
# cephadm bootstrap --mon-ip 10.20.162.200 --ssh-user cephadm
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Installing packages ['chrony']...
Enabling unit chronyd.service
Enabling unit systemd-timesyncd.service
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Repeating the final host check...
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: 71347cec-e618-11eb-90e9-525400bd8f37
Verifying IP 10.20.162.200 port 3300 ...
Verifying IP 10.20.162.200 port 6789 ...
Mon IP 10.20.162.200 is in CIDR network 10.20.162.0/24
Pulling container image registry.suse.com/ses/7/ceph/ceph:latest...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to cephadm@localhost's authorized_keys...
Adding host master...
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/71347cec-e618-11eb-90e9-525400bd8f37:/var/log/ceph:z -v /tmp/ceph-tmpv97ek_ks:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp4lv64roe:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to master (master).
/usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub cephadm@master
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To check that the host is reachable:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key cephadm@master
ERROR: Failed to add host <master>: Failed command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/71347cec-e618-11eb-90e9-525400bd8f37:/var/log/ceph:z -v /tmp/ceph-tmpv97ek_ks:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp4lv64roe:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master
journalctl -u sshd | grep cephadm will show something like:
Jul 16 11:33:05 master sshd[2204]: User cephadm from 127.0.0.1 not allowed because none of user's groups are listed in AllowGroups
Jul 16 11:33:05 master sshd[2204]: Postponed keyboard-interactive for invalid user cephadm from 127.0.0.1 port 49096 ssh2 [preauth]
Jul 16 11:33:05 master sshd[2208]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=127.0.0.1 user=cephadm
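The AllowGroups check that sshd performs here can be approximated with a short shell sketch, which may help when diagnosing this by hand. The sample config and group list below are illustrative, not from the ticket:

```shell
#!/bin/sh
# Sketch: does any of the user's groups appear in sshd's AllowGroups?
# A sample sshd_config is written to a temp file purely for illustration;
# on a real host you would point cfg at /etc/ssh/sshd_config.
cfg=$(mktemp)
printf 'AllowGroups root\n' > "$cfg"

user_groups="cephadm"   # on a real host: user_groups=$(id -Gn cephadm)

# Collect the group names listed after the AllowGroups keyword.
allowed=$(awk '$1 == "AllowGroups" { $1 = ""; print }' "$cfg")

match=no
for g in $user_groups; do
  case " $allowed " in
    *" $g "*) match=yes ;;
  esac
done
echo "match=$match"     # "no" here: cephadm is not in AllowGroups
rm -f "$cfg"
```

(Note this only approximates sshd's behaviour: real AllowGroups entries may contain wildcard patterns, which this sketch does not handle.)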
So you can figure out what the problem is; I'm just not sure it's terribly obvious from the error messages. The solution is to ensure you've enabled ssh access for that user (in this example, by adding the cephadm group to AllowGroups in /etc/ssh/sshd_config). This is why I was originally thinking maybe we should make more noise in the docs about ensuring the ssh config is correct when using a non-root user.
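In this example the fix is a one-line config change (illustrative fragment; group name per the example above):

```
# /etc/ssh/sshd_config -- also admit the bootstrap user's group
AllowGroups root cephadm
```

followed by reloading sshd (e.g. systemctl reload sshd) so the change takes effect.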
2) The user can log in via ssh, but doesn't have passwordless sudo access. In this case you'll see something like:
# cephadm bootstrap --mon-ip 10.20.154.200 --ssh-user cephadm
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Installing packages ['chrony']...
Enabling unit chronyd.service
Enabling unit systemd-timesyncd.service
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Repeating the final host check...
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: e486e20a-e61a-11eb-ac5c-52540004d8e2
Verifying IP 10.20.154.200 port 3300 ...
Verifying IP 10.20.154.200 port 6789 ...
Mon IP 10.20.154.200 is in CIDR network 10.20.154.0/24
Pulling container image registry.suse.com/ses/7/ceph/ceph:latest...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to cephadm@localhost's authorized_keys...
Adding host master...
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/e486e20a-e61a-11eb-ac5c-52540004d8e2:/var/log/ceph:z -v /tmp/ceph-tmpdwyfaccf:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp5w8145tl:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master
/usr/bin/ceph: stderr Error EINVAL: Can't communicate with remote host `master`, possibly because python3 is not installed there: cannot send (already closed?)
ERROR: Failed to add host <master>: Failed command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/e486e20a-e61a-11eb-ac5c-52540004d8e2:/var/log/ceph:z -v /tmp/ceph-tmpdwyfaccf:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp5w8145tl:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master
The messaging here is even worse - "stderr Error EINVAL: Can't communicate with remote host `master`, possibly because python3 is not installed there: cannot send (already closed?)". python3 is definitely installed. The problem is that my sudoers config is the default (on SUSE, at least), which actually prompts for a password at this point, so if I dig around in journalctl I'll see something like this:
Jul 16 14:28:33 master conmon[32716]: Administrator. It usually boils down to these three things:
Jul 16 14:28:33 master conmon[32716]:
Jul 16 14:28:33 master conmon[32716]: #1) Respect the privacy of others.
Jul 16 14:28:33 master conmon[32716]: #2) Think before you type.
Jul 16 14:28:33 master conmon[32716]: #3) With great power comes great responsibility.
Jul 16 14:28:33 master conmon[32716]:
Jul 16 14:28:33 master conmon[32716]: sudo: no tty present and no askpass program specified
The trivial (and presumably way-too-loose-security-wise) solution here is to add cephadm ALL=(ALL) NOPASSWD: ALL to /etc/sudoers, and this is where we go down a bit of a rabbit hole I hadn't anticipated when I opened this bug. It turns out that the cephadm package previously actually created a /etc/sudoers.d/cephadm file, but it looks like that file had completely the wrong content (see https://tracker.ceph.com/issues/47112), and it has since been removed. There's a better version (i.e. one that's actually known to work, at least on SUSE distros) in ceph-salt (see https://github.com/ceph/ceph-salt/blob/master/ceph-salt-formula/salt/ceph-salt/common/sshkey.sls#L23-L32). It may be worth taking that template and adding it back to the cephadm package as part of ceph itself, or at least documenting what needs to go into the sudoers file to give a non-root user sufficient access.
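For reference, a minimal sudoers drop-in along the broad lines described above might look like this (a sketch only; the ceph-salt template linked above restricts the rule further and is the better starting point):

```
# /etc/sudoers.d/cephadm -- illustrative; install with visudo -f, mode 0440
# Broad NOPASSWD rule as mentioned above; tighten to the specific
# commands cephadm needs where your security policy requires it.
cephadm ALL=(ALL) NOPASSWD: ALL
```

Whether this took effect can be verified non-interactively with ssh cephadm@HOST sudo -n true, since sudo -n fails instead of prompting when a password would be required.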
FWIW, I still think this makes a good first issue / low hanging fruit... Even if there may be a bit of a rabbit hole to go down, at least it's a narrow rabbit hole ;-)
Updated by Sebastian Wagner over 2 years ago
- Priority changed from Normal to Low
Updated by Redouane Kachach Elhichou almost 2 years ago
- Related to Feature #55493: Detect ssh connectivity issues ASAP added
Updated by Redouane Kachach Elhichou almost 2 years ago
- Translation missing: en.field_tag_list deleted (good-first-issue, low-hanging-fruit)
Updated by Redouane Kachach Elhichou almost 2 years ago
- Status changed from New to Fix Under Review
- Assignee set to Redouane Kachach Elhichou
- Pull request ID set to 46129
Updated by Redouane Kachach Elhichou almost 2 years ago
- Status changed from Fix Under Review to Resolved