Bug #51665

closed

document unfortunate interactions between cephadm and restrictive sshd_config?

Added by Tim Serong almost 3 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Low
Category:
cephadm
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This one is a little obscure, so please bear with me.

If you deploy Ceph using ceph-salt, it invokes cephadm bootstrap [...] --ssh-user cephadm, i.e. it sets a non-root user for ssh access. That's fine unless you also happen to have a restrictive /etc/ssh/sshd_config, e.g. one where AllowUsers or AllowGroups is specified and doesn't mention that user or group. In that case ssh access won't work, and it's not immediately obvious what the problem is.

I don't expect the general Ceph docs to cover ceph-salt and its choice of user, but I searched the docs for mentions of cephadm's --ssh-user option and only found it in the manpage and on https://docs.ceph.com/en/latest/cephadm/install/, which says "The --ssh-user <user> option makes it possible to choose which ssh user cephadm will use to connect to hosts. The associated ssh key will be added to /home/*<user>*/.ssh/authorized_keys. The user that you designate with this option must have passwordless sudo access."

Should we elaborate on this further? Add a tip along the lines of "if you're using a non-root user, make sure your ssh config allows them access"? Or is the blanket "The user that you designate with this option must have passwordless sudo access" sufficient?


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Feature #55493: Detect ssh connectivity issues ASAP (Resolved, Redouane Kachach Elhichou)

Actions #1

Updated by Varsha Rao almost 3 years ago

  • Tags set to low-hanging-fruit
Actions #2

Updated by Deepika Upadhyay almost 3 years ago

  • Tags set to good-first-issue, low-hanging-fruit
Actions #3

Updated by Tim Serong almost 3 years ago

Let's set aside my earlier mention of ceph-salt for a moment, and focus only on the use of cephadm bootstrap --ssh-user SSH_USER (which is what ceph-salt uses internally anyway).

The key thing here is that the user specified "must have passwordless sudo access". If that access isn't available, we can see a couple of different types of failure.

1) The user can't log in via ssh, due to a restrictive sshd_config. For example, say we have a user named cephadm in group cephadm (which is what the cephadm package itself creates when it's installed), but /etc/ssh/sshd_config specifies AllowGroups root. In this case, the cephadm user can't log in via ssh, because it's not in one of the allowed groups. Running cephadm bootstrap will fail with something like the following output:

# cephadm bootstrap --mon-ip 10.20.162.200 --ssh-user cephadm
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Installing packages ['chrony']...
Enabling unit chronyd.service
Enabling unit systemd-timesyncd.service
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Repeating the final host check...
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: 71347cec-e618-11eb-90e9-525400bd8f37
Verifying IP 10.20.162.200 port 3300 ...
Verifying IP 10.20.162.200 port 6789 ...
Mon IP 10.20.162.200 is in CIDR network 10.20.162.0/24
Pulling container image registry.suse.com/ses/7/ceph/ceph:latest...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to cephadm@localhost's authorized_keys...
Adding host master...
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/71347cec-e618-11eb-90e9-525400bd8f37:/var/log/ceph:z -v /tmp/ceph-tmpv97ek_ks:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp4lv64roe:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to master (master).
/usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr 
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub cephadm@master
/usr/bin/ceph: stderr 
/usr/bin/ceph: stderr To check that the host is reachable:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key cephadm@master
ERROR: Failed to add host <master>: Failed command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/71347cec-e618-11eb-90e9-525400bd8f37:/var/log/ceph:z -v /tmp/ceph-tmpv97ek_ks:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp4lv64roe:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master

Running journalctl -u sshd | grep cephadm will show something like:

Jul 16 11:33:05 master sshd[2204]: User cephadm from 127.0.0.1 not allowed because none of user's groups are listed in AllowGroups
Jul 16 11:33:05 master sshd[2204]: Postponed keyboard-interactive for invalid user cephadm from 127.0.0.1 port 49096 ssh2 [preauth]
Jul 16 11:33:05 master sshd[2208]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=127.0.0.1  user=cephadm

So you can figure out what the problem is, but I'm not sure it's terribly obvious from the error messages. The solution is to ensure you've enabled ssh access for that user (in this example, by adding the cephadm group to AllowGroups in /etc/ssh/sshd_config). This is why I was originally thinking we should make more noise in the docs about ensuring the ssh config is correct when using a non-root user.
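To make the AllowUsers / AllowGroups check concrete, here is a rough Python sketch of the kind of pre-flight test one could run before bootstrapping. It is not something cephadm does; the function names allow_lists and ssh_login_permitted are made up for illustration. It is also a deliberate simplification of what sshd really does: it ignores Match blocks, Include files, DenyUsers / DenyGroups and negated patterns, and it approximates sshd's pattern matching with Python's fnmatch.

```python
from fnmatch import fnmatchcase


def allow_lists(sshd_config_text):
    """Collect AllowUsers / AllowGroups patterns from sshd_config text.

    Simplification (assumption): Match blocks, Include files and the
    Deny* keywords are not handled.
    """
    allowed = {"allowusers": [], "allowgroups": []}
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        if keyword in allowed:
            allowed[keyword].extend(parts[1:])
    return allowed


def ssh_login_permitted(user, groups, sshd_config_text):
    """Return False if the Allow* lists would lock this user out."""
    allowed = allow_lists(sshd_config_text)
    if allowed["allowusers"]:
        # For USER@HOST patterns only the user part is compared here.
        user_patterns = [p.split("@", 1)[0] for p in allowed["allowusers"]]
        if not any(fnmatchcase(user, p) for p in user_patterns):
            return False
    if allowed["allowgroups"]:
        if not any(fnmatchcase(g, p)
                   for g in groups for p in allowed["allowgroups"]):
            return False
    return True
```

With the configuration from this example, ssh_login_permitted("cephadm", ["cephadm"], "AllowGroups root") returns False, and it returns True once cephadm is added to the AllowGroups line. On a real host, the output of sshd -T is probably a better input than the raw file, since it reflects the effective configuration.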

2) The user can log in via ssh, but doesn't have passwordless sudo access. In this case you'll see something like:

# cephadm bootstrap --mon-ip 10.20.154.200 --ssh-user cephadm
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Installing packages ['chrony']...
Enabling unit chronyd.service
Enabling unit systemd-timesyncd.service
No time sync service is running; checked for ['chrony.service', 'chronyd.service', 'systemd-timesyncd.service', 'ntpd.service', 'ntp.service', 'ntpsec.service']
Repeating the final host check...
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: e486e20a-e61a-11eb-ac5c-52540004d8e2
Verifying IP 10.20.154.200 port 3300 ...
Verifying IP 10.20.154.200 port 6789 ...
Mon IP 10.20.154.200 is in CIDR network 10.20.154.0/24
Pulling container image registry.suse.com/ses/7/ceph/ceph:latest...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to cephadm@localhost's authorized_keys...
Adding host master...
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/e486e20a-e61a-11eb-ac5c-52540004d8e2:/var/log/ceph:z -v /tmp/ceph-tmpdwyfaccf:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp5w8145tl:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master
/usr/bin/ceph: stderr Error EINVAL: Can't communicate with remote host `master`, possibly because python3 is not installed there: cannot send (already closed?)
ERROR: Failed to add host <master>: Failed command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=master -v /var/log/ceph/e486e20a-e61a-11eb-ac5c-52540004d8e2:/var/log/ceph:z -v /tmp/ceph-tmpdwyfaccf:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp5w8145tl:/etc/ceph/ceph.conf:z registry.suse.com/ses/7/ceph/ceph:latest orch host add master

The messaging here is even worse: "stderr Error EINVAL: Can't communicate with remote host `master`, possibly because python3 is not installed there: cannot send (already closed?)". python3 is definitely installed. The actual problem is that my sudoers config is the default (on SUSE, at least), so sudo prompts for a password at this point. If I dig around in journalctl I'll see something like this:

Jul 16 14:28:33 master conmon[32716]: Administrator. It usually boils down to these three things:
Jul 16 14:28:33 master conmon[32716]: 
Jul 16 14:28:33 master conmon[32716]:     #1) Respect the privacy of others.
Jul 16 14:28:33 master conmon[32716]:     #2) Think before you type.
Jul 16 14:28:33 master conmon[32716]:     #3) With great power comes great responsibility.
Jul 16 14:28:33 master conmon[32716]: 
Jul 16 14:28:33 master conmon[32716]: sudo: no tty present and no askpass program specified
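Since neither bootstrap error names the real cause, a small helper that maps the journal lines above to the two failure modes may be useful as an illustration. This is only a sketch: KNOWN_CAUSES and likely_causes are hypothetical names, and the match strings are copied from the two journal excerpts in this comment, so other sshd or sudo versions may word them differently.

```python
# Substrings taken from the journalctl excerpts in this report, each paired
# with a hint about the underlying problem. Wording may vary by version.
KNOWN_CAUSES = [
    ("not allowed because none of user's groups are listed in AllowGroups",
     "sshd rejects the user: add one of its groups to AllowGroups"),
    ("sudo: no tty present and no askpass program specified",
     "sudo wants a password: the ssh user lacks passwordless sudo"),
]


def likely_causes(journal_text):
    """Return hints for every known failure signature found in the text."""
    found = []
    for line in journal_text.splitlines():
        for needle, hint in KNOWN_CAUSES:
            if needle in line and hint not in found:
                found.append(hint)
    return found
```

Feeding it the output of journalctl from the failing host should return one hint per failure mode seen, or an empty list if neither signature is present.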

The trivial (and presumably far too permissive, security-wise) solution here is to add cephadm ALL=(ALL) NOPASSWD: ALL to /etc/sudoers, and this is where we go down a bit of a rabbit hole I hadn't anticipated when I opened this bug. It turns out that the cephadm package previously created an /etc/sudoers.d/cephadm file, but that file apparently had completely the wrong content (see https://tracker.ceph.com/issues/47112) and has since been removed. There's a better version (i.e. one that's actually known to work, at least on SUSE distros) in ceph-salt (see https://github.com/ceph/ceph-salt/blob/master/ceph-salt-formula/salt/ceph-salt/common/sshkey.sls#L23-L32). It may be worth taking that template and adding it back to the cephadm package as part of Ceph itself, or at least documenting what needs to go into the sudoers file to give a non-root user sufficient access.
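Whatever ends up in the sudoers file, the requirement itself can be verified before running bootstrap. The sketch below builds on the ssh command that the cephadm error output suggests (ssh -F ssh_config -i ~/cephadm_private_key cephadm@master) and appends sudo -n true, which fails instead of prompting when a password would be needed. The helper names sudo_probe_argv and has_passwordless_sudo and the default file paths are assumptions for illustration, not part of cephadm.

```python
import os
import subprocess


def sudo_probe_argv(user, host, ssh_config="ssh_config",
                    identity="~/cephadm_private_key"):
    """Build an ssh command that tests passwordless sudo on the remote host."""
    return [
        "ssh",
        "-F", ssh_config,
        "-i", os.path.expanduser(identity),
        "-o", "BatchMode=yes",   # never fall back to interactive prompts
        f"{user}@{host}",
        "sudo", "-n", "true",    # -n: fail rather than ask for a password
    ]


def has_passwordless_sudo(user, host, **kwargs):
    """Run the probe; True only if ssh login and sudo both succeeded.

    A non-zero exit does not say which step failed: ssh itself exits
    with 255 on connection or authentication errors.
    """
    result = subprocess.run(sudo_probe_argv(user, host, **kwargs),
                            capture_output=True, text=True)
    return result.returncode == 0
```

For the setup in this report, has_passwordless_sudo("cephadm", "master") would be expected to return False until the sudoers configuration grants that user NOPASSWD access.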

FWIW, I still think this makes a good first issue / low hanging fruit... Even if there may be a bit of a rabbit hole to go down, at least it's a narrow rabbit hole ;-)

Actions #4

Updated by Sebastian Wagner over 2 years ago

  • Priority changed from Normal to Low
Actions #5

Updated by Sebastian Wagner about 2 years ago

  • Tags deleted (low-hanging-fruit)
Actions #6

Updated by Redouane Kachach Elhichou almost 2 years ago

  • Related to Feature #55493: Detect ssh connectivity issues ASAP added
Actions #7

Updated by Redouane Kachach Elhichou almost 2 years ago

  • Tags deleted (good-first-issue, low-hanging-fruit)
Actions #8

Updated by Redouane Kachach Elhichou almost 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Redouane Kachach Elhichou
  • Pull request ID set to 46129
Actions #9

Updated by Redouane Kachach Elhichou almost 2 years ago

  • Status changed from Fix Under Review to Resolved