Feature #48392

open

ceph ignores --keyring?

Added by Arkadiy K over 3 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags: low-hanging-fruit
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

I'm trying to set up a new OSD, and the rollback does not seem to run properly.
When "ceph-volume lvm prepare" fails, it issues a "ceph osd purge-new" command with a "--keyring" parameter pointing to a valid file.
However, immediately afterwards I see a "No such file or directory" error that lists a different set of files. It looks like "--keyring" was ignored.

Here is the output:

root@pve2:/dev# ceph-volume lvm prepare --bluestore --data /dev/sda --block.db /dev/loop1
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5e4e2e8-3705-4406-881c-e58b1223a6c3
Running command: /sbin/vgcreate --force --yes ceph-b48a4b34-5075-48fc-9dd5-c1aca872d741 /dev/sda
 stdout: Physical volume "/dev/sda" successfully created.
 stdout: Volume group "ceph-b48a4b34-5075-48fc-9dd5-c1aca872d741" successfully created
Running command: /sbin/lvcreate --yes -l 4291583 -n osd-block-e5e4e2e8-3705-4406-881c-e58b1223a6c3 ceph-b48a4b34-5075-48fc-9dd5-c1aca872d741
 stdout: Logical volume "osd-block-e5e4e2e8-3705-4406-881c-e58b1223a6c3" created.
--> blkid could not detect a PARTUUID for device: /dev/loop1
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: 2020-11-29 17:49:52.497 7f9ef33f3700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2020-11-29 17:49:52.497 7f9ef33f3700 -1 AuthRegistry(0x7f9eec081d88) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
 stderr: purged osd.0
-->  RuntimeError: unable to use device
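To make the mismatch explicit, here is a minimal Python sketch that pulls the searched paths out of the "unable to find a keyring" line and checks whether the path given to --keyring is among them. It is not part of ceph or ceph-volume; the function name and the regular expression are assumptions based only on the message format in the log above.

```python
import re

# Message format assumed from the log lines in this report, not from the Ceph source.
_KEYRING_MSG = re.compile(r"unable to find a keyring on ([^:]*): \(2\)")


def searched_keyrings(stderr_line):
    """Return the keyring paths named in an 'unable to find a keyring' line."""
    match = _KEYRING_MSG.search(stderr_line)
    if not match:
        return []
    # The list in the message ends with a trailing comma, so drop empty entries.
    return [path for path in match.group(1).split(",") if path]


line = (
    "2020-11-29 17:49:52.497 7f9ef33f3700 -1 auth: unable to find a keyring on "
    "/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,"
    "/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory"
)
requested = "/var/lib/ceph/bootstrap-osd/ceph.keyring"  # value passed to --keyring
paths = searched_keyrings(line)
print(requested in paths)  # the --keyring path is not in the searched list
```

For the line above, the searched list holds only the four default /etc/ceph locations.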
Actions #1

Updated by Brian Candler about 3 years ago

I see this with v15.2.10 as well.

The problem is at the rollback stage, specifically with osd purge-new where it apparently doesn't pick up --keyring. The simplest reproducer is this:

root@node2:~# /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.99999
2021-03-29T08:56:36.947+0000 7f747f867700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-03-29T08:56:36.947+0000 7f747f867700 -1 AuthRegistry(0x7f7478059880) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
Error EPERM: Are you SURE?  Did you verify with 'ceph osd safe-to-destroy'?  This will mean real, permanent data loss, as well as deletion of cephx and lockbox keys. Pass --yes-i-really-mean-it if you really do.
root@node2:~#

The actual way I triggered it was when I wrongly gave /dev/mapper/vg-lv instead of vg/lv as the device path:

root@node2:~# lvcreate --size 1G --name foo vg_ssd
  Logical volume "foo" created.
root@node2:~# ceph-volume lvm create --data /dev/mapper/vg_ssd-foo
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new caeb4a6e-b098-41d6-b190-3ae92d1a5f4c
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it
 stderr: 2021-03-28T11:43:42.822+0000 7fdacdf20700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
 stderr:
 stderr: 2021-03-28T11:43:42.822+0000 7fdacdf20700 -1 AuthRegistry(0x7fdac8059880) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
 stderr:
 stderr: purged osd.3
-->  RuntimeError: Cannot use device (/dev/mapper/vg_ssd-foo). A vg/lv path or an existing device is needed
root@node2:~#

(However, I do see "purged osd.3" in the output, so I think the command completed successfully despite the error.)
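If you want to check a rollback log for this pattern, a rough Python sketch like the following separates the two keyring warnings from the rest of the stderr output. Treating those two lines as noise is an assumption based on the observation above, and the marker strings and function name are hypothetical helpers, not anything ceph-volume provides.

```python
# Substrings of the two warnings seen in the output above (assumed to be harmless here).
SPURIOUS_MARKERS = (
    "auth: unable to find a keyring on",
    "no keyring found at",
)


def split_stderr(lines):
    """Split ceph-volume 'stderr:' lines into (keyring warnings, everything else)."""
    noise, rest = [], []
    for line in lines:
        text = line.strip()
        # ceph-volume prefixes captured output with 'stderr:'; remove it if present.
        if text.startswith("stderr:"):
            text = text[len("stderr:"):].strip()
        if not text:
            continue
        if any(marker in text for marker in SPURIOUS_MARKERS):
            noise.append(text)
        else:
            rest.append(text)
    return noise, rest


sample = [
    " stderr: 2021-03-28T11:43:42.822+0000 7fdacdf20700 -1 auth: unable to find a keyring on "
    "/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,"
    "/etc/ceph/keyring.bin,: (2) No such file or directory",
    " stderr:",
    " stderr: 2021-03-28T11:43:42.822+0000 7fdacdf20700 -1 AuthRegistry(0x7fdac8059880) no keyring "
    "found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,"
    "/etc/ceph/keyring.bin,, disabling cephx",
    " stderr:",
    " stderr: purged osd.3",
]
noise, rest = split_stderr(sample)
print(len(noise), rest)
```

With the rollback output quoted above, only "purged osd.3" is left once the two warnings are set aside.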

You can suppress this error by creating a symlink for the keyring file:

root@node2:~# ln -s /var/lib/ceph/bootstrap-osd/ceph.keyring /etc/ceph/ceph.client.bootstrap-osd.keyring
root@node2:~# ceph-volume lvm create --data /dev/mapper/vg_ssd-foo
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new b63376d3-dcaa-4005-9b87-c67ec9ae7321
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it
 stderr: purged osd.3
-->  RuntimeError: Cannot use device (/dev/mapper/vg_ssd-foo). A vg/lv path or an existing device is needed
root@node2:~#
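The same workaround can be scripted so that it is safe to run more than once. This is only a sketch: link_keyring is a hypothetical helper, and the demo uses a temporary directory instead of the real /var/lib/ceph and /etc/ceph paths from the ln -s command above.

```python
import os
import tempfile


def link_keyring(source, link_path):
    """Create link_path -> source unless something already exists at link_path."""
    if os.path.lexists(link_path):
        return False  # leave an existing file or link untouched
    os.symlink(source, link_path)
    return True


# Demo in a scratch directory; on a real node the arguments would be the
# bootstrap-osd keyring and /etc/ceph/ceph.client.bootstrap-osd.keyring.
with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "bootstrap-osd.keyring")
    with open(source, "w") as handle:
        handle.write("[client.bootstrap-osd]\n")
    link = os.path.join(root, "ceph.client.bootstrap-osd.keyring")
    created = link_keyring(source, link)
    created_again = link_keyring(source, link)
    target = os.readlink(link)

print(created, created_again)
```

The second call is a no-op, so the helper does not fail or overwrite anything if the symlink is already in place.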

I note that purge-new was introduced in https://github.com/ceph/ceph/pull/23259/files

Oddly, running under strace (without the symlink workaround) shows that the --keyring file is successfully opened and read several times, in child processes or threads:

root@node2:~# strace -f /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it 2>&1 | egrep 'keyring|bootstrap'
execve("/usr/bin/ceph", ["/usr/bin/ceph", "--cluster", "ceph", "--name", "client.bootstrap-osd", "--keyring", "/var/lib/ceph/bootstrap-osd/ceph"..., "osd", "purge-new", "osd.3", "--yes-i-really-mean-it"], 0x7ffeac173d78 /* 12 vars */) = 0
[pid  4756] openat(AT_FDCWD, "/etc/ceph/ceph.client.bootstrap-osd.keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4756] openat(AT_FDCWD, "/etc/ceph/ceph.keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4756] openat(AT_FDCWD, "/etc/ceph/keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4756] openat(AT_FDCWD, "/etc/ceph/keyring.bin", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4757] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid  4757] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid  4757] read(3, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid  4758] read(3, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4761] sendmsg(12, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0012\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\242\343n\331", iov_len=32}, {iov_base="\2\0\0\0\2\0\0\0\2\0\0\0\1\0\0\0\36\0\0\0", iov_len=20}, {iov_base="\n\10\0\0\0\r\0\0\0bootstrap-osd\0\0\0\0\0\0\0\0", iov_len=30}, {iov_base="", iov_len=0}, {iov_base="FY\227j", iov_len=4}], msg_iovlen=5, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 86
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid  4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid  4760] sendmsg(12, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0012\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\242\343n\331", iov_len=32}, {iov_base="\2\0\0\0\2\0\0\0\2\0\0\0\1\0\0\0\36\0\0\0", iov_len=20}, {iov_base="\n\10\0\0\0\r\0\0\0bootstrap-osd\0\0\0\0\0\0\0\0", iov_len=30}, {iov_base="", iov_len=0}, {iov_base="FY\227j", iov_len=4}], msg_iovlen=5, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 86
[pid  4766] write(2, "2021-03-28T12:00:05.088+0000 7f7"..., 2172021-03-28T12:00:05.088+0000 7f75d2aa7700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory) = 217
[pid  4766] write(2, "2021-03-28T12:00:05.088+0000 7f7"..., 2182021-03-28T12:00:05.088+0000 7f75d2aa7700 -1 AuthRegistry(0x7f75cc059880) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx) = 218

So it might just be a spurious error from the parent.
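To see which process looked where, the openat lines can be grouped by pid. The following is a minimal Python sketch with an assumed regular expression for the strace -f format shown above; it only handles openat calls relative to AT_FDCWD and is meant for reading this kind of trace, not as a general strace parser.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [pid  4756] openat(AT_FDCWD, "/etc/ceph/keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
OPENAT = re.compile(r'\[pid\s+(\d+)\] openat\(AT_FDCWD, "([^"]+)", [^)]*\) = (-?\d+)')


def opens_by_pid(strace_lines):
    """Map each pid to the sets of paths it opened successfully or failed to open."""
    result = defaultdict(lambda: {"ok": set(), "failed": set()})
    for line in strace_lines:
        match = OPENAT.search(line)
        if not match:
            continue  # skip execve, read, sendmsg, write and anything else
        pid, path, ret = match.group(1), match.group(2), int(match.group(3))
        result[pid]["failed" if ret < 0 else "ok"].add(path)
    return dict(result)


trace = [
    '[pid  4756] openat(AT_FDCWD, "/etc/ceph/ceph.keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)',
    '[pid  4757] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3',
    '[pid  4757] read(3, "[client.bootstrap-osd]\\n\\tkey = AQ"..., 129) = 129',
    '[pid  4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12',
]
summary = opens_by_pid(trace)
for pid in sorted(summary):
    print(pid, sorted(summary[pid]["ok"]), sorted(summary[pid]["failed"]))
```

On the trace above, this shows one pid trying only the default /etc/ceph locations while the other pids open the --keyring file, which fits the idea that the warning comes from a single thread or process that never sees the option.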

Actions #2

Updated by Sage Weil about 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (ceph cli)
Actions #3

Updated by Janek Bevendorff over 1 year ago

This issue is still present in Pacific. Is there any way to work around it except for moving the keys to /etc/ceph?

ceph-volume lvm create creates the OSD, but then the mon getmap command fails after ignoring --keyring:

Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1248/activate.monmap
stderr: 2022-11-04T10:07:18.036+0100 7f41a328c700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-11-04T10:07:18.036+0100 7f41a328c700 -1 AuthRegistry(0x7f419c05b868) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
stderr: got monmap epoch 29

Finally, the purge-new command also fails:

Running command: /usr/bin/systemctl start ceph-osd@1248
stderr: Job for ceph-osd@1248.service failed because the control process exited with error code. 
See "systemctl status ceph-osd@1248.service" and "journalctl -xe" for details.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1248 --yes-i-really-mean-it
stderr: 2022-11-04T10:07:23.408+0100 7fa448749700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-11-04T10:07:23.408+0100 7fa448749700 -1 AuthRegistry(0x7fa44005b868) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
stderr: purged osd.124
Actions #4

Updated by Radoslaw Zarzynski over 1 year ago

  • Tracker changed from Bug to Feature
  • Tags set to low-hanging-fruit