Feature #48392
openceph ignores --keyring?
Description
I'm trying to set up a new OSD. I'm having some issues with the rollback not performing properly.
When "ceph-volume lvm prepare" fails, it issues a "ceph osd purge-new" command with a "--keyring" parameter pointing to a valid file.
However, immediately afterwards I see a "No such file or directory" error that lists a different set of files. It looks like "--keyring" was ignored.
Here is the output:
root@pve2:/dev# ceph-volume lvm prepare --bluestore --data /dev/sda --block.db /dev/loop1
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5e4e2e8-3705-4406-881c-e58b1223a6c3
Running command: /sbin/vgcreate --force --yes ceph-b48a4b34-5075-48fc-9dd5-c1aca872d741 /dev/sda
stdout: Physical volume "/dev/sda" successfully created.
stdout: Volume group "ceph-b48a4b34-5075-48fc-9dd5-c1aca872d741" successfully created
Running command: /sbin/lvcreate --yes -l 4291583 -n osd-block-e5e4e2e8-3705-4406-881c-e58b1223a6c3 ceph-b48a4b34-5075-48fc-9dd5-c1aca872d741
stdout: Logical volume "osd-block-e5e4e2e8-3705-4406-881c-e58b1223a6c3" created.
--> blkid could not detect a PARTUUID for device: /dev/loop1
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
stderr: 2020-11-29 17:49:52.497 7f9ef33f3700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2020-11-29 17:49:52.497 7f9ef33f3700 -1 AuthRegistry(0x7f9eec081d88) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
stderr: purged osd.0
--> RuntimeError: unable to use device
Updated by Brian Candler about 3 years ago
I see this with v15.2.10 as well.
The problem is at the rollback stage, specifically with osd purge-new, where it apparently doesn't pick up --keyring. The simplest reproducer is this:
root@node2:~# /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.99999
2021-03-29T08:56:36.947+0000 7f747f867700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-03-29T08:56:36.947+0000 7f747f867700 -1 AuthRegistry(0x7f7478059880) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
Error EPERM: Are you SURE? Did you verify with 'ceph osd safe-to-destroy'? This will mean real, permanent data loss, as well as deletion of cephx and lockbox keys. Pass --yes-i-really-mean-it if you really do.
root@node2:~#
The actual way I triggered it was when I wrongly gave /dev/mapper/vg-lv instead of vg/lv as the device path:
root@node2:~# lvcreate --size 1G --name foo vg_ssd
Logical volume "foo" created.
root@node2:~# ceph-volume lvm create --data /dev/mapper/vg_ssd-foo
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new caeb4a6e-b098-41d6-b190-3ae92d1a5f4c
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it
stderr: 2021-03-28T11:43:42.822+0000 7fdacdf20700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
stderr:
stderr: 2021-03-28T11:43:42.822+0000 7fdacdf20700 -1 AuthRegistry(0x7fdac8059880) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
stderr:
stderr: purged osd.3
--> RuntimeError: Cannot use device (/dev/mapper/vg_ssd-foo). A vg/lv path or an existing device is needed
root@node2:~#
(However I do see "purged osd.3" in the output, so I think the command did complete successfully despite the error)
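To make that observation easier to check on long logs, here is a minimal Python sketch that separates the two keyring lookup warnings from everything else on stderr. The function name split_stderr and the regular expression are my own, and the message shapes are copied from the output in this report, so other Ceph releases may word them differently.

```python
import re

# Assumption: these two message shapes are taken from the logs in this report;
# they are not an official or exhaustive list of Ceph auth warnings.
KEYRING_NOISE = re.compile(
    r"auth: unable to find a keyring on "
    r"|AuthRegistry\(0x[0-9a-f]+\) no keyring found at "
)


def split_stderr(lines):
    """Return (noise, rest): keyring lookup warnings vs. all other lines."""
    noise, rest = [], []
    for line in lines:
        text = line.strip()
        # ceph-volume relays child output with a "stderr:" prefix; drop it.
        if text.startswith("stderr:"):
            text = text[len("stderr:"):].strip()
        if not text:
            continue  # skip empty relayed lines
        if KEYRING_NOISE.search(text):
            noise.append(text)
        else:
            rest.append(text)
    return noise, rest
```

If "rest" ends up containing only a line such as "purged osd.3", the purge itself went through and only the keyring warnings were printed alongside it.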
You can suppress this error by creating a symlink for the keyring file:
root@node2:~# ln -s /var/lib/ceph/bootstrap-osd/ceph.keyring /etc/ceph/ceph.client.bootstrap-osd.keyring
root@node2:~# ceph-volume lvm create --data /dev/mapper/vg_ssd-foo
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new b63376d3-dcaa-4005-9b87-c67ec9ae7321
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it
stderr: purged osd.3
--> RuntimeError: Cannot use device (/dev/mapper/vg_ssd-foo). A vg/lv path or an existing device is needed
root@node2:~#
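The symlink presumably works because it puts a file at the first path named in the error message. As a rough Python sketch, the four paths in that message can be reproduced by expanding a template with the cluster and client name. The template below is inferred from the error text in this report rather than read from the cluster configuration, and the helper names are hypothetical.

```python
import os

# Assumption: this template mirrors the four paths listed in the
# "unable to find a keyring on ..." message; it is not queried from Ceph.
DEFAULT_KEYRING_TEMPLATE = (
    "/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,"
    "/etc/ceph/keyring,/etc/ceph/keyring.bin"
)


def default_keyring_paths(cluster="ceph", name="client.bootstrap-osd"):
    """Expand the template into the list of paths that would be searched."""
    expanded = DEFAULT_KEYRING_TEMPLATE.replace("$cluster", cluster)
    expanded = expanded.replace("$name", name)
    return [path for path in expanded.split(",") if path]


def existing_keyrings(paths, exists=os.path.exists):
    """Return the subset of paths that exist (exists is injectable for tests)."""
    return [path for path in paths if exists(path)]
```

On a node that shows the warning, existing_keyrings(default_keyring_paths()) would be expected to return an empty list, and a non-empty list once the symlink is in place.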
I note that purge-new was introduced in https://github.com/ceph/ceph/pull/23259/files
Oddly, running under strace (without the symlink workaround) shows that the --keyring file is successfully opened and read several times, in child processes or threads:
root@node2:~# strace -f /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it 2>&1 | egrep 'keyring|bootstrap'
execve("/usr/bin/ceph", ["/usr/bin/ceph", "--cluster", "ceph", "--name", "client.bootstrap-osd", "--keyring", "/var/lib/ceph/bootstrap-osd/ceph"..., "osd", "purge-new", "osd.3", "--yes-i-really-mean-it"], 0x7ffeac173d78 /* 12 vars */) = 0
[pid 4756] openat(AT_FDCWD, "/etc/ceph/ceph.client.bootstrap-osd.keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 4756] openat(AT_FDCWD, "/etc/ceph/ceph.keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 4756] openat(AT_FDCWD, "/etc/ceph/keyring", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 4756] openat(AT_FDCWD, "/etc/ceph/keyring.bin", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 4757] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid 4757] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid 4757] read(3, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 3
[pid 4758] read(3, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4761] sendmsg(12, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0012\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\242\343n\331", iov_len=32}, {iov_base="\2\0\0\0\2\0\0\0\2\0\0\0\1\0\0\0\36\0\0\0", iov_len=20}, {iov_base="\n\10\0\0\0\r\0\0\0bootstrap-osd\0\0\0\0\0\0\0\0", iov_len=30}, {iov_base="", iov_len=0}, {iov_base="FY\227j", iov_len=4}], msg_iovlen=5, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 86
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] openat(AT_FDCWD, "/var/lib/ceph/bootstrap-osd/ceph.keyring", O_RDONLY|O_CLOEXEC) = 12
[pid 4758] read(12, "[client.bootstrap-osd]\n\tkey = AQ"..., 129) = 129
[pid 4760] sendmsg(12, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0012\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\242\343n\331", iov_len=32}, {iov_base="\2\0\0\0\2\0\0\0\2\0\0\0\1\0\0\0\36\0\0\0", iov_len=20}, {iov_base="\n\10\0\0\0\r\0\0\0bootstrap-osd\0\0\0\0\0\0\0\0", iov_len=30}, {iov_base="", iov_len=0}, {iov_base="FY\227j", iov_len=4}], msg_iovlen=5, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 86
[pid 4766] write(2, "2021-03-28T12:00:05.088+0000 7f7"..., 2172021-03-28T12:00:05.088+0000 7f75d2aa7700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory) = 217
[pid 4766] write(2, "2021-03-28T12:00:05.088+0000 7f7"..., 2182021-03-28T12:00:05.088+0000 7f75d2aa7700 -1 AuthRegistry(0x7f75cc059880) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx) = 218
So it might just be a spurious error from the parent.
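One way to see the split more clearly is to group the openat calls by pid. The following is a minimal Python sketch that does this for strace text like the above. The regular expression only covers the openat(AT_FDCWD, ...) shape seen in this trace, the function name is hypothetical, and the "pids" that strace -f reports may be threads rather than separate processes.

```python
import re
from collections import defaultdict

# Assumption: matches only the openat(AT_FDCWD, "path", flags) = ret lines
# as they appear in the trace in this report.
OPENAT = re.compile(
    r'\[pid\s+(?P<pid>\d+)\]\s+openat\(AT_FDCWD, "(?P<path>[^"]+)", '
    r'[^)]*\)\s+=\s+(?P<ret>-?\d+)'
)


def keyring_opens_by_pid(trace_text):
    """Map pid -> {"ok": set of paths, "failed": set of paths}."""
    result = defaultdict(lambda: {"ok": set(), "failed": set()})
    for match in OPENAT.finditer(trace_text):
        # A negative return value (e.g. -1 ENOENT) means the open failed.
        bucket = "ok" if int(match.group("ret")) >= 0 else "failed"
        result[int(match.group("pid"))][bucket].add(match.group("path"))
    return dict(result)
```

Applied to the trace above, pid 4756 should show only failed opens under /etc/ceph, while pids 4757 and 4758 should show only successful opens of /var/lib/ceph/bootstrap-osd/ceph.keyring. That would be consistent with the warning coming from a different code path than the one that actually authenticates.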
Updated by Sage Weil about 3 years ago
- Project changed from Ceph to RADOS
- Category deleted (ceph cli)
Updated by Janek Bevendorff over 1 year ago
This issue is still present in Pacific. Is there any way to work around it except for moving the keys to /etc/ceph?
ceph-volume lvm create creates the OSD, but then the mon getmap command fails after ignoring --keyring:
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1248/activate.monmap
stderr: 2022-11-04T10:07:18.036+0100 7f41a328c700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-11-04T10:07:18.036+0100 7f41a328c700 -1 AuthRegistry(0x7f419c05b868) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
stderr: got monmap epoch 29
Finally, the purge-new command also fails:
Running command: /usr/bin/systemctl start ceph-osd@1248
stderr: Job for ceph-osd@1248.service failed because the control process exited with error code.
See "systemctl status ceph-osd@1248.service" and "journalctl -xe" for details.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1248 --yes-i-really-mean-it
stderr: 2022-11-04T10:07:23.408+0100 7fa448749700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-11-04T10:07:23.408+0100 7fa448749700 -1 AuthRegistry(0x7fa44005b868) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
stderr: purged osd.124
Updated by Radoslaw Zarzynski over 1 year ago
- Tracker changed from Bug to Feature
- Tags set to low-hanging-fruit