Bug #64561
ceph-volume in containerized environment cannot find the correct osd directory
Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
When trying to perform 'ceph-volume lvm migrate --osd-id 88 --osd-fsid <fsid> --from db --target vg-nvme1n1/lv-35000cca0bd55daa0-db-new' I run into an issue where the 'lockbox.keyring' isn't found.
root@ceph-osd1:/# cephadm shell
root@ceph-osd1:/# ceph-volume lvm migrate --osd-id 88 --osd-fsid 85b41da8-47ea-4246-bcef-4bf5d3bcd172 --from db --target vg-nvme1n1/lv-35000cca0bd55daa0-db-new
--> Running ceph config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.85b41da8-47ea-4246-bcef-4bf5d3bcd172 --keyring /var/lib/ceph/osd/ceph-88/lockbox.keyring config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-88/lockbox.keyring: (2) No such file or directory
stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 AuthRegistry(0x7f0a0c061700) no keyring found at /var/lib/ceph/osd/ceph-88/lockbox.keyring, disabling cephx
stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-88/lockbox.keyring: (2) No such file or directory
stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 AuthRegistry(0x7f0a0c067a30) no keyring found at /var/lib/ceph/osd/ceph-88/lockbox.keyring, disabling cephx
stderr: 2024-02-22T15:06:33.026+0000 7f0a1142f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-88/lockbox.keyring: (2) No such file or directory
stderr: 2024-02-22T15:06:33.026+0000 7f0a1142f700 -1 AuthRegistry(0x7f0a1142dea0) no keyring found at /var/lib/ceph/osd/ceph-88/lockbox.keyring, disabling cephx
stderr: 2024-02-22T15:06:33.026+0000 7f0a0b7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
stderr: 2024-02-22T15:06:33.026+0000 7f0a0a7fc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
stderr: 2024-02-22T15:06:33.026+0000 7f0a0affd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
stderr: 2024-02-22T15:06:33.026+0000 7f0a1142f700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to retrieve dmcrypt secret
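For context, inside the cephadm shell the legacy path that ceph-volume looks for does not exist, while the fsid-based path managed by cephadm does. A minimal check along these lines (my sketch, not taken from the session above; <cluster_fsid> is a placeholder) should show the mismatch:

# Path ceph-volume expects vs. path that actually exists inside the cephadm shell
ls -l /var/lib/ceph/osd/ceph-88/lockbox.keyring             # fails: No such file or directory
ls -l /var/lib/ceph/<cluster_fsid>/osd.88/lockbox.keyring   # keyring is found under the fsid-based path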
By creating a symlink for '/var/lib/ceph/osd/ceph-88 -> /var/lib/ceph/<cluster_fsid>/osd.88' I was able to get past the keyring error.
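A minimal sketch of that workaround inside the cephadm shell (the exact ln invocation is my reconstruction; <cluster_fsid> is a placeholder for the actual cluster fsid):

# Recreate the legacy ceph-volume path as a symlink to the cephadm fsid-based OSD directory
ln -s /var/lib/ceph/<cluster_fsid>/osd.88 /var/lib/ceph/osd/ceph-88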
Then, running the same migrate command again, I get a bit further:
root@ceph-osd1:/var/lib/ceph/osd/ceph-88# ceph-volume lvm migrate --osd-id 88 --osd-fsid 85b41da8-47ea-4246-bcef-4bf5d3bcd172 --from db --target vg-nvme1n1/lv-35000cca0bd55daa0-db-new
--> Running ceph config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.85b41da8-47ea-4246-bcef-4bf5d3bcd172 --keyring /var/lib/ceph/osd/ceph-88/lockbox.keyring config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
--> preparing dmcrypt for /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new, uuid BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
Running command: /usr/sbin/cryptsetup --batch-mode --key-size 512 --key-file - luksFormat /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new
Running command: /usr/sbin/cryptsetup --key-size 512 --key-file - --allow-discards luksOpen /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-88/block.db'] Target: /dev/mapper/BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
stdout: inferring bluefs devices from bluestore path
stderr: can't migrate /var/lib/ceph/osd/ceph-88/block.db, not a valid bluefs volume
--> Failed to migrate device, error code:1
--> Undoing lv tag set
Failed to migrate to : vg-nvme1n1/lv-35000cca0bd55daa0-db-new
This apparently failed, but when listing osd 88 I can see that the `db device` shown for the [wal] entry has been updated to the new LV.
So a partial move has been done.
# ceph-volume lvm list 88

====== osd.88 ======

  [block]       /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0

      block device              /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0
      block uuid                JJLhD5-c7jG-JKd0-BZGz-WMqy-6CAu-NyBinx
      cephx lockbox secret      <secret>
      cluster fsid              <cluster_fsid>
      cluster name              ceph
      crush device class        None
      db device                 /dev/vg-nvme0n1/lv-35000cca0bd55daa0-db
      db uuid                   IbO3fz-6tAy-Agu7-ZEk0-shUj-K3nJ-VT71k0
      encrypted                 1
      osd fsid                  85b41da8-47ea-4246-bcef-4bf5d3bcd172
      osd id                    88
      osdspec affinity
      type                      block
      vdo                       0
      wal device                /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal
      wal uuid                  pFoZpH-Jdem-1GXQ-XHfV-bG0N-QBii-vrsqGh
      devices                   /dev/sdd

  [db]          /dev/vg-nvme0n1/lv-35000cca0bd55daa0-db

      block device              /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0
      block uuid                JJLhD5-c7jG-JKd0-BZGz-WMqy-6CAu-NyBinx
      cephx lockbox secret      <secret>
      cluster fsid              <cluster_fsid>
      cluster name              ceph
      crush device class        None
      db device                 /dev/vg-nvme0n1/lv-35000cca0bd55daa0-db
      db uuid                   IbO3fz-6tAy-Agu7-ZEk0-shUj-K3nJ-VT71k0
      encrypted                 1
      osd fsid                  85b41da8-47ea-4246-bcef-4bf5d3bcd172
      osd id                    88
      osdspec affinity
      type                      db
      vdo                       0
      wal device                /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal
      wal uuid                  pFoZpH-Jdem-1GXQ-XHfV-bG0N-QBii-vrsqGh
      devices                   /dev/nvme0n1

  [wal]         /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal

      block device              /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0
      block uuid                JJLhD5-c7jG-JKd0-BZGz-WMqy-6CAu-NyBinx
      cephx lockbox secret      <secret>
      cluster fsid              <cluster_fsid>
      cluster name              ceph
      crush device class        None
      db device                 /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new
      db uuid                   BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
      encrypted                 1
      osd fsid                  85b41da8-47ea-4246-bcef-4bf5d3bcd172
      osd id                    88
      osdspec affinity
      type                      wal
      vdo                       0
      wal device                /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal
      wal uuid                  pFoZpH-Jdem-1GXQ-XHfV-bG0N-QBii-vrsqGh
      devices                   /dev/nvme0n1
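To see what actually changed, it may help to compare the LVM tags (which is what `ceph-volume lvm list` reads) against the on-disk bluefs state. A hedged sketch of such checks, not commands from the report above:

# Inspect the ceph.* tags on the involved LVs; the lvm list output is built from these tags
lvs -o lv_name,vg_name,lv_tags | grep 35000cca0bd55daa0
# Check whether the OSD's block.db symlink still points at a valid bluefs volume
# (for an encrypted OSD this assumes the dm-crypt mapping is currently open)
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-88/block.db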
This is where I'm stuck.