Bug #64561

open

ceph-volume in containerized environment cannot find the correct osd directory

Added by Andreas Eriksson 2 months ago. Updated 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When trying to perform 'ceph-volume lvm migrate --osd-id 88 --osd-fsid <fsid> --from db --target vg-nvme1n1/lv-35000cca0bd55daa0-db-new' I run into an issue where the 'lockbox.keyring' file isn't found.

root@ceph-osd1:/# cephadm shell

root@ceph-osd1:/# ceph-volume lvm migrate --osd-id 88  --osd-fsid 85b41da8-47ea-4246-bcef-4bf5d3bcd172 --from db --target vg-nvme1n1/lv-35000cca0bd55daa0-db-new
--> Running ceph config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.85b41da8-47ea-4246-bcef-4bf5d3bcd172 --keyring /var/lib/ceph/osd/ceph-88/lockbox.keyring config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
 stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-88/lockbox.keyring: (2) No such file or directory
 stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 AuthRegistry(0x7f0a0c061700) no keyring found at /var/lib/ceph/osd/ceph-88/lockbox.keyring, disabling cephx
 stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-88/lockbox.keyring: (2) No such file or directory
 stderr: 2024-02-22T15:06:33.022+0000 7f0a1142f700 -1 AuthRegistry(0x7f0a0c067a30) no keyring found at /var/lib/ceph/osd/ceph-88/lockbox.keyring, disabling cephx
 stderr: 2024-02-22T15:06:33.026+0000 7f0a1142f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-88/lockbox.keyring: (2) No such file or directory
 stderr: 2024-02-22T15:06:33.026+0000 7f0a1142f700 -1 AuthRegistry(0x7f0a1142dea0) no keyring found at /var/lib/ceph/osd/ceph-88/lockbox.keyring, disabling cephx
 stderr: 2024-02-22T15:06:33.026+0000 7f0a0b7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2024-02-22T15:06:33.026+0000 7f0a0a7fc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2024-02-22T15:06:33.026+0000 7f0a0affd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2024-02-22T15:06:33.026+0000 7f0a1142f700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to retrieve dmcrypt secret

By creating a symlink '/var/lib/ceph/osd/ceph-88 -> /var/lib/ceph/<cluster_fsid>/osd.88' I was able to get past the keyring error.
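In a cephadm deployment the OSD data directory lives under /var/lib/ceph/<cluster_fsid>/osd.88, while ceph-volume looks for it at /var/lib/ceph/osd/ceph-88, hence the symlink. A minimal sketch of the workaround inside the cephadm shell (the exact commands are an assumption; the paths are the ones quoted above):

# assumed form of the workaround, run inside 'cephadm shell'
mkdir -p /var/lib/ceph/osd
ln -s /var/lib/ceph/<cluster_fsid>/osd.88 /var/lib/ceph/osd/ceph-88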

Then by running the same migrate command again I get a bit further:

root@ceph-osd1:/var/lib/ceph/osd/ceph-88# ceph-volume lvm migrate --osd-id 88  --osd-fsid 85b41da8-47ea-4246-bcef-4bf5d3bcd172 --from db --target vg-nvme1n1/lv-35000cca0bd55daa0-db-new          
--> Running ceph config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.85b41da8-47ea-4246-bcef-4bf5d3bcd172 --keyring /var/lib/ceph/osd/ceph-88/lockbox.keyring config-key get dm-crypt/osd/85b41da8-47ea-4246-bcef-4bf5d3bcd172/luks
-->  preparing dmcrypt for /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new, uuid BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
Running command: /usr/sbin/cryptsetup --batch-mode --key-size 512 --key-file - luksFormat /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new
Running command: /usr/sbin/cryptsetup --key-size 512 --key-file - --allow-discards luksOpen /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-88/block.db'] Target: /dev/mapper/BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
 stdout: inferring bluefs devices from bluestore path
 stderr: can't migrate /var/lib/ceph/osd/ceph-88/block.db, not a valid bluefs volume
--> Failed to migrate device, error code:1
--> Undoing lv tag set
Failed to migrate to : vg-nvme1n1/lv-35000cca0bd55daa0-db-new
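The error says that block.db could not be opened as a valid bluefs volume. One hedged way to inspect the label on that device before retrying (this invocation is an assumption, not taken from the log, and with dmcrypt it only makes sense while block.db points at the opened mapper device):

# assumed diagnostic; not part of the original attempt
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-88/block.db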

The migration apparently failed, but when listing OSD 88 I can see that the `db device` tag on the WAL LV has already been updated to the new target, so a partial move has been done.

# ceph-volume lvm list 88

====== osd.88 ======

  [block]       /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0

      block device              /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0
      block uuid                JJLhD5-c7jG-JKd0-BZGz-WMqy-6CAu-NyBinx
      cephx lockbox secret      <secret>
      cluster fsid              <cluster_fsid>
      cluster name              ceph
      crush device class        None
      db device                 /dev/vg-nvme0n1/lv-35000cca0bd55daa0-db
      db uuid                   IbO3fz-6tAy-Agu7-ZEk0-shUj-K3nJ-VT71k0
      encrypted                 1
      osd fsid                  85b41da8-47ea-4246-bcef-4bf5d3bcd172
      osd id                    88
      osdspec affinity
      type                      block
      vdo                       0
      wal device                /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal
      wal uuid                  pFoZpH-Jdem-1GXQ-XHfV-bG0N-QBii-vrsqGh
      devices                   /dev/sdd

  [db]          /dev/vg-nvme0n1/lv-35000cca0bd55daa0-db

      block device              /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0
      block uuid                JJLhD5-c7jG-JKd0-BZGz-WMqy-6CAu-NyBinx
      cephx lockbox secret      <secret>
      cluster fsid              <cluster_fsid>
      cluster name              ceph
      crush device class        None
      db device                 /dev/vg-nvme0n1/lv-35000cca0bd55daa0-db
      db uuid                   IbO3fz-6tAy-Agu7-ZEk0-shUj-K3nJ-VT71k0
      encrypted                 1
      osd fsid                  85b41da8-47ea-4246-bcef-4bf5d3bcd172
      osd id                    88
      osdspec affinity
      type                      db
      vdo                       0
      wal device                /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal
      wal uuid                  pFoZpH-Jdem-1GXQ-XHfV-bG0N-QBii-vrsqGh
      devices                   /dev/nvme0n1

  [wal]         /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal

      block device              /dev/vg-35000cca0bd55daa0/lv-35000cca0bd55daa0
      block uuid                JJLhD5-c7jG-JKd0-BZGz-WMqy-6CAu-NyBinx
      cephx lockbox secret      <secret>
      cluster fsid              <cluster_fsid>
      cluster name              ceph
      crush device class        None
      db device                 /dev/vg-nvme1n1/lv-35000cca0bd55daa0-db-new
      db uuid                   BvONhT-uuNb-QPQU-zhOC-Pp6a-06hB-9EYO14
      encrypted                 1
      osd fsid                  85b41da8-47ea-4246-bcef-4bf5d3bcd172
      osd id                    88
      osdspec affinity
      type                      wal
      vdo                       0
      wal device                /dev/vg-nvme0n1/lv-35000cca0bd55daa0-wal
      wal uuid                  pFoZpH-Jdem-1GXQ-XHfV-bG0N-QBii-vrsqGh
      devices                   /dev/nvme0n1

This is where I'm stuck.

#1

Updated by Andreas Eriksson 2 months ago

This OSD was set up as "unmanaged" in an older Ceph version, where the layout looks like this: block - hdd, db - nvme0/lv-<blockdevice>-db (50G), wal - nvme0/lv-<blockdevice>-wal (1G).
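For context, a minimal sketch of how such an unmanaged layout is typically provisioned (VG/LV names follow the pattern in the lvm list output above; the sizes match the description, but the exact flags and the pre-existing data VG/LV are assumptions):

# assumed provisioning of the unmanaged OSD; the data VG/LV for the HDD already exists
lvcreate -L 50G -n lv-<blockdevice>-db vg-nvme0n1
lvcreate -L 1G -n lv-<blockdevice>-wal vg-nvme0n1
ceph-volume lvm prepare --bluestore --dmcrypt \
    --data vg-<blockdevice>/lv-<blockdevice> \
    --block.db vg-nvme0n1/lv-<blockdevice>-db \
    --block.wal vg-nvme0n1/lv-<blockdevice>-wal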
