Bug #52615
ceph-volume should fail if luksFormat fails
Description
When setting up an encrypted OSD with a DB/WAL device, ceph-volume tries to luksFormat both the OSD and the DB device. We had a case where this failed: the DB device had not been properly wiped after a previous attempt and was still an open (decrypted) LUKS device. In the ceph-volume output, you would see something like this:
Running command: /usr/sbin/cryptsetup --batch-mode --key-file - luksFormat /dev/disk/by-partuuid/5d4f3faf-7a25-4433-b857-3a78782656e3
stderr: Device /dev/disk/by-partuuid/5d4f3faf-7a25-4433-b857-3a78782656e3 is in use. Can not proceed with format operation.
...
Running command: /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/disk/by-partuuid/5d4f3faf-7a25-4433-b857-3a78782656e3 5d4f3faf-7a25-4433-b857-3a78782656e3
stderr: Device 5d4f3faf-7a25-4433-b857-3a78782656e3 already exists.
ceph-volume still succeeded here and the OSD runs fine. However, the DB partition is encrypted with an old key rather than the one from the OSD, which becomes a problem once you restart the machine, for example. I guess the same could happen with the OSD device itself, but I didn't test that.
Although this is mostly a user error (since the partition wasn't correctly wiped etc.) and an edge case, I still think ceph-volume should exit/error when luksFormat fails.
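To illustrate the behaviour I have in mind, here is a minimal sketch using plain subprocess. It is not the actual ceph-volume code: the names CommandFailed, run_checked and luks_format_checked are hypothetical, and the cryptsetup arguments are simply copied from the log output above. The only point is that a non-zero exit status from luksFormat raises an error instead of being ignored.

```python
import subprocess


class CommandFailed(RuntimeError):
    """Hypothetical exception carrying the exit status and stderr of a failed command."""

    def __init__(self, command, returncode, stderr):
        super().__init__(
            "command %r failed with exit status %d: %s"
            % (command, returncode, stderr.strip())
        )
        self.command = command
        self.returncode = returncode
        self.stderr = stderr


def run_checked(command, stdin_data=b""):
    """Run a command, feed it stdin_data, and raise CommandFailed on a non-zero exit."""
    result = subprocess.run(command, input=stdin_data, capture_output=True)
    stderr = result.stderr.decode("utf-8", errors="replace")
    if result.returncode != 0:
        raise CommandFailed(command, result.returncode, stderr)
    return result.stdout.decode("utf-8", errors="replace"), stderr


def luks_format_checked(key, device):
    """Format device as LUKS with key (bytes); abort if cryptsetup reports an error.

    The command line mirrors the one in the ceph-volume log above.
    """
    command = [
        "/usr/sbin/cryptsetup",
        "--batch-mode",
        "--key-file", "-",  # read the key from stdin
        "luksFormat",
        device,
    ]
    return run_checked(command, stdin_data=key)
```

With a check like this, the "Device ... is in use" failure would stop OSD creation right at the luksFormat step, before luksOpen gets a chance to reuse the stale mapping.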
P.S.: the documentation for luks_format was copy-pasted from luks_open and should be updated: https://github.com/ceph/ceph/blob/84895674cfcf54e5a620d64a88d2a1fad1476b02/src/ceph-volume/ceph_volume/util/encryption.py#L33
Updated by Sebastian Wagner over 2 years ago
- Project changed from Ceph to ceph-volume