Bug #45700

closed

cryptsetup LuksOpen hangs while creating or starting an encrypted OSD - Octopus

Added by Robert Toole almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When creating or starting an encrypted OSD, a cryptsetup process hangs indefinitely and blocks the creation or start of the OSD until the process is terminated with kill -9. Once the process is killed, the encrypted OSD is created or started normally. I believe cryptsetup is waiting for a password. The documentation for encrypted OSDs is scarce, so maybe I am just missing something.

Environment - CentOS Linux release 8.1.1911 (Core)
uname -r - 5.6.14-1.el8.elrepo.x86_64

Cryptsetup - cryptsetup-2.2.0-2.el8.x86_64 cryptsetup-libs-2.2.0-2.el8.x86_64

LUKS encryption with cryptsetup works normally when set up manually on these hosts.

Other than the ELRepo mainline kernel and the glusterfs RPMs from the oVirt 4.4 repository, it is a standard CentOS 8 install.

The OSDs are being set up on a bcache device, and oVirt has not yet been installed.

Octopus install using cephadm in a container:

[root@tweb-kvm1 ~]# ceph -v
ceph version 15.2.2 (0c857e985a29d90501a285f242ea9c008df49eb8) octopus (stable)

[root@tweb-kvm1 ~]# ceph -s
  cluster:
    id:     f3726b32-9df4-11ea-99df-78321ba8b7c1
    health: HEALTH_WARN
            3 stray daemons(s) not managed by cephadm

  services:
    mon: 3 daemons, quorum tweb-kvm1,tweb-kvm2,tweb-kvm3 (age 12h)
    mgr: tweb-kvm2.apsvgw(active, since 12h), standbys: tweb-kvm1.yzidkr
    mds: tweb-cephfs:1 {0=cephfs.tweb-kvm3.twawms=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 12h), 12 in (since 12h)
    tcmu-runner: 3 daemons active (tweb-kvm1:rbd/oVirt_engine_disk, tweb-kvm2:rbd/oVirt_engine_disk, tweb-kvm3:rbd/oVirt_engine_disk)
  task status:
  data:
    pools:   4 pools, 97 pgs
    objects: 31 objects, 27 KiB
    usage:   12 GiB used, 16 TiB / 16 TiB avail
    pgs:     97 active+clean
  io:
    client:   2.5 KiB/s rd, 2 op/s rd, 0 op/s wr

Steps to reproduce:

Follow the install guide at https://docs.ceph.com/docs/master/cephadm/install/ until you reach the Install OSDs section, then use the dashboard or a YAML file (attached) to create the OSDs with encryption enabled. Observe that the OSDs are not created, then run ps -aef | grep luks from a terminal. You will see a process like this:

[root@tweb-kvm3 ~]# ps -aef | grep luks
root 12613 10018 81 14:47 ? 00:00:05 /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/ceph-4c82cffb-6762-4f7a-8529-ae10e00c5570/osd-data-b2edfb21-60e4-41e3-9ea2-6cf1acb86ef1 jWDqZL-JbtY-5Nhh-znkP-jFZA-MgKf-CLT6zd

Running this command manually from the terminal causes cryptsetup to prompt for the password. If you kill -9 this process, setup continues normally: the OSD is created with encryption and functions normally until the system is restarted. After a reboot, the OSD will not come up until you find and kill these processes again.
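
The workaround described above (finding the stuck cryptsetup processes and killing them) can be sketched as a small filter over ps output. This is only an illustration: the function name cryptsetup_pids is hypothetical, and it assumes the eight-column ps -aef layout shown in this report, where the command is the eighth field. It matches every cryptsetup process, not only hung ones, so review the list before killing anything.

```shell
# Print the PID (field 2) of every line in a `ps -aef` listing whose
# command (field 8) is cryptsetup, with or without a leading path.
# Hypothetical helper; assumes the UID PID PPID C STIME TTY TIME CMD layout.
cryptsetup_pids() {
    awk '$8 ~ /(^|\/)cryptsetup$/ { print $2 }'
}

# Typical use (commented out because it kills processes):
#   ps -aef | cryptsetup_pids                      # inspect the PIDs first
#   ps -aef | cryptsetup_pids | xargs -r kill -9   # then apply the workaround
```

Matching on the command field rather than grepping for "luks" keeps the grep process itself out of the result and also catches other subcommands such as cryptsetup remove.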

It appears that the key file is not being passed properly to the cryptsetup command, which leaves it waiting indefinitely for a password. How the OSD is decrypted after the process is killed is a mystery to me.
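
To illustrate the hypothesis above, here is a minimal sketch of how any program that reads a key from stdin behaves when the key never arrives. It uses wc as a stand-in for cryptsetup and assumes that cryptsetup --key-file - reads stdin until end of file; the function names are hypothetical. It shows only the general blocking behaviour and is not a confirmed diagnosis of this bug.

```shell
# Stand-in for a program that reads key material from stdin until EOF.
# (Assumption: this mimics how `cryptsetup --key-file -` consumes its key.)
read_key_from_stdin() {
    wc -c | tr -d '[:space:]'
}

# Case 1: the writer sends the key and closes the pipe, so the reader
# sees EOF and returns at once, printing the number of bytes it read.
key_with_eof() {
    printf '%s' "$1" | read_key_from_stdin
}

# Case 2: the writer holds the pipe open without sending a key. The reader
# blocks until `timeout` kills it after 1 second; the pipeline's exit
# status is then 124, the status timeout uses for "timed out".
key_without_eof() {
    sleep 2 | timeout 1 wc -c >/dev/null
}
```

Calling key_with_eof 'secret' prints 6, while key_without_eof returns status 124 after about two seconds. A process stuck in the second state would look much like the hung luksOpen in the ps output.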

Non-CentOS RPMs:

yum list installed | grep ovirt

glusterfs.x86_64 7.5-1.el8 @ovirt-4.4-centos-gluster7
glusterfs-api.x86_64 7.5-1.el8 @ovirt-4.4-centos-gluster7
glusterfs-client-xlators.x86_64 7.5-1.el8 @ovirt-4.4-centos-gluster7
glusterfs-libs.x86_64 7.5-1.el8 @ovirt-4.4-centos-gluster7

yum list installed | grep epel
dkms.noarch 2.8.1-4.20200214git5ca628c.el8 @epel
epel-release.noarch 8-8.el8 @extras
fmt.x86_64 6.2.0-2.el8 @epel
gperftools-libs.x86_64 2.7-6.el8 @epel
leveldb.x86_64 1.22-1.el8 @epel
liboath.x86_64 2.6.2-3.el8 @epel
libunwind.x86_64 1.3.1-3.el8 @epel
screen.x86_64 4.6.2-10.el8 @epel

yum list installed | grep elrepo
elrepo-release.noarch 8.1-1.el8.elrepo @elrepo-kernel
kernel-ml.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kernel-ml-core.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kernel-ml-devel.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kernel-ml-headers.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kernel-ml-modules.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kernel-ml-tools.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kernel-ml-tools-libs.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel
kmod-mpt3sas.x86_64 28.100.00.00-2.el8_1.elrepo @DD-1
python3-perf.x86_64 5.6.14-1.el8.elrepo @elrepo-kernel

yum list installed | grep ceph
ceph-common.x86_64 2:15.2.2-0.el8 @Ceph
ceph-iscsi.noarch 3.4-1.el8 @ceph-iscsi
cephadm.x86_64 2:15.2.2-0.el8 @Ceph
libcephfs2.x86_64 2:15.2.2-0.el8 @Ceph
python3-ceph-argparse.x86_64 2:15.2.2-0.el8 @Ceph
python3-ceph-common.x86_64 2:15.2.2-0.el8 @Ceph
python3-cephfs.x86_64 2:15.2.2-0.el8 @Ceph


Files

osd_logs.tar.gz (558 KB): osd and ceph-volume logs from one of the hosts. Robert Toole, 05/25/2020 03:27 PM
osd_spec.yml (117 Bytes): spec file used to provision the OSDs. Robert Toole, 05/25/2020 03:34 PM
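
The contents of the attached osd_spec.yml are not reproduced in this report. For readers following the reproduction steps, the sketch below writes a spec of roughly the expected shape. Every value in it (the service_id, the host pattern, the use of all devices) is an assumption for illustration and not the reporter's actual file; the function name write_osd_spec is likewise hypothetical.

```shell
# Write a HYPOTHETICAL cephadm OSD service spec with encryption enabled
# to the path given as $1. All values are illustrative assumptions,
# not the contents of the attached file.
write_osd_spec() {
    cat > "$1" <<'EOF'
service_type: osd
service_id: encrypted_osds
placement:
  host_pattern: '*'
data_devices:
  all: true
encrypted: true
EOF
}

# Typical use (commented out because it provisions OSDs on a live cluster):
#   write_osd_spec osd_spec.yml
#   ceph orch apply osd -i osd_spec.yml
```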

Related issues 1 (0 open, 1 closed)

Is duplicate of Orchestrator - Feature #44625: cephadm: test dmcrypt (Resolved)

Actions #1

Updated by Robert Toole almost 4 years ago

Further testing: it appears that none of the cryptsetup commands exit properly when called from Ceph. When run on the command line, from either the OS or the container, they work as expected. For example, this command:

ceph orch device zap tweb-kvm1 /dev/bcache3 --force

hung indefinitely until I located the cryptsetup process (/usr/sbin/cryptsetup remove /dev/mapper/RV2ih5-vfHl-tTPu-xRB3-Pl5z-Xmhp-atnzkC) and killed it with kill -9.

Actions #2

Updated by Sebastian Wagner almost 4 years ago

  • Project changed from RADOS to Orchestrator
  • Category deleted (Administration/Usability)
Actions #3

Updated by Sebastian Wagner almost 4 years ago

Actions #4

Updated by Sebastian Wagner almost 4 years ago

I expect that the backport of https://github.com/ceph/ceph/pull/34745 will fix this.

Actions #5

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from New to Resolved