Bug #19415 (closed): Encrypted BlueStore OSDs fail to start via udev due to missing ceph_fsid in lockbox

Added by Steve Taylor about 7 years ago. Updated almost 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: ceph cli
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

BlueStore OSDs deployed with dmcrypt run fine until the host is rebooted, at which point udev fails to map the block partitions because it cannot obtain the dmcrypt keys from a mon. The same behavior can be reproduced by stopping an OSD, closing the dmcrypt volume for its block partition, and then attempting to activate it with ceph-disk.
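For reference, this is roughly the sequence I use to trigger it by hand; the OSD ID, block device, and dm-crypt mapping name below are placeholders for my environment:

# stop the OSD and close the dm-crypt mapping for its block partition
systemctl stop ceph-osd@0
cryptsetup close <block-partition-uuid>

# attempt to reactivate the block partition; this is where it fails
ceph-disk -v activate-block --dmcrypt /dev/sdb2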

I attempted to activate an OSD manually with 'ceph-disk activate-block --dmcrypt <block dev>' and got an error (using -v) about the OSD not having a cluster ID. When I searched for that error in the ceph-disk source, I found that the cause was a missing ceph_fsid file in the OSD's lockbox; that file contains the cluster fsid, which ceph-disk needs in order to query a mon for the block partition's dmcrypt key.
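The check itself is trivial once the lockbox partition is mounted; the uuid in the path below is a placeholder for the OSD's uuid:

# this is the file ceph-disk looks for; on my OSDs it was simply absent
ls -l /var/lib/ceph/osd-lockbox/<osd-uuid>/ceph_fsid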

Once I wrote the cluster fsid to a new ceph_fsid file in each of my OSD lockbox filesystems, all of my OSDs could be activated with 'ceph-disk activate-block' and also started automatically after a reboot.
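The workaround amounts to something like the following on each OSD host; the cluster fsid and lockbox uuid are placeholders, and the lockbox may need to be remounted read-write first if it is mounted read-only:

# the cluster fsid is also available as 'fsid' in /etc/ceph/ceph.conf
ceph fsid

# write it into each OSD's lockbox and match the ownership of the other files
echo <cluster-fsid> > /var/lib/ceph/osd-lockbox/<osd-uuid>/ceph_fsid
chown ceph:ceph /var/lib/ceph/osd-lockbox/<osd-uuid>/ceph_fsid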

#1

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Can't reproduce

Hmm, I just tried this with current master, and it worked. What version are you using?

# ls -al /var/lib/ceph/osd-lockbox/428c90be-30e5-498d-9a93-701b20ca4913
total 25
drwxr-xr-x 3 root root  1024 Apr 27 19:20 .
drwxr-xr-x 5 root root  4096 Apr 27 19:20 ..
-rw-r--r-- 1 ceph ceph    37 Apr 27 19:20 block.db-uuid
-rw-r--r-- 1 ceph ceph    37 Apr 27 19:20 block-uuid
-rw-r--r-- 1 ceph ceph    37 Apr 27 19:20 block.wal-uuid
-rw-r--r-- 1 ceph ceph    37 Apr 27 19:20 ceph_fsid
-rw-r--r-- 1 ceph ceph    12 Apr 27 19:20 key-management-mode
-rw-r--r-- 1 root root   106 Apr 27 19:20 keyring
drwx------ 2 root root 12288 Apr 27 19:20 lost+found
-rw-r--r-- 1 ceph ceph    25 Apr 27 19:20 magic
-rw-r--r-- 1 ceph ceph    37 Apr 27 19:20 osd-uuid
#2

Updated by Steve Taylor almost 7 years ago

I saw it with master, but that was about a month ago now. I also failed to mention that I deployed my OSDs with ceph-deploy; I don't know whether that's relevant. I assumed it was probably a ceph-disk thing, but maybe I'm wrong, or maybe it's already fixed. I'll have to re-test.
