
Bug #13940

closed

OSDs fail to start on reboot with dmcrypt/luks

Added by Aaron Bassett over 8 years ago. Updated about 8 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
Linux smr-r1-r1-head2 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 14.04.3 LTS

I have 60 OSDs per host. They were created with ceph-disk activate --dmcrypt and use LUKS. When I reboot a host, about 10 OSDs come up and the rest fail to start. The ones that fail all have /dev/mapper/temporary-cryptsetup-NNNN entries for their journal and data partitions. The symptoms all match this mailing list issue: http://www.spinics.net/lists/ceph-devel/msg25281.html. I ended up fixing it the same way: I closed the temporary devices with cryptsetup luksClose, decrypted the partitions again under the right names, and then started ceph-osd-all.
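
The manual recovery described in this report can be sketched as a Python dry run that only builds the commands and executes nothing. It assumes that ceph-disk named each dm-crypt mapping after the partition UUID and stored LUKS keys as /etc/ceph/dmcrypt-keys/<uuid>.luks.key. The functions recovery_commands and uuids_from_key_files are hypothetical helpers, not part of ceph-disk. Review the generated commands against the actual host before running any of them.

```python
# Hypothetical dry-run planner for the manual recovery in this report.
# ASSUMPTION: ceph-disk mapped each partition as /dev/mapper/<partuuid>
# and kept LUKS keys as <key_dir>/<partuuid>.luks.key.

KEY_DIR = "/etc/ceph/dmcrypt-keys"  # assumed default ceph-disk key directory
TEMP_PREFIX = "temporary-cryptsetup-"


def uuids_from_key_files(filenames, suffix=".luks.key"):
    """Extract partition UUIDs from key file names (hypothetical helper)."""
    return sorted(f[: -len(suffix)] for f in filenames if f.endswith(suffix))


def recovery_commands(mapper_entries, expected_uuids, key_dir=KEY_DIR):
    """Return argv lists for the cleanup; nothing is executed here."""
    present = set(mapper_entries)
    cmds = []
    # 1. Close the leftover temporary mappings first.
    for name in sorted(mapper_entries):
        if name.startswith(TEMP_PREFIX):
            cmds.append(["cryptsetup", "luksClose", name])
    # 2. Reopen every expected partition that has no properly named mapping.
    for uuid in expected_uuids:
        if uuid not in present:
            cmds.append([
                "cryptsetup", "--key-file", f"{key_dir}/{uuid}.luks.key",
                "luksOpen", f"/dev/disk/by-partuuid/{uuid}", uuid,
            ])
    # 3. Start all OSDs through upstart (Ubuntu 14.04).
    cmds.append(["start", "ceph-osd-all"])
    return cmds
```

On a host, mapper_entries could come from os.listdir("/dev/mapper") and the UUIDs from uuids_from_key_files(os.listdir(KEY_DIR)). Each argv list can then be inspected and passed to subprocess.run. The sketch only builds commands so that a mistake in the assumed naming scheme is visible before anything touches the disks.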

#1

Updated by Loïc Dachary about 8 years ago

  • Status changed from New to 12
  • Priority changed from Normal to High
  • Release set to hammer

#2

Updated by Loïc Dachary about 8 years ago

This is most probably a timeout: individual udev actions take too long and are aborted or fail (I don't know exactly what happens when a udev action takes too long to complete). This cannot happen in Infernalis, but the modifications there are extensive and not easy to backport. The general idea is to not do any work when ceph-disk is called from udev. Instead, ceph-disk trigger is called and launches a systemd/upstart action in the background.
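
The design change in this comment can be illustrated with a toy Python model, which is not ceph-disk code. A background thread stands in for the systemd/upstart job and a sleep stands in for the slow activation work (cryptsetup, mount, OSD start). UDEV_BUDGET_S is an arbitrary value chosen for the demo, not udev's real timeout, and all names are hypothetical.

```python
import threading
import time

UDEV_BUDGET_S = 0.2  # arbitrary demo budget, NOT udev's actual event timeout


def slow_activate(dev, done):
    """Stand-in for decrypting, mounting and starting one OSD."""
    time.sleep(0.5)
    done.append(dev)


def handle_inline(dev, done):
    # Hammer-style: all of the work runs inside the udev event itself.
    slow_activate(dev, done)


def handle_deferred(dev, done):
    # Infernalis-style idea: hand the work off and return immediately.
    # The thread plays the role of the background systemd/upstart job.
    worker = threading.Thread(target=slow_activate, args=(dev, done))
    worker.start()
    return worker


def elapsed(fn, *args):
    """Time how long a handler blocks its caller (the 'udev event')."""
    start = time.monotonic()
    result = fn(*args)
    return time.monotonic() - start, result
```

In this model, the inline handler blocks for the full activation time and overruns the budget. The deferred handler returns almost at once, and the work still completes in the background.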

#3

Updated by Dan Mick about 8 years ago

Submitter responded in email (which bounced):

OK, thanks for the update. We are evaluating Infernalis for our next deployment, and I have a script that cleans up from the broken state on boot, so we can limp along as is if this needs to be WONTFIX.

#4

Updated by Samuel Just about 8 years ago

  • Status changed from 12 to Won't Fix