Bug #13940
closed
OSDs fail to start on reboot with dmcrypt/luks
Description
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
Linux smr-r1-r1-head2 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 14.04.3 LTS
I have 60 OSDs per host. They were created with ceph-disk activate --dmcrypt and use LUKS. When I reboot a host, about 10 OSDs come up and the rest fail to start. The ones that fail all have a /dev/mapper/temporary-cryptsetup-NNNN entry for the journal and data partitions. The symptoms all match this mailing list issue: http://www.spinics.net/lists/ceph-devel/msg25281.html. I ended up fixing it the same way: by closing the temporary devices with luksClose, decrypting them again under the right names, and then running start ceph-osd-all.
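The workaround described above could be sketched roughly as follows. This is only an illustration, not the submitter's actual script: the function names are hypothetical, and it assumes that each mapping should be named after its partition UUID and that the LUKS key lives at /etc/ceph/dmcrypt-keys/<uuid>.luks.key, which should be checked against the local setup. The code only builds the command lines (a dry run); it does not execute anything.

```python
import re

# Names left behind by an interrupted cryptsetup run, as reported in the ticket.
TEMP_MAPPING = re.compile(r"^temporary-cryptsetup-\d+$")


def find_temporary_mappings(mapper_names):
    """Return the /dev/mapper entries that look like stale temporary mappings."""
    return [name for name in mapper_names if TEMP_MAPPING.match(name)]


def build_recovery_commands(temp_names, partitions,
                            key_dir="/etc/ceph/dmcrypt-keys"):
    """Build (but do not run) the commands for the manual recovery.

    partitions is a list of (device, partition_uuid) pairs. The key file
    location and the use of the partition UUID as the mapping name are
    assumptions made for this sketch.
    """
    # Step 1: close every stale temporary mapping.
    commands = [["cryptsetup", "luksClose", name] for name in temp_names]
    # Step 2: reopen each partition under the name the OSD expects.
    for device, part_uuid in partitions:
        key_file = "{}/{}.luks.key".format(key_dir, part_uuid)
        commands.append(["cryptsetup", "--key-file", key_file,
                         "luksOpen", device, part_uuid])
    # Step 3: start all OSDs through Upstart, as done in the ticket.
    commands.append(["start", "ceph-osd-all"])
    return commands


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    names = ["temporary-cryptsetup-1234", "control"]
    parts = [("/dev/sdb1", "11111111-2222-3333-4444-555555555555")]
    for cmd in build_recovery_commands(find_temporary_mappings(names), parts):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review them before running anything against encrypted devices on a host with 60 OSDs.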
Updated by Loïc Dachary about 8 years ago
- Status changed from New to 12
- Priority changed from Normal to High
- Release set to hammer
Updated by Loïc Dachary about 8 years ago
This is most probably a timeout: individual udev actions take too long and abort or fail (I don't know exactly what happens when a udev action takes a long time to complete). This cannot happen in infernalis, but the modifications are extensive and not easy to backport. The general idea is to not do any work when ceph-disk is called from udev. Instead, ceph-disk trigger is called and launches a systemd/upstart action in the background.
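The "do no work in the udev handler" idea can be illustrated with a small sketch. This is not the actual ceph-disk trigger code: the function name, the ceph-disk@ unit template, and the ceph-disk Upstart event name are assumptions made for illustration. The only point is that the handler hands the job to the init system with a non-blocking call and returns immediately, so the slow activation no longer runs inside the udev action.

```python
def build_trigger_command(dev_name, init="systemd"):
    """Return a command that queues activation of dev_name and returns at once.

    dev_name is a short device name such as "sdb1". The unit and event
    names below are hypothetical placeholders for this sketch.
    """
    if init == "systemd":
        # --no-block makes systemctl enqueue the job without waiting for it.
        unit = "ceph-disk@{}.service".format(dev_name)
        return ["systemctl", "--no-block", "start", unit]
    if init == "upstart":
        # --no-wait makes initctl emit the event without waiting for jobs.
        return ["initctl", "emit", "--no-wait", "ceph-disk",
                "dev={}".format(dev_name)]
    raise ValueError("unsupported init system: {}".format(init))


if __name__ == "__main__":
    print(" ".join(build_trigger_command("sdb1", "systemd")))
    print(" ".join(build_trigger_command("sdb1", "upstart")))
```

Because the handler only enqueues work, its run time stays short no matter how long the decryption and OSD activation take in the background job.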
Updated by Dan Mick about 8 years ago
Submitter responded in email (which bounced):
Ok thanks for the update. We are evaluating Infernalis for our next deployment, and I have a script that cleans up from the broken state on boot, so we can limp along as is if this needs to be WONTFIX.