Bug #23629
RBD corruption after power off
Status: Closed
Description
Hello,
we have run into a nasty bug regarding RBD in Ceph Luminous, and we have encountered it across multiple different hardware configurations (various disk types, machine types, etc.). We use Ceph as a storage backend for OpenStack, and we found that after an unclean shutdown of the hypervisors, all of the volumes hosted on Ceph are corrupted and can no longer be used. Everything works if the storage backend is on Jewel. I'm not sure whether this is a bug or just a mistake on our side, but given that it has happened to us multiple times, it looks like a bug. Our hypervisor/mon confs are attached.
Updated by Jason Dillaman about 6 years ago
- Status changed from New to Need More Info
@Josef: This sounds like your images have the exclusive-lock feature enabled, but your OpenStack Ceph user does not have the permissions needed to blacklist the dead clients left behind by the unclean shutdown. That results in an EIO error back to your VM on the first attempt to write data. Can you please verify the CephX caps on your OpenStack users? Also, please confirm that you followed step 6 of the Luminous upgrade notes [1].
[1] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken
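For reference, a rough sketch of what checking and adjusting the caps might look like; the user name client.cinder and the pool names below are placeholders, substitute whatever your OpenStack deployment actually uses:

    # show the caps currently assigned to the OpenStack user
    ceph auth get client.cinder

    # switch the user to the rbd cap profiles; 'profile rbd' on the mon cap
    # includes the 'osd blacklist' permission needed to evict dead
    # exclusive-lock holders after an unclean shutdown
    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'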
Updated by Josef Zelenka about 6 years ago
@Jason Dillaman: my bad, this helped, thanks a lot. We went from Jewel to Luminous and skipped Kraken altogether, which is why we missed that walkthrough. This can be closed now.
Updated by Jason Dillaman about 6 years ago
- Status changed from Need More Info to Closed
Great, no worries.