Bug #23629

closed

RBD corruption after power off

Added by Josef Zelenka about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Hello,
we have run into a nasty RBD bug in Ceph Luminous, and we have seen it across several different hardware configurations (various disk types, machine types, etc.). We use Ceph as a storage backend for OpenStack, and we found that after an unclean shutdown of the hypervisors, all the volumes hosted on Ceph become corrupted and we can't use them anymore. All of this works if the storage backend is on Jewel. I'm not sure whether this is a bug or just our mistake, but given that this has happened to us multiple times, it seems more like a bug. Our hypervisor and mon configs are attached.


Files

cephconfmon.conf (1007 Bytes), Josef Zelenka, 04/10/2018 03:11 PM
cephconfhypervisor.conf (1.98 KB), Josef Zelenka, 04/10/2018 03:11 PM
Actions #1

Updated by John Spray about 6 years ago

  • Project changed from Ceph to rbd
Actions #2

Updated by Jason Dillaman about 6 years ago

  • Status changed from New to Need More Info

@Josef: this sounds like your images have the exclusive-lock feature enabled but your OpenStack Ceph user does not have the necessary permissions to blacklist the dead clients (from your unclean shutdown). This will result in an EIO back to your VM after the first attempt to write data. Can you please verify the CephX caps on your OpenStack users? Also, please confirm you followed step 6 from the Luminous upgrade notes [1].

[1] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken
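
For anyone checking their own setup, the caps verification suggested above can be sketched roughly as follows. This is a minimal heuristic, not an official Ceph tool: it assumes you paste in the mon caps string that `ceph auth get` prints for a user, and it only recognizes three forms known to permit blacklisting (`allow *`, `profile rbd`, and an explicit `allow command "osd blacklist"`), so it may report false negatives for other valid caps. The function names and the `client.cinder` user in the usage note are hypothetical.

```python
import shlex

# Mon cap clauses assumed sufficient to blacklist dead clients.
# This list is deliberately conservative and is an assumption of this sketch.
KNOWN_SUFFICIENT = {
    "allow *",
    "profile rbd",
    "allow command osd blacklist",
}


def _normalize(clause: str) -> str:
    """Lowercase a clause, drop quote characters, and collapse whitespace."""
    unquoted = clause.replace('"', " ").replace("'", " ")
    return " ".join(unquoted.lower().split())


def mon_caps_allow_blacklist(mon_caps: str) -> bool:
    """Heuristically check whether a mon caps string permits `osd blacklist`."""
    return any(_normalize(c) in KNOWN_SUFFICIENT for c in mon_caps.split(","))


def suggest_caps_command(client_id: str, existing_osd_caps: str) -> str:
    """Build a `ceph auth caps` command line that adds the blacklist permission.

    The OSD caps are passed through unchanged, because `ceph auth caps`
    replaces all caps for the user rather than appending to them.
    """
    mon_caps = 'allow r, allow command "osd blacklist"'
    return " ".join([
        "ceph", "auth", "caps", f"client.{client_id}",
        "mon", shlex.quote(mon_caps),
        "osd", shlex.quote(existing_osd_caps),
    ])
```

For example, `mon_caps_allow_blacklist("allow r")` returns `False`, and `suggest_caps_command("cinder", "allow rwx pool=volumes")` produces a command string to review before running it. The function only builds the string and never executes anything against the cluster.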

Actions #3

Updated by Josef Zelenka about 6 years ago

@Jason Dillaman: my bad, this helped, thanks a lot. We went from Jewel to Luminous, so we skipped Kraken altogether, which is why we missed that walkthrough. This can be closed now.

Actions #4

Updated by Jason Dillaman about 6 years ago

  • Status changed from Need More Info to Closed

Great, no worries.
