Bug #23629
RBD corruption after power off
Status: Closed
Description
Hello,
we have run into a nasty bug regarding RBD in Ceph Luminous, and we have encountered it across multiple different hardware configurations (various disk types, machine types, etc.). We use Ceph as a storage backend for OpenStack, and we found that after an unclean shutdown of the hypervisors, all of the volumes hosted on Ceph are corrupted and can no longer be used. Everything works if the storage backend is on Jewel. I'm not sure whether this is a bug or just a mistake on our side, but given that it has happened to us multiple times, it looks like a bug. Our hypervisor/mon confs are attached.
Updated by Jason Dillaman about 6 years ago
- Status changed from New to Need More Info
@Josef: This sounds like your images have the exclusive-lock feature enabled, but your OpenStack Ceph user does not have the permissions needed to blacklist the dead clients left behind by the unclean shutdown. That results in an EIO error back to your VM on the first attempt to write data. Can you please verify the CephX caps on your OpenStack users? Also, please confirm that you followed step 6 of the Luminous upgrade notes [1].
[1] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken
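For reference, a rough sketch of what checking and adjusting the caps might look like; the user name client.cinder and the pool names below are placeholders, substitute whatever your OpenStack deployment actually uses:

    # show the caps currently assigned to the OpenStack user
    ceph auth get client.cinder

    # switch the user to the rbd cap profiles; 'profile rbd' on the mon cap
    # includes the 'osd blacklist' permission needed to evict dead
    # exclusive-lock holders after an unclean shutdown
    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'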
Updated by Josef Zelenka about 6 years ago
@Jason Dillaman: my bad, this helped, thanks a lot. We went from Jewel to Luminous and skipped Kraken altogether, which is why we missed that walkthrough. This can be closed now.
Updated by Jason Dillaman about 6 years ago
- Status changed from Need More Info to Closed
Great, no worries.