Bug #20388

combination of kvm using librbd from kraken and online resize leads to data corruption

Added by Yann Dupont almost 7 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi everybody. We recently experienced serious data corruption. I've been able to reproduce it and I suspect librbd from Kraken. Here are the steps that reproduce the behavior reliably.

- Start with a fresh standard Debian install with Jewel librbd and librados (deb http://download.ceph.com/debian-jewel jessie main).
- Launch a VM on this machine, using some volumes from a Ceph cluster through librbd.
- Use the VM and do an online resize of a Ceph volume: all is OK (of course, qemu needs to be restarted for the VM to see the extra space).

Now stop the VM and switch the libraries to the Kraken ones (deb http://download.ceph.com/debian-kraken jessie main).

- Restart the VM and do an online resize of the volume: the resize operation appears stuck forever (and you will notice some virtio or SCSI errors in your VM).
- In fact, the resize is stuck until you stop your VM. As soon as the VM is stopped, the resize operation succeeds, BUT...
- Your data is lost. The volume is now filled with zeroes.

Please note: unmounting the volume inside the VM isn't sufficient; the resize operation is still stuck (until the VM is stopped), and the data corruption still occurs.

Stopping the VM (so that qemu exits) and doing the resize while the VM is stopped seems safe.
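For anyone trying to reproduce this, a small helper can make the "filled with zeroes" observation easier to confirm. The following is only a minimal sketch, not part of the original report: it fingerprints the first bytes of a block device (or any file) so that a run taken before the resize can be compared with one taken afterwards. The function names `fingerprint` and `data_survived`, and the 4 MiB chunk size, are illustrative assumptions.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time (arbitrary choice)


def fingerprint(path, length, chunk=CHUNK):
    """Return (sha256 hex digest, all_zero flag) for the first `length` bytes of `path`."""
    digest = hashlib.sha256()
    all_zero = True
    remaining = length
    with open(path, "rb") as dev:
        while remaining > 0:
            block = dev.read(min(chunk, remaining))
            if not block:
                break  # device or file is shorter than requested
            digest.update(block)
            # bytes.count(0) counts zero bytes; any other byte clears the flag
            if all_zero and block.count(0) != len(block):
                all_zero = False
            remaining -= len(block)
    return digest.hexdigest(), all_zero


def data_survived(before, after):
    """True when the fingerprinted region is byte-identical in both runs."""
    return before[0] == after[0]
```

A possible use, assuming the volume shows up as a device such as `/dev/vdb` in the guest (a hypothetical path): call `fingerprint("/dev/vdb", old_size_in_bytes)` before the resize, call it again once the resize has completed, and compare the two results with `data_survived`. An `all_zero` flag of `True` in the second result would match the symptom described above.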

Can somebody try to reproduce the issue?
