Bug #20388

closed

combination of kvm using librbd from kraken and online resize leads to data corruption

Added by Yann Dupont almost 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi everybody. We recently experienced serious data corruption. I've been able to reproduce it and I suspect librbd from Kraken. Here are the steps that lead to a reproducible behavior.

- Start from a fresh standard Debian install with the Jewel librbd & librados packages (deb http://download.ceph.com/debian-jewel jessie main).
- Launch a VM on this machine, using some volumes from a Ceph cluster through librbd.
- Use the VM and do an online resize of the Ceph volume: all is OK (of course, qemu needs to be restarted to benefit from the extra space).

Now stop the VM and swap the libraries for the Kraken ones (deb http://download.ceph.com/debian-kraken jessie main).

- Restart the VM and do an online resize of the volume: the resize operation is stuck forever (and you will notice some virtio or SCSI errors in your VM).

In fact the resize stays stuck until you stop your VM. As soon as the VM is stopped, the resize operation succeeds, BUT your data is lost: the volume is now filled with zeroes.

Please note: unmounting the volume from the VM isn't sufficient; the resize operation is still stuck (until the VM is stopped), and data corruption occurs.

Stopping the VM (so that qemu exits) and doing the resize with the VM stopped seems safe.
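
For anyone trying to reproduce this, the "filled with zeroes" symptom can be checked mechanically instead of by eye. The following is only a minimal sketch, not part of the original report: the helper name `is_all_zeros` and its parameters are hypothetical, and it assumes the volume is readable as a file-like path (for example a block device seen from the guest, or a raw copy of the image).

```python
def is_all_zeros(path, limit=None, chunk_size=4 * 1024 * 1024):
    """Return True if the first `limit` bytes of `path` are all zero.

    `path` is assumed to be a readable file or block device (hypothetical
    here). With limit=None the whole file is scanned. An empty file is
    reported as all zeros, since no non-zero byte was found.
    """
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:
                # End of file reached without seeing a non-zero byte.
                break
            if chunk.count(b"\0") != len(chunk):
                # At least one byte in this chunk is non-zero.
                return False
            if remaining is not None:
                remaining -= len(chunk)
    return True
```

Running it on the volume before the resize (expecting `False` for a volume that holds a filesystem) and again after the VM has been stopped would show whether the data was zeroed. Passing a `limit` keeps the check quick on large volumes, at the cost of only covering the start of the device.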

Can somebody try to reproduce the issue ?

Actions #1

Updated by Yann Dupont almost 7 years ago

Just to add some confusion, I'm unable to reproduce this issue on an Ubuntu-based machine with librbd from Kraken.
So it may be related to the Debian version.

Or to something else in our setup. I may need to dig further.

Actions #2

Updated by Jason Dillaman almost 7 years ago

  • Status changed from New to Need More Info

We will need a repeatable reproducer to try and fix it -- or debug-level logs from the affected librbd instance.
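
As a rough sketch of how such logs are commonly collected (an assumption about the reporter's setup, not something confirmed in this ticket): client-side debug logging can be raised in the `[client]` section of the `ceph.conf` read by the qemu host. The log path below is an example only and must be writable by the qemu process.

```ini
[client]
    # Verbose librbd logging; 20 is the most detailed level.
    debug rbd = 20
    # Example path (assumption): one log file per qemu process.
    log file = /var/log/ceph/qemu-guest-$pid.log
```

The VM has to be restarted for librbd to pick up the new settings, and the resulting log can then be attached to the ticket.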

Actions #3

Updated by Yann Dupont almost 7 years ago

Hi Jason, thanks for the answer.
Yes, I'm very aware that this bug report lacks precision. In fact I was in the middle of writing to the dev mailing list and chose to open a ticket here instead, but maybe that was a mistake, as it's not a formal bug report but more of an 'Am I the only one seeing this?'.

Once I'm able to narrow this down further and be sure we're not seeing a side effect of something faulty on our side, I'll get back to you (with logs attached this time).

Actions #4

Updated by Jason Dillaman almost 7 years ago

Perfect, thanks.

Actions #5

Updated by Jason Dillaman over 6 years ago

  • Status changed from Need More Info to Closed

Closing due to lack of input. Please re-open if data can be provided.
