Bug #48999

Data corruption with rbd_balance_parent_reads and rbd_balance_snap_reads set to true.

Added by chao guo 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: librbd
Target version:
% Done: 0%
Source: Community (user)
Tags: librbd, corruption, data loss, snapshot, clone, rbd_balance_parent_reads, rbd_balance_snap_reads
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We use librbd for VM storage and have found an issue that is very likely to cause data corruption on snapshots and cloned RBD images when the following options are set:
rbd_balance_parent_reads = true
rbd_balance_snap_reads = true

Steps to reproduce this bug:
1. Create an RBD image by cloning from a snapshot (I do not know whether the cloning is necessary).
2. Take many snapshots of the cloned image (7 snapshots are enough to trigger this bug fairly easily, though not every time). Create one clone image from each snapshot.
3. Use a simple Python script, or any other method that opens the image read-only, to read from the clone of the last snapshot, and do not stop reading until the end.
4. Clone an image and immediately start a VM on it, and at the same time delete all the snapshots and their cloned images except the last one. (I do this from a script, even though there is a file lock between the delete and the clone.)
5. The VM cannot start up because the OS image is corrupted.

The last snapshot seems to be corrupted too, because every image cloned from it is corrupted.
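
The reader from step 3 could look roughly like the sketch below. It is not the script used in the original report; it assumes the standard `rados` and `rbd` Python bindings are installed, and the pool name, image name, and config path passed to `main` are hypothetical placeholders. The checksum is an addition that makes it easier to compare two reads of the same image.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice


def read_image_fully(image, chunk_size=CHUNK):
    """Read an image-like object from offset 0 to its end.

    `image` only needs size() and read(offset, length), which rbd.Image provides.
    Returns (bytes_read, sha256 hex digest of the data).
    """
    size = image.size()
    digest = hashlib.sha256()
    offset = 0
    while offset < size:
        length = min(chunk_size, size - offset)
        data = image.read(offset, length)
        digest.update(data)
        offset += length
    return offset, digest.hexdigest()


def main(pool, image_name, conffile="/etc/ceph/ceph.conf"):
    # Imported here so the helper above can be used without a Ceph install.
    import rados
    import rbd

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            # read_only=True matches "image is opened readonly" in step 3.
            image = rbd.Image(ioctx, image_name, read_only=True)
            try:
                return read_image_fully(image)
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling `main("<pool>", "<clone-of-last-snapshot>")` before and after step 4 and comparing the two digests would be one way to check whether the data returned for that clone has changed.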

Workaround:
Set these options to false and the bug no longer occurs:
rbd_balance_parent_reads = false
rbd_balance_snap_reads = false
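
For clients that go through the Python bindings, the same workaround could be applied per process instead of in ceph.conf. The sketch below assumes that overrides set with `conf_set` on the `rados.Rados` handle are picked up by librbd when images are opened through that handle; this was not verified in the report. The helper name `disable_balanced_reads` is hypothetical.

```python
BALANCE_OPTIONS = ("rbd_balance_parent_reads", "rbd_balance_snap_reads")


def disable_balanced_reads(cluster):
    """Set both balance options to false on a rados.Rados-like handle.

    Returns the values read back with conf_get so the caller can log them.
    """
    for opt in BALANCE_OPTIONS:
        cluster.conf_set(opt, "false")
    return {opt: cluster.conf_get(opt) for opt in BALANCE_OPTIONS}


# Intended use (requires the rados bindings and a reachable cluster):
#   import rados
#   cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
#   disable_balanced_reads(cluster)  # before opening any image
#   cluster.connect()
```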

I am using versions 13.2.9 and 13.2.10. I do not know whether this bug affects any other version.
