Bug #278

data corruption after rbd rollback

Added by Yehuda Sadeh over 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Andrew created a kvm-rbd image and took a snapshot of it with rbdtool while the rbd-kvm guest was running (I don't think that is related to the specific corruption we see). He then stopped the KVM guest and rolled back to that snapshot. The first 128 MB of the image are now empty.

In the logs we see a completed write to the object. It carried an empty snapc, so it can be regarded as having happened before the snapshot:
10.07.13_18:34:56.295916 7fb39b6c1910 -- 10.14.0.103:6800/14259 --> 10.14.0.108:0/5242 -- osd_op_reply(3099 rb.0.1.000000000000 [write 2633728~4096] = 0) v1 -- ?+0 0x227ea80
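The remark that an empty snapc lets us treat the write as happening before the snapshot can be illustrated with a simplified Python model of copy-on-write on the OSD. This is a sketch, not Ceph's code: it assumes the rule that the head is cloned only when a write's snap context seq is newer than the seq the object has already seen, and the classes and function below are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, simplified model; not Ceph's OSD implementation.


@dataclass
class SnapContext:
    """Snap context a client attaches to a write (simplified)."""
    seq: int = 0                                    # newest snapshot id the client knows about
    snaps: List[int] = field(default_factory=list)  # existing snapshot ids


@dataclass
class ObjectState:
    """Per-object state on the OSD (simplified)."""
    head: Optional[bytes] = None   # current data; None if the object does not exist
    snapset_seq: int = 0           # newest snap context seq this object has seen
    clones: Dict[int, bytes] = field(default_factory=dict)  # clone id -> preserved data


def apply_write(obj: ObjectState, data: bytes, snapc: SnapContext) -> bool:
    """Apply a write and return True if the old head was cloned first."""
    cloned = False
    # Copy-on-write: only a snap context newer than anything this object has
    # seen means a snapshot was taken since the last write, so the current
    # head must be preserved before it is overwritten.
    if snapc.seq > obj.snapset_seq and obj.head is not None:
        obj.clones[snapc.seq] = obj.head
        cloned = True
    obj.snapset_seq = max(obj.snapset_seq, snapc.seq)
    obj.head = data
    return cloned


obj = ObjectState(head=b"v1")
print(apply_write(obj, b"v2", SnapContext()))                  # False: empty snapc, no clone
print(apply_write(obj, b"v3", SnapContext(seq=2, snaps=[2])))  # True: v2 preserved as clone 2
```

In this model a write with an empty snapc (seq 0) never triggers a clone, so the data it writes is treated as part of the object's pre-snapshot state.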

The rollback request:

10.07.13_18:35:12.430366 7fb39a6bf910 -- 10.14.0.103:6800/14259 <== client9475 10.14.0.108:0/5648 1 ==== osd_op(client9475.0:3 rb.0.1.000000000000 [rollback 2] 21.f76d snapc 2=[2]) v1 ==== 135+0+0(867103697 0 0) 0x253fbb0

For that request, the OSD logged:

10.07.13_18:35:12.430802 7fb398dbb910 osd2 663 pg[21.5( v 663'34214 (663'34212,663'34214] n=343 ec=660 les=661 660/660/660) [2,4] r=0 mlcod 663'34213 active+clean] _rollback_to deleting head on rb.0.1.000000000000 because got ENOENT on find_object_context
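The decision in the "_rollback_to" log line can be sketched with another simplified Python model. It assumes a resolution rule in which a snapshot newer than the object's snapset seq maps to the head, an older one maps to the first clone whose id is at or above it (provided that clone lists the snapshot), and anything else is ENOENT. RadosObject, Clone, find_at_snap and rollback are hypothetical names, not Ceph's API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical, simplified model of snapshot rollback; not Ceph's OSD code.


@dataclass
class Clone:
    snaps: FrozenSet[int]  # snapshot ids this clone preserves
    data: bytes


@dataclass
class RadosObject:
    head: Optional[bytes]  # None means the head object does not exist
    snapset_seq: int = 0   # newest snap seq the object has seen
    clones: Dict[int, Clone] = field(default_factory=dict)  # clone id -> clone


def find_at_snap(obj: RadosObject, snapid: int) -> Tuple[str, Optional[bytes]]:
    """Resolve which version of the object a snapshot refers to.

    Returns ("head", data), ("clone", data) or ("enoent", None).
    """
    if snapid > obj.snapset_seq:
        # No write has been ordered after this snapshot, so the head is its version.
        return ("head", obj.head) if obj.head is not None else ("enoent", None)
    for clone_id in sorted(obj.clones):
        if clone_id >= snapid:
            clone = obj.clones[clone_id]
            if snapid in clone.snaps:
                return ("clone", clone.data)
            break
    # No clone covers the snapshot: the object is treated as absent at that time.
    return ("enoent", None)


def rollback(obj: RadosObject, snapid: int) -> str:
    kind, data = find_at_snap(obj, snapid)
    if kind == "enoent":
        # Mirrors the logged behaviour: ENOENT on lookup means "delete head".
        obj.head = None
        return "deleted head"
    if kind == "clone":
        obj.head = data
        return "restored from clone"
    return "head unchanged"


restored = RadosObject(head=b"new", snapset_seq=2, clones={2: Clone(frozenset({2}), b"old")})
print(rollback(restored, 2), restored.head)  # restored from clone b'old'

stale = RadosObject(head=b"data", snapset_seq=3)
print(rollback(stale, 2), stale.head)        # deleted head None
```

The second case is the one the log line reports: the object's snapset says a newer write was ordered after snapshot 2, but no clone covers that snapshot, so the lookup returns ENOENT and rollback deletes the head. The model only shows why ENOENT leads to a deleted head; it does not explain how the object reached that state in this bug.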

Files

osd.2.gz (1.31 MB) Yehuda Sadeh, 07/14/2010 01:27 PM

Screenshot-Untitled_Window.png - VNC terminal Virtual Machine (11.3 KB) Wido den Hollander, 08/31/2010 06:28 AM

filelist.txt - Filelist after rollback (1.25 KB) Wido den Hollander, 09/01/2010 01:44 AM

History

#1 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved

#2 Updated by Wido den Hollander over 9 years ago

I'm setting this back to "Testing" since I'm seeing this too.

root@client01:~# rbd snap create --snap=charlie001 charlie
root@client01:~# rbd snap ls charlie
4    charlie001    53687091200

< DELETE ALL DATA INSIDE VM >

root@client01:~# virsh destroy charlie
Domain charlie destroyed

root@client01:~# time rbd snap rollback --snap=charlie001 charlie

real    9m26.463s
user    0m2.240s
sys    0m2.170s
root@client01:~# virsh start charlie
Domain charlie started

root@client01:~#

After creating the snapshot, I ran "rm -rf /*" inside the virtual machine.

As you can see, the VM's data is still corrupted after the rollback.

I'm not sure which log file to look at.

#3 Updated by Wido den Hollander over 9 years ago

I just exported the "charlie" image and mounted it through a loop device.

Attached is the file list I took from the VM.

Running df -h shows:

root@logger:/mnt/loop1# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.2G  3.7G  5.1G  43% /
none                  2.0G  176K  2.0G   1% /dev
none                  2.0G     0  2.0G   0% /dev/shm
none                  2.0G   72K  2.0G   1% /var/run
none                  2.0G     0  2.0G   0% /var/lock
none                  2.0G     0  2.0G   0% /lib/init/rw
/dev/mapper/data-logs
                      500G  121G  380G  25% /srv/ceph
/dev/mapper/loop0p1    48G  180M   45G   1% /mnt/loop1
root@logger:/mnt/loop1# 

It seems to me that the rollback did nothing at all: the data is still erased.

Searching through the OSD logs, I didn't find any mention of "rollback", but that could be due to my low debug level.

#4 Updated by Wido den Hollander over 9 years ago

I did another test:

  • Created a second disk for the VM "alpha"
  • Formatted the disk with ext4 inside the VM
  • Mounted the disk inside the VM
  • Downloaded several Ubuntu ISOs onto that disk
  • Snapshotted the "alpha-second" disk
  • Removed all the ISOs
  • Halted "alpha"
  • Rolled back the snapshot
  • Started "alpha" again
  • Mounted the disk

I then found that all the ISOs were still gone; it seems no rollback was done at all.

The filesystem also mounted without any errors, which again suggests the rollback left the image untouched.

root@client01:~# virsh start alpha
Domain alpha started

root@client01:~# rbd ls
alpha
alpha-second
beta
charlie
root@client01:~# rbd snap create --snap=alpha-second-snap alpha-second
root@client01:~# rbd snap ls alpha-second
6    alpha-second-snap    10737418240
root@client01:~# time rbd snap rollback --snap=alpha-second-snap alpha-second

real    1m40.516s
user    0m0.460s
sys    0m0.520s
root@client01:~# virsh start alpha
Domain alpha started

root@client01:~#

#5 Updated by Wido den Hollander over 9 years ago

  • Status changed from 7 to Closed

I just spoke to Yehudasa; this seems to be a synchronization problem between RBD and qemu-rbd.

When using qemu-rbd, you should shut down the VM before creating a snapshot with the "rbd" tool, or use "qemu-img" to manage the snapshot instead.

It is also recommended to perform the rollback while the VM is shut down.
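The workaround (stop the VM, then snapshot or roll back with the rbd tool, then start it again) can be scripted. The following is a sketch in Python, assuming virsh and rbd are on PATH and that rbd accepts the --snap= syntax used in the transcripts on this page; the helper names are hypothetical. It only prints the commands unless dry_run is set to False.

```python
import subprocess
import time
from typing import List

Command = List[str]

# Hypothetical helpers; command syntax assumed from the transcripts above.


def stop_vm(domain: str) -> List[Command]:
    # "virsh shutdown" asks the guest to power off cleanly; it returns
    # before the guest is actually down (see wait_for_shutoff).
    return [["virsh", "shutdown", domain]]


def snapshot_plan(domain: str, image: str, snap: str) -> List[Command]:
    """Snapshot an rbd image only while the VM using it is shut down."""
    return stop_vm(domain) + [
        ["rbd", "snap", "create", f"--snap={snap}", image],
        ["virsh", "start", domain],
    ]


def rollback_plan(domain: str, image: str, snap: str) -> List[Command]:
    """Roll an rbd image back only while the VM using it is shut down."""
    return stop_vm(domain) + [
        ["rbd", "snap", "rollback", f"--snap={snap}", image],
        ["virsh", "start", domain],
    ]


def wait_for_shutoff(domain: str, timeout: float = 300.0, poll: float = 2.0) -> None:
    """Poll "virsh domstate" until the guest reports "shut off"."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(["virsh", "domstate", domain],
                                capture_output=True, text=True, check=True)
        if result.stdout.strip() == "shut off":
            return
        time.sleep(poll)
    raise TimeoutError(f"{domain} did not shut off within {timeout} s")


def run(plan: List[Command], domain: str, dry_run: bool = True) -> None:
    for cmd in plan:
        print(" ".join(cmd))
        if dry_run:
            continue
        subprocess.run(cmd, check=True)
        if cmd[:2] == ["virsh", "shutdown"]:
            # Never touch the image while the guest may still be writing to it.
            wait_for_shutoff(domain)


# Dry run for the second test above: VM "alpha", disk image "alpha-second".
run(rollback_plan("alpha", "alpha-second", "alpha-second-snap"), "alpha")
```

The sketch uses a clean "virsh shutdown" and waits for the guest to reach the "shut off" state, whereas the transcripts use "virsh destroy", which powers the guest off immediately; either way, the point is that no rbd snapshot or rollback runs while qemu-rbd still has the image open.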

#6 Updated by Sage Weil about 9 years ago

  • Project changed from 3 to Ceph
