data corruption after rbd rollback
Andrew created a kvm-rbd image and took a snapshot of it (using the rbd tool while the rbd-kvm guest was running; however, I don't think that is related to the specific corruption we see). He then stopped the KVM guest and rolled the image back to that snapshot. The first 128MB of the image are now empty.
Looking at the logs, we see a write to the object that completed (it carries an empty snapc, so it can be regarded as predating the snapshot), followed by the rollback request:
10.07.13_18:34:56.295916 7fb39b6c1910 -- 10.14.0.103:6800/14259 --> 10.14.0.108:0/5242 -- osd_op_reply(3099 rb.0.1.000000000000 [write 2633728~4096] = 0) v1 -- ?+0 0x227ea80
10.07.13_18:35:12.430366 7fb39a6bf910 -- 10.14.0.103:6800/14259 <== client9475 10.14.0.108:0/5648 1 ==== osd_op(client9475.0:3 rb.0.1.000000000000 [rollback 2] 21.f76d snapc 2=) v1 ==== 135+0+0(867103697 0 0) 0x253fbb0
For the rollback, the OSD then logs:
10.07.13_18:35:12.430802 7fb398dbb910 osd2 663 pg[21.5( v 663'34214 (663'34212,663'34214] n=343 ec=660 les=661 660/660/660) [2,4] r=0 mlcod 663'34213 active+clean] _rollback_to deleting head on rb.0.1.000000000000 because got ENOENT on find_object_context
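To make the "empty snapc" reasoning concrete, here is a toy model of per-object copy-on-write snapshots. It is a sketch under simplified assumptions, not Ceph's OSD code: the class `ToyObject` and its methods are hypothetical. It assumes the object is cloned only when a write carries a snap context newer than any the object has seen, and that rollback deletes the head when no state for the requested snapshot can be found (the toy's equivalent of the `ENOENT on find_object_context` path in the log above).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyObject:
    """Hypothetical, simplified model of one object with snapshots (not Ceph code)."""
    head: Optional[bytes] = None
    clones: Dict[int, bytes] = field(default_factory=dict)  # clone id -> preserved data
    seq: int = 0  # newest snap seq seen on a write to this object

    def write(self, data: bytes, snapc_seq: int) -> None:
        # Copy-on-write: preserve the current head only if the writer's snap
        # context announces a snapshot this object has not seen yet.
        if snapc_seq > self.seq:
            if self.head is not None:
                self.clones[snapc_seq] = self.head
            self.seq = snapc_seq
        # A write with an empty (seq 0) or stale snap context makes no clone.
        self.head = data

    def find_state(self, snapid: int) -> Optional[bytes]:
        # The oldest clone taken at or after snapid holds the state at snapid.
        for cid in sorted(self.clones):
            if cid >= snapid:
                return self.clones[cid]
        # No clone: the head still represents snapid only if no write has
        # been tagged with a snap context at or after snapid.
        if self.seq < snapid:
            return self.head
        return None  # toy "ENOENT": object did not exist at snapid

    def rollback(self, snapid: int) -> None:
        state = self.find_state(snapid)
        # On "ENOENT" the toy deletes the head, like _rollback_to in the log.
        self.head = state
```

In this toy, a writer that still sends an empty snapc after snapshot 2 was created gets no clone, so rolling back to 2 silently keeps the post-snapshot data; only a writer that knows about snapshot 2 triggers the clone that rollback can restore from.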
#2 Updated by Wido den Hollander about 9 years ago
I'm setting this back to "Testing" since I'm seeing this too.
root@client01:~# rbd snap create --snap=charlie001 charlie
root@client01:~# rbd snap ls charlie
4 charlie001 53687091200
< DELETE ALL DATA INSIDE VM >
root@client01:~# virsh destroy charlie
Domain charlie destroyed
root@client01:~# time rbd snap rollback --snap=charlie001 charlie
real 9m26.463s
user 0m2.240s
sys 0m2.170s
root@client01:~# virsh start charlie
Domain charlie started
root@client01:~#
After creating the snapshot I did a "rm -rf /*" in the virtual machine.
As you can see, the VM's data got corrupted.
I'm not sure which logfile to look into.
#3 Updated by Wido den Hollander about 9 years ago
I just exported the "charlie" image and mounted it through a loop device.
Attached is a file list I got from the VM.
A df -h shows me:
root@logger:/mnt/loop1# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/sda1              9.2G  3.7G  5.1G  43% /
none                   2.0G  176K  2.0G   1% /dev
none                   2.0G     0  2.0G   0% /dev/shm
none                   2.0G   72K  2.0G   1% /var/run
none                   2.0G     0  2.0G   0% /var/lock
none                   2.0G     0  2.0G   0% /lib/init/rw
/dev/mapper/data-logs  500G  121G  380G  25% /srv/ceph
/dev/mapper/loop0p1     48G  180M   45G   1% /mnt/loop1
root@logger:/mnt/loop1#
It looks to me like the rollback didn't do anything at all: the data is still erased (only 180M is used on the 48G filesystem).
Searching through the OSD logs I didn't see anything about a "rollback", but this could be due to my low debug level.
#4 Updated by Wido den Hollander about 9 years ago
Did another test:
- Created a second disk for the VM "alpha"
- Formatted the disk with ext4 inside the VM
- Mounted the disk inside the VM
- Downloaded several Ubuntu ISOs onto that disk
- Snapshotted the "alpha-second" disk
- Removed all the ISOs
- Halted "alpha"
- Rolled back the snapshot
- Started "alpha" again
- Mounted the disk
I then found that all the ISOs were still gone. The filesystem mounted without any errors whatsoever, so it seems no rollback was done at all.
root@client01:~# virsh start alpha
Domain alpha started
root@client01:~# rbd ls
alpha
alpha-second
beta
charlie
root@client01:~# rbd snap create --snap=alpha-second-snap alpha-second
root@client01:~# rbd snap ls alpha-second
6 alpha-second-snap 10737418240
root@client01:~# time rbd snap rollback --snap=alpha-second-snap alpha-second
real 1m40.516s
user 0m0.460s
sys 0m0.520s
root@client01:~# virsh start alpha
Domain alpha started
root@client01:~#
#5 Updated by Wido den Hollander about 9 years ago
- Status changed from Testing to Closed
I just spoke to Yehudasa; this seems to be a synchronization problem between the "rbd" tool and qemu-rbd.
When using qemu-rbd, you should shut down the VM before creating the snapshot with the "rbd" tool, or use "qemu-img" to manage the snapshot instead.
Also, it's recommended to do the rollback while the VM is down.
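The advice above can be scripted so the guest is always down while the rbd tool touches its image. This is a minimal sketch, assuming libvirt-managed guests: the helper names (`snapshot_plan`, `rollback_plan`, `wait_until_shut_off`, `run`) are hypothetical, the `rbd` invocations follow the syntax used in the sessions in this thread, and `virsh shutdown`/`virsh domstate` are standard virsh subcommands. By default it only prints the commands (dry run); it does not cover the "qemu-img" alternative.

```python
import subprocess
import time
from typing import List

Plan = List[List[str]]


def snapshot_plan(domain: str, image: str, snap: str) -> Plan:
    """Take an rbd snapshot only while the guest is shut down."""
    return [
        ["virsh", "shutdown", domain],
        ["rbd", "snap", "create", f"--snap={snap}", image],
        ["virsh", "start", domain],
    ]


def rollback_plan(domain: str, image: str, snap: str) -> Plan:
    """Roll back only while the guest is shut down."""
    return [
        ["virsh", "shutdown", domain],
        ["rbd", "snap", "rollback", f"--snap={snap}", image],
        ["virsh", "start", domain],
    ]


def wait_until_shut_off(domain: str, timeout: float = 300.0) -> None:
    # "virsh shutdown" only asks the guest to power off; poll until it has.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["virsh", "domstate", domain],
            capture_output=True, text=True, check=True,
        )
        if result.stdout.strip() == "shut off":
            return
        time.sleep(2)
    raise TimeoutError(f"{domain} did not shut off within {timeout}s")


def run(plan: Plan, dry_run: bool = True) -> None:
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=True)
        # Never let the rbd step start while the guest may still be writing.
        if cmd[:2] == ["virsh", "shutdown"]:
            wait_until_shut_off(cmd[2])
```

For example, `run(rollback_plan("charlie", "charlie", "charlie001"))` prints the three commands; passing `dry_run=False` executes them in order, waiting for the guest to reach "shut off" before the rollback starts.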