Bug #278: data corruption after rbd rollback - Ceph - Ceph

Added by Yehuda Sadeh almost 14 years ago. Updated over 13 years ago.

Status: Closed
Priority: Normal
Spent time: 2:00 h

Description

Andrew created a kvm-rbd image and took a snapshot (using rbdtool while the rbd-kvm was running; however, I don't think the specific corruption we see is related to that). He then stopped the kvm and rolled back to that snapshot. The first 128MB of the image are now empty.

Looking at the logs, we see a write on the object that completed (empty snapc, so it can be regarded as prior to the snapshot):
10.07.13_18:34:56.295916 7fb39b6c1910 -- 10.14.0.103:6800/14259 --> 10.14.0.108:0/5242 -- osd_op_reply(3099 rb.0.1.000000000000 [write 2633728~4096] = 0) v1 -- ?+0 0x227ea80

rollback request:

10.07.13_18:35:12.430366 7fb39a6bf910 -- 10.14.0.103:6800/14259 <== client9475 10.14.0.108:0/5648 1 ==== osd_op(client9475.0:3 rb.0.1.000000000000 [rollback 2] 21.f76d snapc 2=[2]) v1 ==== 135+0+0(867103697 0 0) 0x253fbb0

for which we see this:

10.07.13_18:35:12.430802 7fb398dbb910 osd2 663 pg[21.5( v 663'34214 (663'34212,663'34214] n=343 ec=660 les=661 660/660/660) [2,4] r=0 mlcod 663'34213 active+clean] _rollback_to deleting head on rb.0.1.000000000000 because got ENOENT on find_object_context
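
To pull this sequence (write, rollback, `_rollback_to`) for a single object out of a larger OSD log, a small filter can help. The following is a minimal sketch in Python and not part of Ceph: the function name `scan_osd_log` and both regular expressions are illustrative, written only against the three log lines quoted above, so other log formats or debug levels may need different patterns.

```python
import re

# Both patterns are assumptions derived from the three log lines quoted in
# this report; they are not an official description of the OSD log format.
OP_RE = re.compile(
    r"osd_op(?:_reply)?\(\S+ (?P<obj>\S+) \[(?P<op>\w+) (?P<args>[^\]]*)\]"
)
ROLLBACK_TO_RE = re.compile(
    r"_rollback_to (?P<action>.+?) on (?P<obj>\S+) because (?P<reason>.+)$"
)


def scan_osd_log(lines, obj_name):
    """Return (kind, detail) tuples for log lines that mention obj_name."""
    events = []
    for line in lines:
        line = line.strip()
        m = OP_RE.search(line)
        if m and m.group("obj") == obj_name:
            # A client op or its reply, e.g. ("write", "2633728~4096").
            events.append((m.group("op"), m.group("args")))
            continue
        m = ROLLBACK_TO_RE.search(line)
        if m and m.group("obj") == obj_name:
            # The OSD-side decision taken while handling the rollback.
            detail = m.group("action") + ": " + m.group("reason")
            events.append(("_rollback_to", detail))
    return events


# The three lines from the description, used here as sample input.
SAMPLE_LINES = [
    "10.07.13_18:34:56.295916 7fb39b6c1910 -- 10.14.0.103:6800/14259 --> 10.14.0.108:0/5242 -- osd_op_reply(3099 rb.0.1.000000000000 [write 2633728~4096] = 0) v1 -- ?+0 0x227ea80",
    "10.07.13_18:35:12.430366 7fb39a6bf910 -- 10.14.0.103:6800/14259 <== client9475 10.14.0.108:0/5648 1 ==== osd_op(client9475.0:3 rb.0.1.000000000000 [rollback 2] 21.f76d snapc 2=[2]) v1 ==== 135+0+0(867103697 0 0) 0x253fbb0",
    "10.07.13_18:35:12.430802 7fb398dbb910 osd2 663 pg[21.5( v 663'34214 (663'34212,663'34214] n=343 ec=660 les=661 660/660/660) [2,4] r=0 mlcod 663'34213 active+clean] _rollback_to deleting head on rb.0.1.000000000000 because got ENOENT on find_object_context",
]

if __name__ == "__main__":
    for kind, detail in scan_osd_log(SAMPLE_LINES, "rb.0.1.000000000000"):
        print(kind, detail)
```

Since the attached log is gzip-compressed, its lines could be fed in through `gzip.open("osd.2.gz", "rt")` from the Python standard library instead of `SAMPLE_LINES`.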

Files

osd.2.gz (1.31 MB), Yehuda Sadeh, 07/14/2010 01:27 PM
Screenshot-Untitled_Window.png (11.3 KB), VNC terminal Virtual Machine, Wido den Hollander, 08/31/2010 06:28 AM
filelist.txt (1.25 KB), Filelist after rollback, Wido den Hollander, 09/01/2010 01:44 AM

Updated by Sage Weil almost 14 years ago

Status changed from New to Resolved

fixed by e8991f19526939ee843c7b04c167fe290f113602

Updated by Wido den Hollander over 13 years ago

File Screenshot-Untitled_Window.png added
Status changed from Resolved to 7 (Testing)

I'm setting this back to "Testing" since I'm seeing this too.

root@client01:~# rbd snap create --snap=charlie001 charlie
root@client01:~# rbd snap ls charlie
4    charlie001    53687091200

< DELETE ALL DATA INSIDE VM >

root@client01:~# virsh destroy charlie
Domain charlie destroyed

root@client01:~# time rbd snap rollback --snap=charlie001 charlie

real    9m26.463s
user    0m2.240s
sys    0m2.170s
root@client01:~# virsh start charlie
Domain charlie started

root@client01:~#

After creating the snapshot I did a "rm -rf /*" in the virtual machine.

As you can see, the VM's data got corrupted.

I'm not sure which logfile to look into.

Updated by Wido den Hollander over 13 years ago

File filelist.txt added

I just exported the "charlie" image and mounted it through a loop device.

Attached is a filelist which I got from the VM.

A df -h shows me:

root@logger:/mnt/loop1# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.2G  3.7G  5.1G  43% /
none                  2.0G  176K  2.0G   1% /dev
none                  2.0G     0  2.0G   0% /dev/shm
none                  2.0G   72K  2.0G   1% /var/run
none                  2.0G     0  2.0G   0% /var/lock
none                  2.0G     0  2.0G   0% /lib/init/rw
/dev/mapper/data-logs
                      500G  121G  380G  25% /srv/ceph
/dev/mapper/loop0p1    48G  180M   45G   1% /mnt/loop1
root@logger:/mnt/loop1#

To me it seems that the rollback didn't do anything at all; the data is still erased.

Searching through the OSD logs I didn't see anything about a "rollback", but this could be due to my low debug level.
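
One way to tell "the rollback did nothing" apart from "the rollback damaged part of the image" is to export the image once right after taking the snapshot and again after the rollback, then compare the two exports chunk by chunk. The sketch below is a hedged Python example, not an rbd feature: the function names, the file names in the usage note, and the 4 MiB default chunk size (assumed here to line up with the image's object size) are all assumptions.

```python
import itertools

# Assumption: 4 MiB chunks, chosen to line up with the image's object size.
DEFAULT_CHUNK = 4 * 1024 * 1024


def differing_chunks(path_a, path_b, chunk_size=DEFAULT_CHUNK):
    """Return the indices of chunks whose bytes differ between two files."""
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for index in itertools.count():
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                diffs.append(index)
    return diffs


def zero_chunks(path, chunk_size=DEFAULT_CHUNK):
    """Return the indices of chunks that contain only zero bytes."""
    zeros = []
    with open(path, "rb") as f:
        for index in itertools.count():
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if chunk.count(b"\0") == len(chunk):
                zeros.append(index)
    return zeros
```

With two hypothetical export files, `differing_chunks("charlie-at-snap.img", "charlie-after-rollback.img")` returning an empty list would mean the rollback restored the snapshot contents, while `zero_chunks("charlie-after-rollback.img")` would show whether a leading range of the image came back empty, as in the original description.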

Updated by Wido den Hollander over 13 years ago

Did another test:

Created a second disk for the VM "alpha"
Formatted the disk with ext4 inside the VM
Mounted the VM
Downloaded several Ubuntu ISOs onto that disk
Snapshotted the "alpha-second" disk
Removed all the ISOs
Halted "alpha"
Rolled back the snapshot
Started "alpha" again
Mounted the disk

I then found that all the ISOs were still gone; it seems no rollback was done at all.

The filesystem mounted without any errors, so it seems that the rollback didn't do anything.

root@client01:~# virsh start alpha
Domain alpha started

root@client01:~# rbd ls
alpha
alpha-second
beta
charlie
root@client01:~# rbd snap create --snap=alpha-second-snap alpha-second
root@client01:~# rbd snap ls alpha-second
6    alpha-second-snap    10737418240
root@client01:~# time rbd snap rollback --snap=alpha-second-snap alpha-second

real    1m40.516s
user    0m0.460s
sys    0m0.520s
root@client01:~# virsh start alpha
Domain alpha started

root@client01:~#
