Bug #391
closed
snap create/delete caused corruption
Added by Andrew F over 13 years ago.
Updated almost 12 years ago.
Description
Yesterday, I created and deleted an RBD snapshot; this morning, I was greeted by a number of dire warnings in dmesg:
[1795870.647296] attempt to access beyond end of device
[1795870.648590] vda: rw=0, want=14413400712, limit=20971520
as well as all sorts of IO errors. Upon shutting down the VM and mounting it in another instance, though, the errors disappeared.
I haven't tried to reproduce this yet, so I'm not sure if the snapshot is what really caused it, but it seems likely.
- Status changed from New to Can't reproduce
- Project changed from 3 to 6
- Category deleted (9)
- Status changed from Can't reproduce to In Progress
Managed to reproduce this by running this script (to create and delete snapshots) on the host:
#!/bin/bash
set -e

guest=$1
shift
disks="$*"

if [ -z "$guest" ]; then
    echo "Need a guest name"
    exit 1
fi

while true; do
    date
    # Create a libvirt snapshot; virsh prints the generated snapshot name.
    echo virsh snapshot-create "$guest"
    out="$( virsh snapshot-create "$guest" )"
    echo "snap: $out"
    # Extract the numeric snapshot name from virsh's output.
    snap=$( echo "$out" | grep -Eo '[0-9]+' )
    sleep 1
    # Remove the libvirt snapshot metadata...
    echo virsh snapshot-delete "$guest" "$snap"
    virsh snapshot-delete "$guest" "$snap"
    sleep 1
    # ...then remove the RBD snapshot itself from each disk image.
    for d in $disks; do
        echo rbd snap rm "$d" --snap "$snap"
        rbd snap rm "$d" --snap "$snap"
    done
    echo
done
while disk activity was present on the guest. As few as three cycles of snapshot creation/destruction are sufficient to cause filesystem corruption.
- Assignee set to Josh Durgin
Using ext3freezer on the guest during the snapshotting doesn't help either — as far as I can tell, simply taking/removing the snapshot is enough to cause corruption.
I haven't been able to reproduce this with the latest ceph and qemu-rbd. I'd like to upgrade the kvmtest cluster and see if it can be reproduced there.
Side note: virsh snapshot-delete does nothing (#390)
> I haven't been able to reproduce this with the latest ceph and qemu-rbd. I'd like to upgrade the kvmtest cluster and see if it can be reproduced there.
Were you able to reproduce it with the versions running on kvmtest? If so, go ahead and upgrade; if not, you might just not have been triggering the bug. The corruption isn't always in metadata, so it may help to check content integrity as well as running fsck.
> Side note: virsh snapshot-delete does nothing (#390)
Right, hence the manual rbd snap rm.
Oh, I missed the external rbd tool call. That might have caused the problem: removing an RBD snapshot out from under a running VM is a likely cause of corruption. The whole purpose of the newer version is to make that safe (by using a new notification mechanism).
- Status changed from In Progress to Closed
- Project changed from 6 to rbd