Bug #391 (closed)

snap create/delete caused corruption

Added by Andrew F over 13 years ago. Updated almost 12 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%


Description

Yesterday, I created and deleted an RBD snapshot; this morning, I was greeted by a number of dire warnings in dmesg:

[1795870.647296] attempt to access beyond end of device
[1795870.648590] vda: rw=0, want=14413400712, limit=20971520

as well as all sorts of I/O errors. Upon shutting down the VM and mounting its disk in another instance, though, the errors disappeared.
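Those numbers are 512-byte sector counts, so the limit matches the 10 GiB size of vda while the requested sector lies several terabytes past the end of the device. A quick sanity check in the shell, using the values from the dmesg line above:

# limit: 20971520 sectors * 512 bytes = 10 GiB (the size of vda)
echo $(( 20971520 * 512 / 1024 / 1024 / 1024 ))      # prints 10
# want: 14413400712 sectors * 512 bytes is roughly 6.7 TiB into the device
echo $(( 14413400712 * 512 / 1024 / 1024 / 1024 ))   # prints 6872, i.e. ~6.7 TiB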

I haven't tried to reproduce this yet, so I'm not sure if the snapshot is what really caused it, but it seems likely.

Actions #1

Updated by Sage Weil over 13 years ago

  • Status changed from New to Can't reproduce

this is old

Actions #2

Updated by Sage Weil over 13 years ago

  • Project changed from 3 to 6
  • Category deleted (9)
Actions #3

Updated by Yehuda Sadeh about 13 years ago

  • Status changed from Can't reproduce to In Progress

reopening

Actions #4

Updated by Andrew F about 13 years ago

Managed to reproduce this by running the following script (which creates and deletes snapshots) on the host while there was disk activity on the guest:

#!/bin/bash
# Repeatedly create and delete a libvirt snapshot of the guest, then remove
# the corresponding RBD snapshot from each of the listed disk images.
set -e
guest=$1
shift
disks="$*"
if [ -z "$guest" ]; then
        echo "Need a guest name"
        exit 1
fi
while true ; do
        date
        echo virsh snapshot-create "$guest"
        out="$( virsh snapshot-create "$guest" )"
        echo "snap: $out"
        # Pull the numeric snapshot name out of virsh's output.
        snap=$( echo "$out" | egrep -o '[0-9]+' )
        sleep 1
        echo virsh snapshot-delete "$guest" "$snap"
        virsh snapshot-delete "$guest" "$snap"
        sleep 1
        for d in $disks ; do
                echo rbd snap rm "$d" --snap "$snap"
                rbd snap rm "$d" --snap "$snap"
        done
        echo
done

As few as three cycles of snapshot creation/destruction are sufficient to cause filesystem corruption.
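For reference, the script takes the libvirt guest name as its first argument and the RBD image names (as passed to rbd snap rm) after it; an invocation would look something like the following, where snap-stress.sh, testvm and testvm-disk are placeholder names:

./snap-stress.sh testvm testvm-disk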

Actions #5

Updated by Josh Durgin about 13 years ago

  • Assignee set to Josh Durgin
Actions #6

Updated by Andrew F about 13 years ago

Using ext3freezer on the guest during the snapshotting doesn't help either — as far as I can tell, simply taking/removing the snapshot is enough to cause corruption.

Actions #7

Updated by Josh Durgin about 13 years ago

I haven't been able to reproduce this with the latest ceph and qemu-rbd. I'd like to upgrade the kvmtest cluster and see if it can be reproduced there.

Side note: virsh snapshot-delete does nothing (#390)

Actions #8

Updated by Andrew F about 13 years ago

I haven't been able to reproduce this with the latest ceph and qemu-rbd. I'd like to upgrade the kvmtest cluster and see if it can be reproduced there.

Were you able to reproduce it with the versions running on kvmtest? If so, go ahead and upgrade... if not, you may simply not have been hitting the bug. The corruption isn't always in the metadata, so it may help to check content integrity as well as running fsck.

Side note: virsh snapshot-delete does nothing (#390)

Right, hence the manual rbd snap rm.
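A minimal sketch of that manual cleanup, using vm-disk and 1295463456 as placeholder image and snapshot names; rbd snap ls is a quick way to confirm the snapshot is actually gone afterwards:

rbd snap ls vm-disk                     # list snapshots still present on the image
rbd snap rm vm-disk --snap 1295463456   # remove the stale snapshot by name
rbd snap ls vm-disk                     # verify it no longer appears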

Actions #9

Updated by Yehuda Sadeh about 13 years ago

Oh, I missed the external rbd tool call. That might have caused the problem: running rbd against the image while the VM is still using it is a likely cause of corruption. The whole point of the newer version is to make exactly that safe (via a new notification mechanism).
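If external snapshot operations have to happen on the older versions, shutting the guest down around them sidesteps the problem, since nothing then has the image open while it is being modified. A rough sketch of that (my assumption, not something tested against this bug; guest/disk/snap are placeholders):

virsh shutdown "$guest"              # request shutdown; wait for the domain to power off
rbd snap rm "$disk" --snap "$snap"   # safe now: no client has the image open
virsh start "$guest"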

Actions #10

Updated by Sage Weil over 12 years ago

  • Status changed from In Progress to Closed
Actions #11

Updated by Sage Weil almost 12 years ago

  • Project changed from 6 to rbd