Bug #4388

rbd import broken

Added by Corin Langosch about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
bobtail
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I tried to import a VM image (10 GB) into a bobtail 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5) cluster. However, when booting the VM, many filesystem errors were reported and the VM didn't start.

I then stopped the VM, deleted the rbd image and imported it again. I exported it once more and compared md5sums - they were different from the source image. I repeated this process several times, trying format 1 and format 2 and even different pools (hdd and ssd). With the same import options the export always returned the same wrong md5sum, so it doesn't look like a hardware problem; otherwise I would expect different md5sums for the same import options. One interesting point: using the same import options on a different host leads to a different (but again consistent) wrong md5sum. This was fully reproducible.

I finally ended up writing my own little script which uses librbd and imports the image in 4 MB chunks. This worked fine: the test exports returned the correct md5 and the VM boots :)
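For illustration, a minimal sketch of such a chunked librbd import using the Python rados/rbd bindings; the pool name, image name, source path and chunk size below are placeholders taken from this report, and this is not the exact script mentioned above:

import os
import rados
import rbd

POOL = 'clusterx-hdd'                          # placeholder: pool from this report
IMAGE = 'e0df798d-969e-405e-bd93-ba7da1353df9' # placeholder: image name from this report
SRC = 'vm.img'
CHUNK = 4 * 1024 * 1024                        # 4 MB chunks, as described above

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
try:
    rbd.RBD().create(ioctx, IMAGE, os.path.getsize(SRC))  # create target image of matching size
    image = rbd.Image(ioctx, IMAGE)
    try:
        with open(SRC, 'rb') as src:
            offset = 0
            while True:
                data = src.read(CHUNK)
                if not data:
                    break
                image.write(data, offset)      # write each chunk at its file offset
                offset += len(data)
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()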

But there must be some really serious bug in rbd import.

Here's what I did:

md5sum vm.img
a7851dd0b22cb829833e40237d64af3f vm.img

rbd import --format 2 vm.img clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Importing image: 100% complete...done.

rbd export clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9 bb.img
Exporting image: 100% complete...done.

md5sum bb.img
411a701ffeb880b3268e74368da3488c bb.img

rbd rm clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Removing image: 100% complete...done.

rbd import --format 2 vm.img clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Importing image: 100% complete...done.

rbd export clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9 bb.img
Exporting image: 100% complete...done.

md5sum bb.img
411a701ffeb880b3268e74368da3488c bb.img

rbd rm clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Removing image: 100% complete...done.

rbd import --format 1 vm.img clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Importing image: 100% complete...done.

rbd export clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9 cc.img
Exporting image: 100% complete...done.

md5sum cc.img
411a701ffeb880b3268e74368da3488c cc.img

rbd rm clusterx-ssd/e0df798d-969e-405e-bd93-ba7da1353df9
Removing image: 100% complete...done.

rbd import --format 2 vm.img clusterx-ssd/e0df798d-969e-405e-bd93-ba7da1353df9
Importing image: 100% complete...done.

rbd export clusterx-ssd/e0df798d-969e-405e-bd93-ba7da1353df9 ssd2.img
Exporting image: 100% complete...done.

md5sum ssd2.img
411a701ffeb880b3268e74368da3488c ssd2.img


On another host, running exactly the same configuration and version of Ceph:

md5sum vm.img
a7851dd0b22cb829833e40237d64af3f vm.img

rbd import --format 2 vm.img clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Importing image: 100% complete...done.

rbd export clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9 bb.img
Exporting image: 100% complete...done.

md5sum bb.img
3ec31398a6f6966a15f3a138250dd641 bb.img

rbd rm clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Removing image: 100% complete...done.

rbd import --format 2 vm.img clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
Importing image: 100% complete...done.

rbd export clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9 cc.img
Exporting image: 100% complete...done.

md5sum cc.img
3ec31398a6f6966a15f3a138250dd641 cc.img

rbd info clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9
rbd image 'e0df798d-969e-405e-bd93-ba7da1353df9':
size 10240 MB in 2560 objects
order 22 (4096 KB objects)
block_name_prefix: rbd_data.33562ae8944a
format: 2
features: layering, striping
stripe unit: 4096 KB
stripe count: 1

Associated revisions

Revision 30912838 (diff)
Added by Josh Durgin about 11 years ago

rbd: remove fiemap use from import

On some kernels and filesystems fiemap can be racy and provide
incorrect data even after an fsync. Later we can use SEEK_HOLE and
SEEK_DATA, but for now just detect zero runs like we do with stdin.

Basically this adapts import from stdin to work in the case of a file
or block device, and gets rid of other cruft in the import that used
fiemap.

Fixes: #4388
Backport: bobtail
Signed-off-by: Josh Durgin <>

Revision 7fbc1ab6 (diff)
Added by Josh Durgin about 11 years ago

rbd: remove fiemap use from import

On some kernels and filesystems fiemap can be racy and provide
incorrect data even after an fsync. Later we can use SEEK_HOLE and
SEEK_DATA, but for now just detect zero runs like we do with stdin.

Basically this adapts import from stdin to work in the case of a file
or block device, and gets rid of other cruft in the import that used
fiemap.

Fixes: #4388
Backport: bobtail
Signed-off-by: Josh Durgin <>
(cherry picked from commit 3091283895e8ffa3e4bda13399318a6e720d498f)

History

#1 Updated by Josh Durgin about 11 years ago

Could you provide a few more details on your setup:

- what kernel and distro?
- what fs are the files being imported on?
- does importing from stdin have the same problem? (e.g. rbd import --format 2 - pool/image < file)

Changing format wouldn't affect the import process, although with format 2 changing --stripe-unit and --stripe-count might.

#2 Updated by Corin Langosch about 11 years ago

Distro: Ubuntu quantal
Kernel: 3.7.10-030710-generic #201302271235 SMP Wed Feb 27 17:36:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux (from ubuntu mainline kernel repos)
Filesystem: XFS

I didn't try importing from stdin, but I can do that in the next few hours and report back.

#3 Updated by Corin Langosch about 11 years ago

Import from stdin seems to work fine: I tried it twice, and both times it gave the correct checksum. Just to be sure, I then tried importing from the file again, and it once more returned the same wrong md5sum as before. So it's probably a bug in the file-hole handling?
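As a side note, one quick way to see whether the source file is sparse at all (and thus whether hole handling is even in play) is to compare its apparent size with its allocated blocks, e.g. in Python:

import os

st = os.stat('vm.img')
print('apparent size: %d bytes' % st.st_size)
print('allocated:     %d bytes' % (st.st_blocks * 512))  # st_blocks is in 512-byte units on Linux

If the allocated size is noticeably smaller than the apparent size, the file contains holes and the importer's sparse handling matters.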

#4 Updated by Josh Durgin about 11 years ago

Yes, this probably means fiemap is broken on xfs with that kernel. The image file you're importing wasn't just written, was it?
'rbd import' does do an fsync before the fiemap, but under heavy write workloads fiemap has still had problems on a just-written and fsync'd file before.

#5 Updated by Sage Weil about 11 years ago

We should switch to using SEEK_HOLE and SEEK_DATA; that is supposed to work regardless of sync state.

I thought, though, that we were doing hole detection by looking for zeros? Maybe not when reading from a file instead of stdin.
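For illustration, a rough sketch of what SEEK_DATA/SEEK_HOLE-based extent walking could look like, using Python's os.SEEK_DATA/os.SEEK_HOLE (which require kernel and filesystem support); this is not the actual rbd code:

import errno
import os

def data_extents(path):
    # Yield (offset, length) for the data (non-hole) regions of a sparse file.
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.fstat(fd).st_size
        offset = 0
        while offset < end:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)  # next region containing data
            except OSError as e:
                if e.errno == errno.ENXIO:                  # no data past this offset
                    break
                raise
            hole = os.lseek(fd, start, os.SEEK_HOLE)        # end of that data region
            yield start, hole - start
            offset = hole
    finally:
        os.close(fd)

# Only the yielded ranges would need to be written into the rbd image;
# everything else is a hole and can be left unwritten.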

#6 Updated by Dan Mick about 11 years ago

Yeah, I did that for stdin but left the file algorithm alone.

#7 Updated by Corin Langosch about 11 years ago

@Josh: are you sure? I'm using a stable kernel version, not an rc or a custom build. XFS is known for its stability and all our systems run stably, so I'd really be surprised if this were an XFS/kernel issue. I googled around a bit but couldn't find any bug reports related to XFS and fiemap. Can you point me to any commits or bug reports related to this? Regarding your question: the image was copied from another datacenter using rsync a few minutes before. But as I did the import several times (at least 5 times), I'd expect it to have been synced long before the last import. The host itself was completely idle at that time, so no (heavy) CPU/IO load.

@Sage: looking for zeros seems best to me. I guess a lot of people copy files around without passing the special sparse-file options (for example to rsync) needed to preserve holes, so they'd end up with a lot more data in Ceph than is actually needed.

#8 Updated by Corin Langosch about 11 years ago

@Josh: To make sure it's not a kernel bug, I just upgraded this single host to the latest mainline stable kernel (3.8.2-030802-generic). Then I imported the file using "rbd import --format 2 vm.img clusterx-hdd/e0df798d-969e-405e-bd93-ba7da1353df9" and exported it again. The exported file again has the same wrong md5sum as with kernel 3.7.10.

#9 Updated by Ian Colle about 11 years ago

  • Assignee set to Josh Durgin

#10 Updated by Josh Durgin about 11 years ago

I'm pretty sure it's fiemap since that's almost the only difference between importing from a file and importing from stdin.

Although xfs and ext* are generally pretty stable, everything has bugs. fiemap in particular has been known to cause issues in the past, and isn't really well tested by anything I'm aware of. We found this out the hard way when it behaved incorrectly on every filesystem we tried under heavy write load on the osds, so we no longer use it on the osds. SEEK_HOLE/SEEK_DATA is a newer interface that has tests in xfstests and should be less prone to races.

In the short term, it's easiest to do our own zero detection (which we already do for import from stdin), and use SEEK_HOLE/SEEK_DATA later.
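For illustration, the zero-detection approach could look roughly like this: read fixed-size blocks and skip the all-zero ones, rather than trusting filesystem extent data. The block size and function name here are placeholders, not the actual rbd code:

BLOCK = 4 * 1024 * 1024            # e.g. one 4 MB rbd object
ZERO = b'\0' * BLOCK

def nonzero_blocks(path, block=BLOCK):
    # Yield (offset, data) only for blocks containing at least one non-zero byte.
    with open(path, 'rb') as f:
        offset = 0
        while True:
            data = f.read(block)
            if not data:
                break
            if data != ZERO[:len(data)]:   # all-zero blocks are skipped and stay holes
                yield offset, data
            offset += len(data)

# Each yielded block is written at its offset; zero runs are never written,
# so the image stays sparse regardless of whether fiemap is trustworthy.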

#11 Updated by Corin Langosch about 11 years ago

Well, you certainly know the Ceph sources better than I do. I'm just quite surprised that such a grave bug (it causes corruption...) is known yet still present in the latest stable kernel versions.

Anyway, it would probably be a good idea to put a warning in the Ceph docs (or even print one when running rbd import) then?

#12 Updated by Josh Durgin about 11 years ago

  • Status changed from New to Fix Under Review
  • Backport set to bobtail

The wip-rbd-import branch removes the fiemap usage on top of the 'next' branch. I think it should be backported to bobtail as well.
We should probably warn about this issue in the release notes at least.

#13 Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved

commit:35ab2a4189103abc25035a489a74b8261e9317c2
