Bug #1778

closed

Error after installing an iso-image via qemu / rbd-image

Added by Oliver Francke over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: qemu
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi *,

we are currently running:

ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9) from deb-pkg: Version: 0.38-1~bpo60+1

on Linux kernel 3.1.2 with the necessary modules loaded.

Further setup:

- QEMU emulator version 0.15.94, Copyright (c) 2003-2008 Fabrice Bellard

With some running VMs we have evaluated the snapshot capability, managed via the qemu
monitor (savevm, loadvm, ...), and it worked like a charm. So far, so very
good.

And now to the paragraph containing the long-expected "But":
But after a fresh OS install into an rbd image, for example from an
Ubuntu-10.04.1-server-amd64.iso, everything went fine up to a complete stop/start.
Then no disk was found. After some re-installs we got a very early failure, even
after an "rbd map" of the image:
"rbd0: unknown partition table"
This went away once we stopped starting the VM with the "snapshot=on" parameter for the
corresponding drive and reinstalled from the same ISO.
So our conclusion points to something in the snapshot functionality reporting
false geometry values to fdisk, grub and friends, or wrong start cylinders that should
come after the snapshot metadata but don't?! (just guessing)

Well, as we have no other clues right now, please shed some light on this ;)
We are of course willing to assist with any debugging.

Regards and thanks in advance,

Oliver.


Files

519.log.gz (272 KB), Oliver Francke, 12/02/2011 05:31 AM
gdb.sess.log (14.2 KB), Oliver Francke, 12/06/2011 02:14 AM
Actions #1

Updated by Josh Durgin over 12 years ago

You don't need any special qemu options to use snapshots - the snapshot option is confusingly named. The qemu 'snapshot=on' parameter doesn't enable snapshots; it causes qemu to write to a temporary file instead of the actual disk image (rbd in this case). According to the man page, C-a s will make the data actually be written to the backend, but I haven't tested this with rbd.

Actions #2

Updated by Oliver Francke over 12 years ago

Hi Josh,

at least my experience showed a different behaviour: no reliable snapshots, and even crashes of the qemu-system* processes after "loadvm/savevm" in the qemu monitor. I only got snapshots with a complete rollback with "-drive... rbd:...,snapshot=on", sorry. The snapshots were visible both within the monitor and via the rbd command. Perhaps this was implicit with qemu 0.15.[01] running on another node; there were no problems there.
We are running qemu-1.0rc4 as of this writing. Without the parameter there are no problems installing a distro; with it, an invalid partition table occurs. So there is a difference.

Thanks for your attention,

Oliver.

Actions #3

Updated by Josh Durgin over 12 years ago

There's certainly a difference with the snapshot parameter - it doesn't store anything in the rbd image unless you use the C-a s command. It only stores data in a temporary file, so the rbd image will be blank, resulting in the invalid partition error message. The snapshot parameter has nothing to do with rbd (or qcow2 or other disk format) snapshots. Are you using C-a s to save the data to rbd?
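To make the distinction concrete, here is a toy model of what snapshot=on does as described above: guest writes go to a temporary overlay, reads prefer the overlay, and the backing image (rbd here) only changes on an explicit commit, the analogue of the C-a s shortcut. This is a hypothetical sketch of the behaviour, not qemu's actual implementation; the class and method names are made up.

```python
from typing import Dict


class SnapshotOverlay:
    """Toy model of qemu's snapshot=on: the backing image is never written
    unless commit() is called (the analogue of C-a s)."""

    def __init__(self, backing: bytearray, block_size: int = 512) -> None:
        self.backing = backing                  # stands in for the rbd image
        self.block_size = block_size
        self.overlay: Dict[int, bytes] = {}     # stands in for the temp file

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        # Guest writes only ever land in the overlay.
        self.overlay[index] = bytes(data)

    def read_block(self, index: int) -> bytes:
        # Prefer data written during this session; fall back to the backing image.
        if index in self.overlay:
            return self.overlay[index]
        start = index * self.block_size
        return bytes(self.backing[start:start + self.block_size])

    def commit(self) -> None:
        # Copy every overlay block into the backing image, then drop the overlay.
        for index, data in self.overlay.items():
            start = index * self.block_size
            self.backing[start:start + self.block_size] = data
        self.overlay.clear()


if __name__ == "__main__":
    image = bytearray(4 * 512)               # freshly created image: all zeroes
    session = SnapshotOverlay(image)
    session.write_block(0, b"\x01" * 512)    # e.g. the installer writes sector 0
    print(session.read_block(0)[0])          # the guest sees its own write: 1
    print(image[0])                          # the backing image is still blank: 0
```

If the VM is stopped without a commit, the overlay is simply thrown away, so the next boot reads the untouched, all-zero image, which is the behaviour Josh describes.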

Right now rollback and deletion of snapshots aren't wired up in the qemu rbd driver (#390), you have to use the rbd command to do that.
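Since rollback and deletion go through the rbd command, a script can build those invocations itself. A minimal sketch, assuming the `rbd snap rollback --snap NAME pool/image` and `rbd snap rm --snap NAME pool/image` forms (check `rbd help` on your version); the helper name and the example pool, image and snapshot names are hypothetical, and the guest should be shut down before its image is rolled back.

```python
import shlex
from typing import List


def rbd_snap_argv(action: str, pool: str, image: str, snap: str) -> List[str]:
    """Build an argv list for `rbd snap rollback` or `rbd snap rm`."""
    if action not in ("rollback", "rm"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["rbd", "snap", action, "--snap", snap, f"{pool}/{image}"]


if __name__ == "__main__":
    argv = rbd_snap_argv("rollback", "foo", "bar.rbd", "before-upgrade")
    # Prints: rbd snap rollback --snap before-upgrade foo/bar.rbd
    print(shlex.join(argv))
```

To actually run it, pass the list to `subprocess.run(argv, check=True)`, which raises if rbd exits with an error.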

For the loadvm/savevm problems, is this without snapshot=on? If so, can you add extra options to the -drive rbd section: 'rbd:pool/image:debug_rbd=20:log_to_stderr=2' and post the logs and commands leading up to the problem?
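A small helper for producing that drive string, so the debug options can be switched on from a script. It follows the `rbd:pool/image:key=value` layout quoted above; the function name is hypothetical, and the result would go into qemu as `-drive file=<spec>,...` alongside whatever other drive options you already use.

```python
from typing import Mapping


def rbd_drive_spec(pool: str, image: str, options: Mapping[str, str]) -> str:
    """Return 'rbd:pool/image' followed by one ':key=value' per option."""
    parts = [f"rbd:{pool}/{image}"]
    # The mapping's insertion order is preserved in the output.
    parts.extend(f"{key}={value}" for key, value in options.items())
    return ":".join(parts)


if __name__ == "__main__":
    spec = rbd_drive_spec("foo", "bar.rbd", {"debug_rbd": "20", "log_to_stderr": "2"})
    # Prints: -drive file=rbd:foo/bar.rbd:debug_rbd=20:log_to_stderr=2
    print(f"-drive file={spec}")
```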

Thanks!

Actions #4

Updated by Oliver Francke over 12 years ago

Hi Josh,

I have just run a session with savevm/loadvm, once without and once with the snapshot option, now with qemu-1.0. Without it, the VM freezes; with it, I can "see" the snapshots within the monitor CLI via "info snapshots" and via rbd <pool>/<filename>. Everything as expected.
I will come back with a logfile with your debug switches later. What I don't get is your comment regarding the "invalid partition".
Here my worklog:

0. rados mkpool foo, rbd --size ... create foo/bar.rbd
1. I start a VM with the "snapshot=on" parameter
2. I use the -drive, seen as "/dev/vda", as the disk to be partitioned by a debian/ubuntu/suse/whatever distro
3. After the complete install - and even right after partitioning - the result is an
"unknown partition table".
4. sigh

If I leave out step 1, everything completes as expected. This is the more interesting
point. I did not take any snapshots during the installation, of course.

Kind regards,

Oliver.

Actions #5

Updated by Oliver Francke over 12 years ago

Well Josh,

attached you will find a crash log, with qemu-system... started without "-daemonize" to see what's going on ;-)

Hope it helps,

Oliver.

Actions #6

Updated by Josh Durgin over 12 years ago

Hi Oliver,

With snapshot=on data is never saved to the backing device - the original file is not modified unless you use that special shortcut (although I'm not even sure that works - I couldn't get it working on a regular raw disk image). When you're using snapshot=on, your install is stored in a hidden file in $TMP_DIR, but never written to rbd. If you try it with a raw file instead of rbd I think you'll see the same problem.

When you create an rbd image, it is treated as all zeroes until data is written to it. With snapshot=on, rbd doesn't get written to, so the BIOS just reads all zeroes, which is not a valid partition table.
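The all-zeroes explanation can be checked directly. A classic MBR partition table ends its first 512-byte sector with the boot signature bytes 0x55 0xAA, and an all-zero sector lacks them. The sketch below is a simplified check (the kernel's partition detection does more than this), and the device path in the usage comment, such as the /dev/rbd0 that `rbd map` produced in the report, is only an example.

```python
SECTOR_SIZE = 512


def has_mbr_signature(sector0: bytes) -> bool:
    """True if a first sector ends with the MBR boot signature 0x55 0xAA."""
    return (
        len(sector0) >= SECTOR_SIZE
        and sector0[510] == 0x55
        and sector0[511] == 0xAA
    )


def read_first_sector(path: str) -> bytes:
    # Reading a block device usually requires root privileges.
    with open(path, "rb") as dev:
        return dev.read(SECTOR_SIZE)


if __name__ == "__main__":
    blank = bytes(SECTOR_SIZE)          # what a never-written rbd image returns
    print(has_mbr_signature(blank))     # False
    # e.g. has_mbr_signature(read_first_sector("/dev/rbd0"))
```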

Thanks for the crash log. It looks like there's some race condition (maybe only with buffered io?) during shutdown. If you could get a core file ('ulimit -c unlimited' before reproducing the crash), then open it in gdb and post the backtrace, that'd be great. I'm not sure yet whether the problem is in librbd or in the way qemu uses it.
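Once a core file exists, the backtrace can be collected non-interactively. A minimal sketch that builds the gdb command line using gdb's `-batch` and `-ex` options; the binary and core paths are placeholders, and the executable should be the same qemu binary that crashed. `thread apply all bt` prints every thread's stack, which may help with a suspected race, while plain `bt` is what was asked for.

```python
import shlex
from typing import List


def gdb_backtrace_argv(executable: str, core: str, all_threads: bool = False) -> List[str]:
    """Build a gdb command line that prints a backtrace from a core file and exits."""
    command = "thread apply all bt" if all_threads else "bt"
    return ["gdb", "-batch", "-ex", command, executable, core]


if __name__ == "__main__":
    # Placeholder paths; run `ulimit -c unlimited` in the shell that starts qemu first.
    argv = gdb_backtrace_argv("/usr/bin/qemu-system-x86_64", "core", all_threads=True)
    print(shlex.join(argv))
```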

Actions #7

Updated by Oliver Francke over 12 years ago

Well Josh,

being quite busy... and needing to understand (not a "real coder" these days anymore ;-) ) how to configure things correctly to produce some useful gdb output, you'll find a file attached, though I still don't get the warning:
"warning: core file may not match specified executable file."
Please let me know if and how you need more debugging.

And, last but not least, let me know how snapshotting should be done these days... I'm a bit lost TBH. I would appreciate a scriptable way; handling the monitor socket, rbd and other tools is no problem.

Thanks for taking the time, and regards,

Oliver.

Actions #8

Updated by Josh Durgin over 12 years ago

Hi Oliver,

That gdb session is actually an entirely different crash - I'll take a closer look at both of these today. The backtrace you got for the second one may be good enough to figure it out. To make the backtraces more useful, you need to install debugging symbols (in librados2-dbg and librbd1-dbg packages) on the machine where you're running qemu. I'd still like a backtrace from the crash you saw when you were using the rbd_writeback_window, if possible. Just doing a gdb bt is enough to start with - from there I can ask for more specific info if needed.

The rbd command line tool has all the snapshot functionality in it, so it's what I recommend. The easiest way to create a snapshot is 'rbd snap create --snap <snapname> pool/image'. Qemu's savevm will do the same thing, but for all the disks in a guest (which you might not want, for example if one is swap).
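Because the rbd tool works per image, a script can snapshot only the disks it cares about, avoiding savevm's all-disks behaviour. A sketch built on the `rbd snap create --snap <snapname> pool/image` form given above; the function names and the pool, image and snapshot names are hypothetical.

```python
import subprocess
from typing import AbstractSet, Iterable, List


def snapshot_commands(
    pool: str,
    images: Iterable[str],
    snap: str,
    skip: AbstractSet[str] = frozenset(),
) -> List[List[str]]:
    """One `rbd snap create` argv per image, leaving out images listed in `skip`."""
    return [
        ["rbd", "snap", "create", "--snap", snap, f"{pool}/{image}"]
        for image in images
        if image not in skip
    ]


def run_all(commands: List[List[str]]) -> None:
    for argv in commands:
        # check=True stops at the first rbd invocation that fails.
        subprocess.run(argv, check=True)
```

For example, `run_all(snapshot_commands("vms", ["web-root", "web-swap"], "nightly", skip={"web-swap"}))` would snapshot only vms/web-root.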

Note that to make sure the guest filesystem is in a consistent state (doesn't need fsck) when you take a snapshot you need to use the fsfreeze command on the filesystem(s) you're snapshotting, or unmount them. Qemu has a guest agent to make fs freezing and thawing scriptable.
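Put together, a scripted consistent snapshot is freeze, snapshot, thaw, with the thaw guaranteed even if the snapshot step fails. A minimal sketch, assuming the qemu guest agent's `guest-fsfreeze-freeze` and `guest-fsfreeze-thaw` commands sent as JSON over the agent's channel. `send` is a hypothetical transport you would supply (it should write the message and wait for the agent's reply), and `run` executes a command line.

```python
import json
import subprocess
from typing import Callable, List


def qga_message(command: str) -> bytes:
    """Encode a guest agent command as one JSON line."""
    return (json.dumps({"execute": command}) + "\n").encode()


def consistent_snapshot(
    send: Callable[[bytes], None],
    run: Callable[[List[str]], None],
    pool: str,
    image: str,
    snap: str,
) -> None:
    """Freeze guest filesystems, snapshot the rbd image, then always thaw."""
    send(qga_message("guest-fsfreeze-freeze"))
    try:
        run(["rbd", "snap", "create", "--snap", snap, f"{pool}/{image}"])
    finally:
        # Never leave the guest frozen, even if rbd fails.
        send(qga_message("guest-fsfreeze-thaw"))


def run_checked(argv: List[str]) -> None:
    subprocess.run(argv, check=True)
```

A call would look like `consistent_snapshot(my_send, run_checked, "vms", "web-root", "nightly")`, where `my_send` is your own function talking to the agent socket. Keeping the freeze window to a single rbd call keeps the time the guest spends frozen short.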

Thanks again for the logs!

Actions #9

Updated by Josh Durgin over 12 years ago

  • Assignee set to Josh Durgin
Actions #10

Updated by Josh Durgin over 12 years ago

  • Category set to qemu

The bug is in the qemu driver - the fix is in our qemu repo. I think this might explain both your crashes, but if you see any more, let us know.

Actions #11

Updated by Josh Durgin over 12 years ago

  • Status changed from New to 7
Actions #12

Updated by Oliver Francke over 12 years ago

Hi Josh,

well, the small fix did it, no more crashes.

But of course I would love to have my live snapshot capability back. I mean online snapshots; offline is working with rbd, as you wrote. Is there anything I overlooked? If not, is something in the pipeline?
Because with our "other" setup with NFS storage, live snapshots are working.

Thanks for your efforts so far,

Oliver.

Actions #13

Updated by Josh Durgin over 12 years ago

  • Status changed from 7 to Resolved

Hi Oliver,

You can use rbd to take live snapshots with the same consistency as with snapshotting images on nfs. The issue is the consistency of the vm's filesystem, which has its own caches that need to be flushed to get a consistent snapshot. The storage system the vm is running on simply doesn't know anything about the caching in the layers above it. If you're using nfs to store the vm image files, you would have the same issue. If this isn't a problem for you, then you don't need to worry about the fsfreeze bit I mentioned before - just be aware that your vm may have an inconsistent filesystem after rollback, requiring a fsck.
