Bug #10116

Ceph vm guest disk lockup when using fio

Added by Brad House over 9 years ago. Updated almost 9 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

When running a disk benchmark within a guest, I'm getting a disk lockup that never appears to resolve itself. This issue ONLY happens when the VM's disk resides on RBD; when using a local disk instead, the guest does not lock up.

Here is the command I run in the guest:

fio --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m

The lockup also occurs when using other ioengines like sync.

However, using a lower "numjobs" such as 4 works fine with a result like:

  read : io=524288KB, bw=49818KB/s, iops=12454, runt= 10524msec
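
As a sanity check on that result line, the figures can be recomputed from the command-line parameters. This is a minimal sketch that assumes the numjobs=4 run used the same --size=128m and --bs=4k as the command above; the variable names are illustrative only.

```python
# Values copied from the fio command and the result line above.
numjobs = 4            # the lower job count that completed
size_kb = 128 * 1024   # --size=128m per job, expressed in KB
block_kb = 4           # --bs=4k
runtime_ms = 10524     # runt= 10524msec

total_kb = numjobs * size_kb                        # data read across all jobs
bw_kb_s = total_kb * 1000 // runtime_ms             # integer division, KB/s
iops = (total_kb // block_kb) * 1000 // runtime_ms  # 4k operations per second

print(f"io={total_kb}KB, bw={bw_kb_s}KB/s, iops={iops}")
```

The recomputed values agree with the reported io, bw, and iops, so the numjobs=4 run appears to have read its full 4 x 128 MB without stalling.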

I'm using qemu 2.1.2 (and have also tried qemu 1.7 with the same results). Both ceph firefly and ceph giant have the issue.

Now, the odd part is that I can only reproduce this issue on the new production gear we are QAing. Our test lab cannot reproduce it. However, the test lab equipment is much slower and only uses 1Gb networking gear, while the new production gear uses 10Gb SFP+ direct attach for low latency and high performance.

The environment is proxmox 3.3 (Debian Wheezy 64bit), and it is passing this as the command line option for qemu:

-drive file=rbd:ssd/vm-101-disk-1:mon_host=ceph1 ceph2 ceph3:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/ssd.keyring,if=none,id=drive-virtio0,aio=native,cache=none,detect-zeroes=on
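
For readability, that -drive value can be broken into its parts. The sketch below uses plain string splitting on the value exactly as quoted here. It is an assumption that this is adequate for this particular string; it does not implement qemu's own option parsing or escaping rules.

```python
# The -drive value exactly as quoted above.
drive = (
    "file=rbd:ssd/vm-101-disk-1:mon_host=ceph1 ceph2 ceph3:id=admin:"
    "auth_supported=cephx:keyring=/etc/pve/priv/ceph/ssd.keyring,"
    "if=none,id=drive-virtio0,aio=native,cache=none,detect-zeroes=on"
)

# Top level: comma-separated key=value pairs.
options = dict(part.split("=", 1) for part in drive.split(","))

# The file value: scheme, then pool/image, then colon-separated key=value pairs.
scheme, pool_image, *rbd_parts = options["file"].split(":")
pool, image = pool_image.split("/", 1)
rbd_options = dict(part.split("=", 1) for part in rbd_parts)

print("scheme:", scheme, "pool:", pool, "image:", image)
for key, value in rbd_options.items():
    print("  rbd option", key, "=", value)
for key, value in options.items():
    if key != "file":
        print("  drive option", key, "=", value)
```

Split this way, the drive-level settings relevant to the I/O path (aio=native, cache=none, detect-zeroes=on) are easy to tell apart from the RBD connection settings (monitor hosts, client id, keyring).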

I'm not even sure where to begin. Everything else looks great, and performance is great. I haven't yet reproduced a lockup outside of the fio benchmark, but I'm sure it's possible.
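
One possible starting point is to narrow down where between numjobs=4 (completes) and numjobs=16 (locks up) the behaviour changes. This is a suggestion, not something that was run for this report. The sketch only builds and prints the command lines, using the same flags as the fio command above and varying nothing but numjobs; the helper name fio_command is hypothetical.

```python
import shlex

# Flags taken verbatim from the fio command in this report; only numjobs varies.
BASE_ARGS = [
    "fio", "--rw=randrw", "--refill_buffers", "--norandommap",
    "--randrepeat=0", "--ioengine=libaio", "--bs=4k", "--rwmixread=100",
    "--iodepth=16", "--runtime=60", "--group_reporting",
    "--name=4ktest", "--size=128m",
]


def fio_command(numjobs):
    """Return the fio argument list for one run with the given job count."""
    return BASE_ARGS + [f"--numjobs={numjobs}"]


# From the known-good job count up to the known-bad one, inclusive.
commands = [shlex.join(fio_command(n)) for n in range(4, 17)]
for line in commands:
    print(line)
```

Running the printed commands in the guest one at a time, in increasing order, would show the lowest job count at which the disk stops responding.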


Files

gdb1.txt (14.5 KB) Brad House, 11/17/2014 09:40 AM
gdb2.txt (16.4 KB) Brad House, 11/17/2014 09:40 AM
gdb3.txt (16.8 KB) Brad House, 11/17/2014 09:40 AM
blktrace_qemu.tar.xz (2.92 MB) Brad House, 11/17/2014 11:05 AM
blktrace-11182014.dat.xz (6.24 MB) Brad House, 11/18/2014 05:27 AM
gdb_qemu_debug.txt (13.9 KB) Brad House, 11/18/2014 12:30 PM
gdb_next_lockup.txt (19.4 KB) Brad House, 11/18/2014 12:41 PM
gdb_lockup_3.txt (19.5 KB) Brad House, 11/18/2014 12:43 PM
gdb_consecutive_batch_dump.txt (15.2 KB) Brad House, 11/18/2014 12:49 PM