Bug #3286 (closed): librbd, kvm, async io hang

Added by Chris Dunlop over 11 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Category: librbd
Target version: -
% Done: 0%
Source: Development

Description

Fio hangs in a linux-2.6.32 VM on librbd when using direct=1 and ioengine=libaio, with ceph at c8721b956.

Libvirt configured with direct rbd access like:

  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <auth username='test1'>
        <secret type='ceph' uuid='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'/>
      </auth>
      <source protocol='rbd' name='rbd/test1:rbd_cache=1:rbd_cache_size=67108864'>
        <host name='10.200.1.1' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

with an image of Debian Squeeze, kernel 2.6.32-5-amd64.

Using fio config:

# egrep -v '^(#|$)' /usr/share/doc/fio/examples/iometer-file-access-server
[global]
description=Emulation of Intel IOmeter File Server Access Pattern
[iometer]
bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10
rw=randrw
rwmixread=80
direct=1
size=4g
ioengine=libaio
iodepth=64

...fio hangs consistently (tested 3 times with reboots between):

# fio /usr/share/doc/fio/examples/iometer-file-access-server
Jobs: 1 (f=1): [m] [6.6% done] [0K/0K /s] [0/0 iops] [eta 08m:05s]
    ### hung here

...whereas this completes successfully:

fio <(sed -e's/direct=1/direct=0/' -e's/ioengine=libaio/ioengine=sync/' /usr/share/doc/fio/examples/iometer-file-access-server)

Files

qemu.log (68.3 MB) - Chris Dunlop, 10/10/2012 06:55 PM
strace.1543 (1.53 MB) - Chris Dunlop, 10/11/2012 05:41 PM
strace.1544 (5.56 MB) - Chris Dunlop, 10/11/2012 05:41 PM
strace.1548 (64 KB) - Chris Dunlop, 10/11/2012 05:41 PM
strace.1550 (5.59 MB) - Chris Dunlop, 10/11/2012 05:41 PM
qemu.log.bz2 (2.96 MB) - Chris Dunlop, 10/11/2012 07:39 PM
blktrace.dat.bz2 (1.53 MB) - Chris Dunlop, 10/29/2012 03:27 PM
qemu.log.bz2 (1.83 MB) - Chris Dunlop, 10/29/2012 03:27 PM
Actions #1

Updated by Josh Durgin over 11 years ago

Is your cluster entirely healthy?

If so, could you add :debug_rbd=20:debug_ms=1:log_file=/path/writeable/by/qemu.log to the libvirt rbd config and attach the resulting log file (you might need to explicitly allow qemu to write to that file in apparmor)?

Actions #2

Updated by Chris Dunlop over 11 years ago

# ceph health
HEALTH_OK

qemu.log attached, after this sequence:

# virsh dumpxml smhtest1 | grep rbd
      <source protocol='rbd' name='rbd/smhtest1:rbd_cache=1:rbd_cache_size=67108864:debug_rbd=20:debug_ms=1:log_file=/tmp/qemu.log'>
# virsh start smhtest1
# ssh smhtest1
# cd /tmp
# time fio --output /tmp/fio1.out /usr/share/doc/fio/examples/iometer-file-access-server
Jobs: 1 (f=1): [m] [9.9% done] [0K/0K /s] [0/0 iops] [eta 05m:19s]
    #### hung here

In another window:

# ssh smhtest1
# ps lw $(pidof fio)
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0     0  1604  1588  20   0  28912 13048 -      D+   pts/0      0:35 fio --output /tmp/fio1.out /usr/share/doc/fio/examples/iometer-file-access-server
1     0  1605  1604  20   0  36544  5892 -      Ds   ?          0:00 fio --output /tmp/fio1.out /usr/share/doc/fio/examples/iometer-file-access-server
# date
Thu Oct 11 12:42:23 EST 2012
# uptime
 12:42:23 up 21 min,  2 users,  load average: 9.78, 8.21, 5.28

Actions #3

Updated by Josh Durgin over 11 years ago

All the IOs in the log completed successfully, so I'm not sure what's causing these hangs. Does this work with the same guest using a raw image file instead of rbd?

Actions #4

Updated by Chris Dunlop over 11 years ago

Yes, it works when using a raw image file.

Actions #5

Updated by Josh Durgin over 11 years ago

So we'll need to dig deeper to figure out exactly what is hanging. There are a couple of general possibilities: a deadlock in qemu/librbd, or some kind of hang in the guest. For the first, could you attach to the qemu process with gdb after the hang occurs, and post the output of 'thread apply all bt'?
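
A minimal sketch of capturing those backtraces non-interactively (assuming a single kvm process on the host; otherwise substitute the pid of the affected guest for the pidof call):

# gdb -p $(pidof kvm) -batch -ex 'thread apply all bt' > /tmp/qemu-backtrace.txt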

The second possibility is trickier to debug. Since the kernel page cache is being used, the actual block device requests are separate from the syscalls made by fio.
Could you use blktrace in the guest to track what the underlying I/Os to the rbd device are from the guest's perspective?

Actions #6

Updated by Josh Durgin over 11 years ago

Err, I read that second part backwards - the hanging case is skipping the page cache, so an strace of the fio in the guest might be helpful.

Actions #7

Updated by Chris Dunlop over 11 years ago

Reducing the amount of data fio is testing with produces different behaviour. The default is size=4g. Below about size=100m I haven't seen it fail. Around 100m the success/failure is inconsistent. Using:

strace -o /tmp/strace -ff -tt fio --output /tmp/fio.out <(sed -e 's/size=4g/size=100m/' /usr/share/doc/fio/examples/iometer-file-access-server)

...the attached strace.15{43,44} are from a successful run, and strace.15{48,50} from an unsuccessful run immediately afterwards. Unfortunately it looks like the interesting bits are stuck in strace's output buffer, as strace.1550 is truncated.

Hmmm, I just rebooted the VM and ran the same tests. Once again, with size=100m, the first run succeeded and the second run failed.

After another reboot, I ran with size=10m 50 times without failure, then, without a reboot, size=50m 30 times without failure. Then, with size=90m, it got stuck, with the % done oscillating between 86% and 88% and the eta increasing. At that point the fio process was unkillable. I rebooted after several minutes of the same behaviour (the test would normally complete in a matter of seconds).

After a reboot, running with size=80m succeeded the first time and hung the second.

Hmmm, rbd_cache_size is set to 64m. Perhaps something to do with that? Another reboot, and I ran this script:

for ((s=60; ; s+=1))
do
  for ((i=1; i<=10; i+=1))
  do
    echo -e "\ns=$s i=$i" 
    fio --output /tmp/fio.out <(sed -e "s/size=4g/size=${s}m/" /usr/share/doc/fio/examples/iometer-file-access-server)
  done
done

This hung at "s=75 i=4". Another reboot, and it hung at "s=79 i=5".

Aha! I reduced rbd_cache_size to 32m, and after a reboot the script hung immediately at "s=60 i=1".

Another reboot, and I ran the script again, this time starting at s=30. This hung at "s=49 i=1".

Another reboot, again starting at s=30, and it hung at "s=48 i=1".

So it looks like some issue with the rbd_cache?

Actions #8

Updated by Josh Durgin over 11 years ago

Thanks for investigating so much. It certainly sounds like the issue is with the cache. Could you get a log with debug_rbd=20:debug_ms=1:debug_objectcacher=20?

Presumably setting rbd_cache=false would make the problem go away entirely.
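
For reference, disabling the cache from the libvirt side would just mean changing the source line to something like the sketch below (based on the earlier examples, not from an actual run; rbd_cache=0 and rbd_cache=false should both be accepted):

      <source protocol='rbd' name='rbd/test1:rbd_cache=0'>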

Actions #9

Updated by Chris Dunlop over 11 years ago

Attached, output of:

# virsh dumpxml test1 | grep rbd
      <source protocol='rbd' name='rbd/test1:rbd_cache=1:rbd_cache_size=33554433:debug_rbd=20:debug_ms=1:debug_objectcacher=20:log_file=/tmp/qemu.log'>
# virsh start test1
# ssh test1
# cd /tmp
# fio --output /tmp/fio.out <(sed -e "s/size=4g/size=60m/" /usr/share/doc/fio/examples/iometer-file-access-server)
Jobs: 1 (f=1): [m] [100.0% done] [0K/0K /s] [0/0 iops] [eta 00m:00s] 
     ### hung here

With rbd_cache=false it's survived a few size=200m iterations, and the full size=4g is still running... will let you know how it goes.

Actions #10

Updated by Chris Dunlop over 11 years ago

FYI, 8 rounds of the full size=4g test passed without a hitch.

Actions #11

Updated by Dan Mick over 11 years ago

  • Assignee set to Dan Mick
Actions #12

Updated by Josh Durgin over 11 years ago

  • Assignee deleted (Dan Mick)

Unfortunately I haven't been able to reproduce this myself or track down the cause yet. Would it be possible to get a blktrace of the fio in addition to the debug logs from rbd during the same run?

Actions #13

Updated by Chris Dunlop over 11 years ago

Josh Durgin wrote:

Unfortunately I haven't been able to reproduce this myself or track down the cause yet. Would it be possible to get a blktrace of the fio in addition to the debug logs from rbd during the same run?

Attached are the blktrace.dat and qemu.log from:

host.0# virsh dumpxml test1 | grep rbd
      <source protocol='rbd' name='rbd/test1:rbd_cache=1:rbd_cache_size=67108864:debug_rbd=20:debug_ms=1:log_file=/tmp/qemu.log'>
host.0# virsh start test1
host.0# ssh test1 'blktrace -d /dev/vda1 -o -' > /tmp/blktrace.dat
host.1# ssh test1
host.2# ssh test1 'fio <(sed -e 's/size=4g/size=100m/' /usr/share/doc/fio/examples/iometer-file-access-server)'

...then when the fio hangs, I use the shell on test1 to kill -TERM the blktrace:

test1.1# kill -TERM $(pidof blktrace)
host.0# virsh destroy test1
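
As a side note, the raw stream captured with "blktrace -o -" can be decoded offline with the standard blktrace tooling, e.g.:

host.0# blkparse -i - < /tmp/blktrace.dat | less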
Actions #14

Updated by Josh Durgin over 11 years ago

  • Status changed from New to 7
  • Assignee set to Josh Durgin

I fixed a few bugs that could cause this in the wip-oc-hang branch. This is now merged into the next and master branches. Could you try with the next branch on the client side? Updating librbd should be enough. Instructions for dev testing packages are here in case you don't want to build it yourself: http://ceph.com/docs/master/install/debian/#add-development-testing-packages
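
If you go with the packages, upgrading just the client-side library on the qemu host might look something like this (a sketch; librbd1/librados2 are the Debian package names, and the development repository from the linked instructions is assumed to already be in apt's sources):

# apt-get update
# apt-get install --only-upgrade librbd1 librados2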

Actions #15

Updated by Chris Dunlop over 11 years ago

OK, with ceph next @ bc32fc42 (and rbd_cache_size=33554433) it completed the full fio test several times, in contrast to never completing previously.

I think that's got it!

Actions #16

Updated by Josh Durgin over 11 years ago

  • Status changed from 7 to Resolved

Hooray!

Actions #17

Updated by bo zhou almost 10 years ago

I ran into the same problem.
Fio hangs in a linux-3.15.0 VM when using direct and sync.
Ceph version: 0.80.1 (release).
