Bug #14988

QEMU VM hangs talking to RBD via librbd

Added by Roy Keene about 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%
Source:
other
Tags:
librbd, qemu
Backport:
hammer, infernalis
Regression:
No
Severity:
2 - major

Description

I'm having a repeatable hang in QEMU (v2.4.1) talking to a Ceph (v0.95.4) RBD via librbd.

I've acquired a core file, and all the binaries were compiled with -g3/-ggdb3 debugging information.

Attached is a backtrace for all threads.

The VM process is still running, but QEMU will not accept() connections on the monitor socket, and no network or RBD I/O happens.

This occurs after a few hours of running this VM. The guest is Microsoft Windows 2008 R2, connected to the RBD image via IDE emulation in QEMU.

gdb-log.txt (37 KB) Roy Keene, 03/05/2016 08:39 PM


Related issues

Related to rbd - Bug #13726: QEMU hangs after creating snapshot and stopping VM Resolved 11/09/2015
Copied to rbd - Backport #15032: infernalis: QEMU VM hangs talking to RBD via librbd Resolved
Copied to rbd - Backport #15033: hammer: QEMU VM hangs talking to RBD via librbd Resolved

Associated revisions

Revision 7ffbe409 (diff)
Added by Jason Dillaman about 8 years ago

test: reproducer for writeback CoW deadlock

Refs: #14988

Signed-off-by: Jason Dillaman <>
(cherry picked from commit 16b6efdd24b25ba1f6bc658681afa3d0878eb397)

Revision 621e3aef (diff)
Added by Jason Dillaman about 8 years ago

test: reproducer for writeback CoW deadlock

Refs: #14988

Signed-off-by: Jason Dillaman <>
(cherry picked from commit 16b6efdd24b25ba1f6bc658681afa3d0878eb397)
(cherry picked from commit 7ffbe4099c9367e7405e2196803a0d8b12253a06)

Revision 0d9fb7a7 (diff)
Added by Jason Dillaman about 8 years ago

test: reproducer for writeback CoW deadlock

Refs: #14988

Signed-off-by: Jason Dillaman <>

History

#1 Updated by Nathan Cutler about 8 years ago

@Roy: Can you reproduce this on 0.94.6? If not, it is probably http://tracker.ceph.com/issues/13726

#2 Updated by Roy Keene about 8 years ago

I can try with a newer Ceph, but #13726 seems to be dependent on the RBD cache being disabled, whereas caching should be enabled for this VM:

Libvirt log:


2016-03-05 03:11:20.297+0000: starting up libvirt version: 1.2.21, qemu version: 2.4.1
LC_ALL=C PATH=/bin:/usr/bin QEMU_AUDIO_DRV=none /bin/qemu-system-x86_64 -name one-80 -S -machine pc-i440fx-2.4,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 04898cdc-6332-4688-be9f-c92ba4661790 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-80/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=rbd:rbd/one-27-80-0:auth_supported=none,if=none,id=drive-ide0-0-0,format=raw,cache=writeback -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=14,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=02:00:b2:fb:54:89,bus=pci.0,addr=0x3 -vnc 0.0.0.0:80 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox off -msg timestamp=on
qemu: terminating on signal 15 from pid 7419
2016-03-05 03:32:19.756+0000: shutting down

ceph.conf:

[client]
rbd cache = true
rbd cache size = 67108864
rbd cache max dirty age = 60

Is caching being disabled an actual requirement for this bug, or just a coincidence for that reporter?

I will upgrade and test, but it's somewhat difficult since only this one VM triggers it, and only periodically.

#3 Updated by Nathan Cutler about 8 years ago

@Roy: Upgrading is recommended anyway, but before you take that step you can try setting "rbd_non_blocking_aio" to false. If that doesn't make the problem go away, then #13726 is not at play.
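
For reference, a minimal sketch of that workaround in the [client] section of ceph.conf you posted above (an illustration, not a verified configuration for this deployment):

[client]
rbd non blocking aio = false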

#4 Updated by Jason Dillaman about 8 years ago

  • Project changed from Ceph to rbd
  • Category deleted (qemu)

Relevant threads from the attached backtrace:
Thread 17 (LWP 30339):
#0  0x00007f3d978a6378 in futex_wait (private=0, expected=0, futex_word=0x55f2de63175c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=0, expected=0, futex_word=0x55f2de63175c) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_rwlock_wrlock_slow (rwlock=0x55f2de631750) at pthread_rwlock_wrlock.c:67
#3  0x00007f3d9cabd214 in RWLock::get_write (this=0x55f2de631748, lockdep=<optimized out>) at ./common/RWLock.h:96
#4  0x00007f3d9cab8745 in WLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:152
#5  librbd::ImageWatcher::reregister_watch (this=0x55f2de644240) at librbd/ImageWatcher.cc:1000
#6  0x00007f3d9ca98fda in operator() (a0=<optimized out>, this=<optimized out>)
    at /home/rkeene/devel/aurae/node/root/packages/ceph/workdir-129422584788648997066/.fakeroot/include/boost/function/function_template.hpp:767
#7  FunctionContext::finish (this=<optimized out>, r=<optimized out>) at ./include/Context.h:461
#8  0x00007f3d9ca91919 in Context::complete (this=0x7f3b5002f6b0, r=<optimized out>) at ./include/Context.h:65
#9  0x00007f3d9ca91919 in Context::complete (this=0x7f3b5001c880, r=<optimized out>) at ./include/Context.h:65
#10 0x00007f3d9cb0eb08 in Finisher::finisher_thread_entry (this=0x55f2de6447e0) at common/Finisher.cc:59
#11 0x00007f3d978a1464 in start_thread (arg=0x7f3d80ef8700) at pthread_create.c:334
#12 0x00007f3d975df01d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 16 (LWP 30332):
#0  pthread_cond_wait () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f3d9caa7bb3 in Wait (mutex=..., this=0x7f3d817f8cf0) at ./common/Cond.h:55
#2  librbd::ImageCtx::flush_cache (this=this@entry=0x55f2de6315b0) at librbd/ImageCtx.cc:675
#3  0x00007f3d9cac235e in librbd::_flush (ictx=ictx@entry=0x55f2de6315b0) at librbd/internal.cc:3276
#4  0x00007f3d9cac986e in librbd::ictx_refresh (ictx=ictx@entry=0x55f2de6315b0) at librbd/internal.cc:2121
#5  0x00007f3d9caca43e in librbd::ictx_check (ictx=ictx@entry=0x55f2de6315b0, owner_locked=owner_locked@entry=false) at librbd/internal.cc:1893
#6  0x00007f3d9cad405e in librbd::aio_write (ictx=0x55f2de6315b0, off=38234656768, len=2560,
    buf=0x7f3d5ff1a000 "33:46 fe80::990d:c01:6268:eb6d%14 POST /XXXXXXXX/XXXXXXXXXXX/XXXX/XXXXXXXXXXXXXXXX +REQID:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XX - XXXXXXXXXXXXXXXXXXXXXXXX%14 - 200 0 0 15\r\n2016-03-04 22:33:46 fe80::"..., c=0x7f3d5ff0dc40, op_flags=0) at librbd/internal.cc:3314
#7  0x00007f3d9ca91919 in Context::complete (this=0x7f3d5ff0ddf0, r=<optimized out>) at ./include/Context.h:65
#8  0x00007f3d9caae436 in _process (item=..., this=0x55f2de633a10) at ./common/WorkQueue.h:462
#9  _process (u=..., this=0x55f2de633a10) at ./common/WorkQueue.h:170
#10 ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_process (this=0x55f2de633a10, handle=...) at ./common/WorkQueue.h:191
#11 0x00007f3d9cbb0327 in ThreadPool::worker (this=0x55f2de6334f0, wt=0x55f2de6337d0) at common/WorkQueue.cc:128
#12 0x00007f3d9cbb13e0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#13 0x00007f3d978a1464 in start_thread (arg=0x7f3d817fa700) at pthread_create.c:334
#14 0x00007f3d975df01d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

#5 Updated by Jason Dillaman about 8 years ago

@Roy,

From the backtrace, Ceph is waiting for a cache flush to complete. The thread responsible for handling cache flushes does not appear to be blocked. Something did cause the image watcher to lose its connection to the OSD, so perhaps there are some network issues at play?

If this is repeatable, would it be possible to gather the rbd log from the affected process, covering the period from shortly before the hang to shortly after? I would like to see it at the "debug rbd = 20" level. If you have the ASOK file configured, you can use "ceph --admin-daemon </path/to/admin/socket> config set debug_rbd 20" to bump the log level of a specific process.
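
As an illustration of that setup (the paths below are placeholders, not taken from this environment), the client log file and admin socket can be configured in ceph.conf on the hypervisor, and the log level can then be raised at runtime on the already-running process:

[client]
log file = /var/log/ceph/qemu-client-$pid.log
admin socket = /var/run/ceph/$cluster-$name.$pid.asok
debug rbd = 20

ceph --admin-daemon </path/to/admin/socket> config set debug_rbd 20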

#6 Updated by Josh Durgin about 8 years ago

  • Status changed from New to Need More Info

#7 Updated by Jason Dillaman about 8 years ago

  • Status changed from Need More Info to 12
  • Priority changed from Normal to Urgent

If an AIO request results in an image refresh due to a new snapshot creation, the cache will be flushed. When this occurs, if a flushed writeback request requires a CoW copy-up from the parent image, the copy-up read is blocked on the thread pool behind the original AIO request.

Setting 'rbd_non_blocking_aio' to false should act as a workaround, since the AIO request is then processed in the QEMU thread context instead of the librbd thread pool. Another workaround is to freeze IO before taking the snapshot so the writeback completes before the snapshot is created. An extreme workaround would be to flatten the cloned image to avoid the CoW entirely.
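
The shape of the deadlock can be sketched outside librbd. The snippet below is a generic Python illustration with hypothetical names (not librbd code): a saturated fixed-size pool runs a request that synchronously waits on a second piece of work which can only execute on that same pool, so neither can make progress.

import concurrent.futures

# A single worker stands in for a fully occupied librbd thread pool.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def copy_up_read():
    # Stand-in for the CoW copy-up read triggered by the writeback flush.
    return "parent object data"

def aio_write_with_refresh():
    # Stand-in for the original AIO write: the refresh forces a synchronous
    # cache flush, and the flush cannot finish until copy_up_read() runs,
    # but that work is queued behind this task on the same pool.
    inner = pool.submit(copy_up_read)
    return inner.result(timeout=5)  # without the timeout, this blocks forever

outer = pool.submit(aio_write_with_refresh)
try:
    outer.result(timeout=10)
except concurrent.futures.TimeoutError:
    print("deadlock: the flush waits on work that needs the occupied pool")

With a second free worker the inner task would run and the request would complete, which matches the note further down that Jewel's new asynchronous internals avoid the problem.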

#8 Updated by Nathan Cutler about 8 years ago

  • Related to Bug #13726: QEMU hangs after creating snapshot and stopping VM added

#9 Updated by Jason Dillaman about 8 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Jason Dillaman

#10 Updated by Jason Dillaman about 8 years ago

  • Backport set to hammer, infernalis

This shouldn't be an issue in Jewel due to the new asynchronous internals.

#11 Updated by Nathan Cutler about 8 years ago

  • Status changed from In Progress to Pending Backport

Note that there is no master PR for this issue. Jason is going to do the fix in infernalis and then backport it from there to hammer.

#12 Updated by Nathan Cutler about 8 years ago

  • Copied to Backport #15032: infernalis: QEMU VM hangs talking to RBD via librbd added

#13 Updated by Nathan Cutler about 8 years ago

  • Copied to Backport #15033: hammer: QEMU VM hangs talking to RBD via librbd added

#14 Updated by Jason Dillaman about 8 years ago

master PR: https://github.com/ceph/ceph/pull/8009 (reproducer -- confirms not an issue in Jewel)

#15 Updated by Jason Dillaman almost 8 years ago

  • Status changed from Pending Backport to Resolved
