Project

General

Profile

Bug #998

qemu/librbd race conditon

Added by Josh Durgin almost 13 years ago. Updated almost 13 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
librbd
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As reported in http://marc.info/?l=kvm&m=130225223223016&w=2:

There's a race condition that sometimes causes aio_read to fail, probably due to a use-after-free error.

crash.log.gz (595 KB) ar Fred, 06/15/2011 12:50 PM

crash.log.gz (653 KB) ar Fred, 06/15/2011 01:06 PM

Associated revisions

Revision ea3db176 (diff)
Added by Josh Durgin almost 13 years ago

librbd: fix AioCompletion race condition

If block completions finished before all of them where scheduled, the
pending_count would reach 0, and the AioCompletion would be destroyed
since the refcount would also reach 0. To fix this, hold a reference
to the AioCompletion while scheduling more block completions.

This caused a crash like this when running qemu:

1: ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:86
2: Lock (ictx=0x7f83d0001f00, off=5368705024, len=0, buf=0x2fac000 "", c=0x2cebe00) at common/Mutex.h:118
3: add_block_completion (ictx=0x7f83d0001f00, off=5368705024, len=0, buf=0x2fac000 "", c=0x2cebe00) at librbd.cc:159
4: librbd::aio_read (ictx=0x7f83d0001f00, off=5368705024, len=0, buf=0x2fac000 "", c=0x2cebe00) at librbd.cc:1368
5: rbd_aio_rw_vector (bs=<value optimized out>, sector_num=10485752, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>, write=0) at block/rbd.c:626
6: qemu_rbd_aio_readv (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at block/rbd.c:649
7: bdrv_aio_readv (bs=0x2ce92e0, sector_num=10485752, qiov=0x2de08f0, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>)

Fixes: #998
Signed-off-by: Josh Durgin <>

History

#1 Updated by Josh Durgin almost 13 years ago

  • Status changed from New to Can't reproduce

Stefan had no core file, and neither he nor I could subsequently reproduce this.

#2 Updated by Josh Durgin almost 13 years ago

  • Status changed from Can't reproduce to In Progress
  • Target version changed from v0.27 to v0.29

Christian Brunner had a similar error, but in aio_write during a yum upgrade. We should do more testing on this.

#3 Updated by Sage Weil almost 13 years ago

  • translation missing: en.field_position set to 33

#4 Updated by Sage Weil almost 13 years ago

  • translation missing: en.field_story_points set to 3
  • translation missing: en.field_position deleted (33)
  • translation missing: en.field_position set to 33

#5 Updated by Josh Durgin almost 13 years ago

  • Subject changed from qemu-iotest 008 fails intermittently to qemu/librbd race conditon

#6 Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.29 to v0.30

#7 Updated by Sage Weil almost 13 years ago

  • translation missing: en.field_position deleted (43)
  • translation missing: en.field_position set to 8

#8 Updated by ar Fred almost 13 years ago

I'm having a lot of similar crashes these days, they usually happen during or soon after the startup of the VM.

Although random, this is quite easy to reproduce for me: shutdown all VMs on one host, restart libvirtd, which will automatically start all VMs configured to be autostarted (that is approx. 10 VMs on my test box), usually, 1 or 2 of them crash...

The first backtrace:

Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 4,sockets=4,cores=1,threads'.
Program terminated with signal 6, Aborted.
#0  0x00007f83de255d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f83de255d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f83de259ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f83ddfd76dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f83ddfd5926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f83ddfd5953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f83ddfd5a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f83e07f41d9 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:86
#7  0x00007f83e0ad5f0b in Lock (ictx=0x7f83d0001f00, off=5368705024, len=0, buf=0x2fac000 "", c=0x2cebe00) at common/Mutex.h:118
#8  add_block_completion (ictx=0x7f83d0001f00, off=5368705024, len=0, buf=0x2fac000 "", c=0x2cebe00) at librbd.cc:159
#9  librbd::aio_read (ictx=0x7f83d0001f00, off=5368705024, len=0, buf=0x2fac000 "", c=0x2cebe00) at librbd.cc:1368
#10 0x00000000004628b4 in rbd_aio_rw_vector (bs=<value optimized out>, sector_num=10485752, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>, write=0)
    at block/rbd.c:626
#11 0x00000000004629b0 in qemu_rbd_aio_readv (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>)
    at block/rbd.c:649
#12 0x000000000043db4c in bdrv_aio_readv (bs=0x2ce92e0, sector_num=10485752, qiov=0x2de08f0, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>) at block.c:2063
#13 0x000000000043db4c in bdrv_aio_readv (bs=0x2ce8990, sector_num=10485752, qiov=0x2de08f0, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>) at block.c:2063
#14 0x00000000004221c0 in virtio_blk_handle_read (req=0x2dd48c0, mrb=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/hw/virtio-blk.c:329
#15 virtio_blk_handle_request (req=0x2dd48c0, mrb=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/hw/virtio-blk.c:375
#16 0x000000000042257b in virtio_blk_handle_output (vdev=0x7f83d089f870, vq=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/hw/virtio-blk.c:388
#17 0x0000000000431f2d in kvm_handle_io (env=0x7f83d0030260) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/kvm-all.c:821
#18 kvm_run (env=0x7f83d0030260) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:617
#19 0x0000000000433369 in kvm_cpu_exec (env=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:1233
#20 0x00000000004335d1 in kvm_main_loop_cpu (_env=0x7f83d0030260) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:1419
#21 ap_main_loop (_env=0x7f83d0030260) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:1466
#22 0x00007f83e0eecd8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#23 0x00007f83de30804d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#24 0x0000000000000000 in ?? ()

The second backtrace is from another VM, these are the same line numbers but I'm posting it just in case...

Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads='.
Program terminated with signal 6, Aborted.
#0  0x00007f62b73f3d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f62b73f3d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f62b73f7ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f62b71756dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f62b7173926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f62b7173953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f62b7173a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f62b99921d9 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:86
#7  0x00007f62b9c73f0b in Lock (ictx=0x7f62a80008b0, off=142602240, len=0, 
    buf=0x1f1f000 "# tty1 - getty\n#\n# This service maintains a getty on tty1 from the point the system is\n# started until it is shut down again.\n\nstart on stopped rc RUNLEVEL=[2345]\nstop on runlevel [!2345]\n\nrespawn\nexe"..., 
    c=0x1eed260) at common/Mutex.h:118
#8  add_block_completion (ictx=0x7f62a80008b0, off=142602240, len=0, 
    buf=0x1f1f000 "# tty1 - getty\n#\n# This service maintains a getty on tty1 from the point the system is\n# started until it is shut down again.\n\nstart on stopped rc RUNLEVEL=[2345]\nstop on runlevel [!2345]\n\nrespawn\nexe"..., 
    c=0x1eed260) at librbd.cc:159
#9  librbd::aio_read (ictx=0x7f62a80008b0, off=142602240, len=0, 
    buf=0x1f1f000 "# tty1 - getty\n#\n# This service maintains a getty on tty1 from the point the system is\n# started until it is shut down again.\n\nstart on stopped rc RUNLEVEL=[2345]\nstop on runlevel [!2345]\n\nrespawn\nexe"..., 
    c=0x1eed260) at librbd.cc:1368
#10 0x00000000004628b4 in rbd_aio_rw_vector (bs=<value optimized out>, sector_num=278520, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>, write=0) at block/rbd.c:626
#11 0x00000000004629b0 in qemu_rbd_aio_readv (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>)
    at block/rbd.c:649
#12 0x000000000043db4c in bdrv_aio_readv (bs=0x1ebd2e0, sector_num=278520, qiov=0x1f1efb0, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>) at block.c:2063
#13 0x000000000043db4c in bdrv_aio_readv (bs=0x1ebc990, sector_num=278520, qiov=0x1f1efb0, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>) at block.c:2063
#14 0x00000000004221c0 in virtio_blk_handle_read (req=0x1f12f80, mrb=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/hw/virtio-blk.c:329
#15 virtio_blk_handle_request (req=0x1f12f80, mrb=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/hw/virtio-blk.c:375
#16 0x000000000042257b in virtio_blk_handle_output (vdev=0x1ebf4e0, vq=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/hw/virtio-blk.c:388
#17 0x0000000000431f2d in kvm_handle_io (env=0x7f62a8004b20) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/kvm-all.c:821
#18 kvm_run (env=0x7f62a8004b20) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:617
#19 0x0000000000433369 in kvm_cpu_exec (env=<value optimized out>) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:1233
#20 0x00000000004335d1 in kvm_main_loop_cpu (_env=0x7f62a8004b20) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:1419
#21 ap_main_loop (_env=0x7f62a8004b20) at /tmp/buildd/qemu-kvm-0.14.0+noroms-0ubuntu4.1/qemu-kvm.c:1466
#22 0x00007f62ba08ad8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#23 0x00007f62b74a604d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#24 0x0000000000000000 in ?? ()

I'm keeping these core files around in case you need more details.

This is a custom build of ceph at 6f6ac75d769b1a915abe754394d3b50a5b594bcc
This is qemu-kvm-0.14.1 to which I added the following patches from ceph's repository:

Author: Josh Durgin <josh.durgin@dreamhost.com>
Date:   Tue Apr 12 15:58:52 2011 -0700

    rbd: Add bdrv_truncate implementation

    Reviewed-by: Christian Brunner <chb@muc.de>
    Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>

Author: Josh Durgin <josh.durgin@dreamhost.com>
Date:   Mon Apr 11 16:24:11 2011 -0700

    rbd: check return values when scheduling aio

    If scheduling fails, the number of outstanding I/Os must be correct,
    or there will be a hang when waiting for everything to be flushed.

    Reviewed-by: Christian Brunner <chb@muc.de>
    Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
    Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>

Author: Josh Durgin <josh.durgin@dreamhost.com>
Date:   Thu Mar 24 13:57:19 2011 -0700

    rbd: allow configuration of rados from the rbd filename

    The new format is rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]]
    Each option is used to configure rados, and may be any Ceph option, or "conf".
    The "conf" option specifies a Ceph configuration file to read.

    This allows rbd volumes from more than one Ceph cluster to be used by
    specifying different monitor addresses, as well as having different
    logging levels or locations for different volumes.

    Reviewed-by: Christian Brunner <chb@muc.de>
    Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>

Author: Josh Durgin <josh.durgin@dreamhost.com>
Date:   Thu Mar 24 13:56:43 2011 -0700

    rbd: use the higher level librbd instead of just librados

    librbd stacks on top of librados to provide access
    to rbd images.

    Using librbd simplifies the qemu code, and allows
    qemu to use new versions of the rbd format
    with few (if any) changes.

    Reviewed-by: Christian Brunner <chb@muc.de>
    Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
    Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

#9 Updated by Josh Durgin almost 13 years ago

I pushed a simpler (and I think correct) fix to the wip-librbd-race branch. Does this fix the problem for you, ar Fred?

#10 Updated by ar Fred almost 13 years ago

Thank you for the new patch.

I still managed to crash 3 of my VMs, each with a different stacktrace:

Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads='.
Program terminated with signal 6, Aborted.
#0  0x00007fa186ec1d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007fa186ec1d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fa186ec5ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fa186c436dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fa186c41926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fa186c41953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fa186c41a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fa1894601d9 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>)
    at common/assert.cc:86
#7  0x00007fa18973e5cd in Lock (this=<value optimized out>) at common/Mutex.h:118
#8  get_return_value (this=<value optimized out>) at librbd.cc:173
#9  librbd::RBD::AioCompletion::get_return_value (this=<value optimized out>) at librbd.cc:1477
#10 0x00000000004629fd in rbd_finish_aiocb (c=0x7fa178002d20, rcb=0x7fa1782f3d90) at block/rbd.c:546
#11 0x00007fa189745eb6 in librbd::AioCompletion::complete_block (this=0x7fa1782f3c90, block_completion=<value optimized out>, r=<value optimized out>) at librbd.cc:1262
#12 0x00007fa18974692d in librbd::rados_aio_sparse_read_cb (c=0x7fa1782a4890, arg=0x7fa17829ddd0) at librbd.cc:1335
#13 0x00007fa1893c6eee in librados::RadosClient::C_aio_sparse_read_Ack::finish (this=0x7fa17829dc20, r=<value optimized out>) at librados.cc:479
#14 0x00007fa1893d2863 in Objecter::handle_osd_op_reply (this=0x1bc2fd0, m=0x1c00400) at osdc/Objecter.cc:796
#15 0x00007fa1893bae8c in librados::RadosClient::_dispatch (this=0x1bbff20, m=<value optimized out>) at librados.cc:786
#16 0x00007fa1893baf53 in librados::RadosClient::ms_dispatch (this=0x1bbff20, m=0x1c00400) at librados.cc:752
#17 0x00007fa18947a5bd in ms_deliver_dispatch (this=0x1bc6280) at msg/Messenger.h:98
#18 SimpleMessenger::dispatch_entry (this=0x1bc6280) at msg/SimpleMessenger.cc:353
#19 0x00007fa1893bb52c in SimpleMessenger::DispatchThread::entry (this=0x1bc6708) at msg/SimpleMessenger.h:546
#20 0x00007fa189b58d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007fa186f7404d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000000000 in ?? ()
Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads'.
Program terminated with signal 11, Segmentation fault.
#0  Lock (this=0x7f54c40008d0) at common/Mutex.h:115
115     common/Mutex.h: No such file or directory.
        in common/Mutex.h
(gdb) bt
#0  Lock (this=0x7f54c40008d0) at common/Mutex.h:115
#1  get_return_value (this=0x7f54c40008d0) at librbd.cc:173
#2  librbd::RBD::AioCompletion::get_return_value (this=0x7f54c40008d0) at librbd.cc:1477
#3  0x00000000004629fd in rbd_finish_aiocb (c=0x7f54c40008d0, rcb=0x7f54c4002c10) at block/rbd.c:546
#4  0x00007f5555dcdeb6 in librbd::AioCompletion::complete_block (this=0x7f54c40008f0, block_completion=<value optimized out>, r=<value optimized out>) at librbd.cc:1262
#5  0x00007f5555dce92d in librbd::rados_aio_sparse_read_cb (c=0x7f54c4001a10, arg=0x7f54c4002d00) at librbd.cc:1335
#6  0x00007f5555a4eeee in librados::RadosClient::C_aio_sparse_read_Ack::finish (this=0x7f54c4002c50, r=<value optimized out>) at librados.cc:479
#7  0x00007f5555a5a863 in Objecter::handle_osd_op_reply (this=0x2eaffd0, m=0x37732c0) at osdc/Objecter.cc:796
#8  0x00007f5555a42e8c in librados::RadosClient::_dispatch (this=0x2eacf20, m=<value optimized out>) at librados.cc:786
#9  0x00007f5555a42f53 in librados::RadosClient::ms_dispatch (this=0x2eacf20, m=0x37732c0) at librados.cc:752
#10 0x00007f5555b025bd in ms_deliver_dispatch (this=0x2eb3280) at msg/Messenger.h:98
#11 SimpleMessenger::dispatch_entry (this=0x2eb3280) at msg/SimpleMessenger.cc:353
#12 0x00007f5555a4352c in SimpleMessenger::DispatchThread::entry (this=0x2eb3708) at msg/SimpleMessenger.h:546
#13 0x00007f55561e0d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007f55535fc04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x0000000000000000 in ?? ()

that one is quite similar to the first one, but crashed at common/Mutex.h:116 instead of common/Mutex.h:118

Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 4,sockets=4,cores=1,threads'.
Program terminated with signal 6, Aborted.
#0  0x00007f3bc6eabd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f3bc6eabd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f3bc6eafab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f3bc6ea47c5 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f3bc9b45338 in __pthread_mutex_lock_full () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f3bc97284d9 in Lock (this=<value optimized out>) at common/Mutex.h:116
#5  get_return_value (this=<value optimized out>) at librbd.cc:173
#6  librbd::RBD::AioCompletion::get_return_value (this=<value optimized out>) at librbd.cc:1477
#7  0x00000000004629fd in rbd_finish_aiocb (c=0x1a35350, rcb=0x1a12600) at block/rbd.c:546
#8  0x00007f3bc972feb6 in librbd::AioCompletion::complete_block (this=0x2302690, block_completion=<value optimized out>, r=<value optimized out>) at librbd.cc:1262
#9  0x00007f3bc973092d in librbd::rados_aio_sparse_read_cb (c=0x23248f0, arg=0x2311db0) at librbd.cc:1335
#10 0x00007f3bc93b0eee in librados::RadosClient::C_aio_sparse_read_Ack::finish (this=0x242fb10, r=<value optimized out>) at librados.cc:479
#11 0x00007f3bc93bc863 in Objecter::handle_osd_op_reply (this=0x1a15fd0, m=0x2307030) at osdc/Objecter.cc:796
#12 0x00007f3bc93a4e8c in librados::RadosClient::_dispatch (this=0x1a12f20, m=<value optimized out>) at librados.cc:786
#13 0x00007f3bc93a4f53 in librados::RadosClient::ms_dispatch (this=0x1a12f20, m=0x2307030) at librados.cc:752
#14 0x00007f3bc94645bd in ms_deliver_dispatch (this=0x1a19280) at msg/Messenger.h:98
#15 SimpleMessenger::dispatch_entry (this=0x1a19280) at msg/SimpleMessenger.cc:353
#16 0x00007f3bc93a552c in SimpleMessenger::DispatchThread::entry (this=0x1a19708) at msg/SimpleMessenger.h:546
#17 0x00007f3bc9b42d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f3bc6f5e04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()

I've got the core files if you need more information.

#11 Updated by ar Fred almost 13 years ago

a crash I could reproduce with 2510a0a31e3cc742f7d7564887fd93da5203967f

Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 4,sockets=4,cores=1,threads'.
Program terminated with signal 6, Aborted.
#0  0x00007f4653ebbd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f4653ebbd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f4653ebfab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f4653ef4d7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f4653efebb6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f4653f02931 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f4653f02b6e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f4653f04eab in memalign () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f4653f0622e in posix_memalign () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00007f465649ddaa in raw_posix_aligned (len=20480) at common/buffer.cc:132
#9  ceph::buffer::create_page_aligned (len=20480) at common/buffer.cc:238
#10 0x00007f4656475c3f in alloc_aligned_buffer (this=0x1cecbf0, pm=0x7f464bb10d68) at msg/SimpleMessenger.cc:1794
#11 SimpleMessenger::Pipe::read_message (this=0x1cecbf0, pm=0x7f464bb10d68) at msg/SimpleMessenger.cc:1924
#12 0x00007f465647c4a1 in SimpleMessenger::Pipe::reader (this=0x1cecbf0) at msg/SimpleMessenger.cc:1575
#13 0x00007f46563b557d in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:205
#14 0x00007f4656b52d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f4653f6e04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()

attaching the debug log

#12 Updated by ar Fred almost 13 years ago

also at 2510a0a31e3cc742f7d7564887fd93da5203967f

Core was generated by `/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 4,sockets=4,cores=1,threads'.
Program terminated with signal 6, Aborted.
#0  0x00007f5f3e006d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f5f3e006d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f5f3e00aab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f5f3dd886dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f5f3dd86926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f5f3dd86953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f5f3dd86a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f5f405a51d9 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>)
    at common/assert.cc:86
#7  0x00007f5f408835bc in Lock (this=<value optimized out>) at common/Mutex.h:118
#8  get_return_value (this=<value optimized out>) at librbd.cc:175
#9  librbd::RBD::AioCompletion::get_return_value (this=<value optimized out>) at librbd.cc:1482
#10 0x00000000004629fd in rbd_finish_aiocb (c=0x22121e0, rcb=0x21b6410) at block/rbd.c:546
#11 0x00007f5f4088ab5c in librbd::AioCompletion::complete_block (this=0x21b6c10, block_completion=<value optimized out>, r=<value optimized out>) at librbd.cc:1267
#12 0x00007f5f4088b40d in librbd::rados_aio_sparse_read_cb (c=0x21b6d20, arg=0x21b7150) at librbd.cc:1340
#13 0x00007f5f4050beee in librados::RadosClient::C_aio_sparse_read_Ack::finish (this=0x236a620, r=<value optimized out>) at librados.cc:479
#14 0x00007f5f40517863 in Objecter::handle_osd_op_reply (this=0x2195100, m=0x7f5f308ccee0) at osdc/Objecter.cc:796
#15 0x00007f5f404ffe8c in librados::RadosClient::_dispatch (this=0x2191f20, m=<value optimized out>) at librados.cc:786
#16 0x00007f5f404fff53 in librados::RadosClient::ms_dispatch (this=0x2191f20, m=0x7f5f308ccee0) at librados.cc:752
#17 0x00007f5f405bf5bd in ms_deliver_dispatch (this=0x2198280) at msg/Messenger.h:98
#18 SimpleMessenger::dispatch_entry (this=0x2198280) at msg/SimpleMessenger.cc:353
#19 0x00007f5f4050052c in SimpleMessenger::DispatchThread::entry (this=0x2198708) at msg/SimpleMessenger.h:546
#20 0x00007f5f40c9dd8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007f5f3e0b904d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000000000 in ?? ()

#13 Updated by Josh Durgin almost 13 years ago

I updated the wip-librbd-race branch again - I managed to confuse myself into thinking the block completion patch was unnecessary yesterday, but now I realize that it was still a problem (the callback could be called multiple times).

#14 Updated by ar Fred almost 13 years ago

Can't get kvm to crash anymore during my latest tests. Two days before, with the same amount of testing, I would have got approximately 10-20 kvm crashes...

I think we can close this one, thanks a lot!

#15 Updated by Josh Durgin almost 13 years ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF