Bug #5955 (closed)

qemu deadlock when librbd caching enabled (writethrough or writeback)

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%
Source: Community (user)
Severity: 3 - minor

Description

From Mike Dawson on ML:

Logs are uploaded to cephdrop with the file name mikedawson-rbd-qemu-deadlock.

- At about 2013-08-05 19:46 or 47, we hit the issue; traffic went to 0
- At about 2013-08-05 19:53:51, ran a 'virsh screenshot'

Environment is:

- Ceph 0.61.7 (client is co-mingled with three OSDs)
- rbd cache = true and cache=writeback (see the sketch below)
- qemu 1.4.0 (1.4.0+dfsg-1expubuntu4)
- Ubuntu Raring with 3.8.0-25-generic

This issue is reproducible in my environment, and I'm willing to run any wip
branch you need. What else can I provide to help?
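
For reference, the two cache settings above live in different places: "rbd cache = true" enables librbd's writeback cache via ceph.conf, while cache=writeback is the per-drive qemu option that allows that cache to operate in writeback mode. A minimal sketch, with a hypothetical pool/image name:

[client]
    rbd cache = true

qemu-system-x86_64 ... \
    -drive file=rbd:rbd/vm-disk,format=raw,cache=writeback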


We've had a similar situation occur. For about three months, we've run several
Windows 2008 R2 guests with virtio drivers that record video surveillance. We
have long suffered an issue where the guest appears to hang indefinitely (or
until we intervene). For the sake of this conversation, we call this state
"wedged", because it appears something (rbd, qemu, virtio, etc) gets stuck on a
deadlock. When a guest gets wedged, we see the following:

- the guest will not respond to pings
- the qemu-system-x86_64 process drops to 0% cpu
- graphite graphs show the interface traffic dropping to 0bps
- the guest will stay wedged forever (or until we intervene)
- strace of qemu-system-x86_64 shows QEMU is making progress [1][2]

We can "un-wedge" the guest by opening a NoVNC session or running a 'virsh
screenshot' command. After that, the guest resumes and runs as expected. At that
point we can examine the guest. Each time we'll see:

- No Windows error logs whatsoever while the guest is wedged
- A time sync typically occurs right after the guest gets un-wedged
- Scheduled tasks do not run while wedged
- Windows error logs do not show any evidence of suspend, sleep, etc

We had so many issues with guests becoming wedged that we wrote a script to 'virsh
screenshot' them via cron. Then we installed some updates and had a month or so
of higher stability (wedging happened maybe 1/10th as often). Until today we
couldn't figure out why.
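
The workaround script itself is not shown here; an illustrative (hypothetical, untested) crontab entry in the same spirit would be:

*/5 * * * * for dom in $(virsh list --name); do virsh screenshot "$dom" /tmp/"$dom".ppm >/dev/null 2>&1; done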

Yesterday, I realized qemu was starting the instances without specifying
cache=writeback. We corrected that, and let them run overnight. With RBD
writeback re-enabled, wedging came back as often as we had seen in the past.
I've counted ~40 occurrences in the past 12-hour period. So I feel like
writeback caching in RBD certainly makes the deadlock more likely to occur.

Joshd asked us to gather RBD client logs:

"joshd> it could very well be the writeback cache not doing a callback at some
point - if you could gather logs of a vm getting stuck with debug rbd = 20,
debug ms = 1, and debug objectcacher = 30 that would be great"
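
Those settings would typically go in the [client] section of ceph.conf on the qemu host; a minimal sketch (the log file path is an assumption):

[client]
    debug rbd = 20
    debug ms = 1
    debug objectcacher = 30
    log file = /var/log/ceph/client.$pid.log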

We'll do that over the weekend. If you could as well, we'd love the help!

[1] http://www.gammacode.com/kvm/wedged-with-timestamps.txt
[2] http://www.gammacode.com/kvm/not-wedged.txt

Actions #1

Updated by Sage Weil over 10 years ago

Normally, completions follow this pattern:

2013-08-05 19:46:27.313676 7f6877699940 20 librbd: aio_write 0x7f68798c2e80 off = 968329949184 len = 57344 buf = 0x7f6545392bb8
2013-08-05 19:46:27.313684 7f6877699940 20 librbd: ictx_check 0x7f68798c2e80
2013-08-05 19:46:27.313686 7f6877699940 20 librbd:   parent overlap 0
2013-08-05 19:46:27.313695 7f6877699940 20 librbd:  oid rbd_data.1bfef52ae8944a.00000000000385d3 3567616~57344 from [0,57344]
2013-08-05 19:46:27.313763 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) map_write oex rbd_data.1bfef52ae8944a.00000000000385d3 3567616~57344
2013-08-05 19:46:27.313770 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) cur is 3567616, p is bh[ 0x7f6879936fe0 3559424~8704 0x7f687af0cd60 (8704) v 4082 clean firstbyte=99] waiters = {}
2013-08-05 19:46:27.313811 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) map_write bh bh[ 0x7f6879936fe0 3559424~8704 0x7f687af0cd60 (8704) v 4082 clean firstbyte=99] waiters = {} intersected
2013-08-05 19:46:27.313816 7f6877699940 20 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) split bh[ 0x7f6879936fe0 3559424~8704 0x7f687af0cd60 (8704) v 4082 clean firstbyte=99] waiters = {} at 3567616
2013-08-05 19:46:27.313821 7f6877699940 30 objectcacher bh_add object[rbd_data.1bfef52ae8944a.00000000000385d3/head oset 0x7f68798b2cf0 wr 4089/4089] bh[ 0x7f6879a3ad30 3567616~512 0x7f687af0cd60 (0) v 4082 clean] waiters = {}
2013-08-05 19:46:27.313830 7f6877699940 20 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) split    left is bh[ 0x7f6879936fe0 3559424~8192 0x7f687af0cd60 (8192) v 4082 clean firstbyte=99] waiters = {}
2013-08-05 19:46:27.313834 7f6877699940 20 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) split   right is bh[ 0x7f6879a3ad30 3567616~512 0x7f687af0cd60 (512) v 4082 clean firstbyte=1] waiters = {}
2013-08-05 19:46:27.313838 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) map_write final is bh[ 0x7f6879a3ad30 3567616~57344 0x7f687af0cd60 (512) v 4082 clean firstbyte=1] waiters = {}
2013-08-05 19:46:27.313842 7f6877699940 10 objectcacher writex writing 0~57344 into bh[ 0x7f6879a3ad30 3567616~57344 0x7f687af0cd60 (512) v 4082 clean firstbyte=1] waiters = {} at 3567616
2013-08-05 19:46:27.313848 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) try_merge_bh bh[ 0x7f6879a3ad30 3567616~57344 0x7f687af0cd60 (57344) v 4082 dirty firstbyte=1] waiters = {}
2013-08-05 19:46:27.313859 7f6877699940 10 objectcacher trim  start: bytes: max 33554432  clean 33533440, objects: max 42 current 42
2013-08-05 19:46:27.313862 7f6877699940 10 objectcacher trim finish:  max 33554432  clean 33533440, objects: max 42 current 42
2013-08-05 19:46:27.313865 7f6877699940 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x7f687c3354d0 pending 1
2013-08-05 19:46:27.313912 7f6877699940 20 librbd: aio_write 0x7f68798c2e80 off = 968330530816 len = 56832 buf = 0x7f654baf9bb8
2013-08-05 19:46:27.313917 7f6877699940 20 librbd: ictx_check 0x7f68798c2e80
2013-08-05 19:46:27.313919 7f6877699940 20 librbd:   parent overlap 0
2013-08-05 19:46:27.313931 7f6877699940 20 librbd:  oid rbd_data.1bfef52ae8944a.00000000000385d3 4149248~45056 from [0,45056]
2013-08-05 19:46:27.313937 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f687c3354d0 complete_cb=0x7f6877788400 pending 1
2013-08-05 19:46:27.313948 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f687c3354d0 rval 0 read_buf 0 read_bl 0

Note that for each aio, finish_adding_requests() runs in the calling thread, and then the finisher thread does complete_request() and finalize().
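
As a simplified sketch of that pattern (a model implied by the log messages, not the actual librbd source), the completion tracks a pending count plus a "building" flag, and whichever side reaches zero last fires the callback:

#include <mutex>
#include <functional>

struct AioCompletion {
  std::mutex lock;
  int pending_count = 0;      // sub-requests still in flight
  bool building = true;       // caller is still adding sub-requests
  std::function<void()> complete_cb;

  // Submitting thread: one call per sub-request queued.
  void add_request() {
    std::lock_guard<std::mutex> l(lock);
    ++pending_count;
  }

  // Submitting thread: all sub-requests queued (finish_adding_requests).
  void finish_adding_requests() {
    std::lock_guard<std::mutex> l(lock);
    building = false;
    if (pending_count == 0)
      finalize();             // everything already completed
  }

  // Finisher thread: one call per completed sub-request (complete_request).
  void complete_request() {
    std::lock_guard<std::mutex> l(lock);
    if (--pending_count == 0 && !building)
      finalize();             // last completion fires the callback
  }

  void finalize() {
    if (complete_cb)
      complete_cb();          // the complete_cb seen in the logs
  }
};

If this accounting is ever raced so that the callback never fires, qemu never sees the I/O complete, which would look exactly like the wedge described above.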

However, for one request just prior to the hang, I see:

2013-08-05 19:46:27.287412 7f6877699940 20 librbd: aio_write 0x7f68798a3290 off = 3190296576 len = 4096 buf = 0x7f654baf9bb8
2013-08-05 19:46:27.287425 7f6877699940 20 librbd: ictx_check 0x7f68798a3290
2013-08-05 19:46:27.287429 7f6877699940 20 librbd:   parent overlap 85899345920
2013-08-05 19:46:27.287439 7f6877699940 20 librbd:  oid rbd_data.1a4d7d2eb141f2.00000000000002f8 2625536~4096 from [0,4096]
2013-08-05 19:46:27.287450 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) map_write oex rbd_data.1a4d7d2eb141f2.00000000000002f8 2625536~4096
2013-08-05 19:46:27.287454 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) cur is 2625536, p is bh[ 0x7f6879a655a0 2617344~12288 0x7f687b48ec80 (12288) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287459 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) map_write bh bh[ 0x7f6879a655a0 2617344~12288 0x7f687b48ec80 (12288) v 3975 dirty firstbyte=82] waiters = {} intersected
2013-08-05 19:46:27.287463 7f6877699940 20 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) split bh[ 0x7f6879a655a0 2617344~12288 0x7f687b48ec80 (12288) v 3975 dirty firstbyte=82] waiters = {} at 2625536
2013-08-05 19:46:27.287468 7f6877699940 30 objectcacher bh_add object[rbd_data.1a4d7d2eb141f2.00000000000002f8/head oset 0x7f68798a2920 wr 3975/3975] bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (0) v 3975 dirty] waiters = {}
2013-08-05 19:46:27.287475 7f6877699940 20 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) split    left is bh[ 0x7f6879a655a0 2617344~8192 0x7f687b48ec80 (8192) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287478 7f6877699940 20 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) split   right is bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (4096) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287482 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) map_write final is bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (4096) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287486 7f6877699940 10 objectcacher writex writing 0~4096 into bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (4096) v 3975 dirty firstbyte=82] waiters = {} at 2625536
2013-08-05 19:46:27.287491 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) try_merge_bh bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (4096) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287495 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) merge_left bh[ 0x7f6879a655a0 2617344~8192 0x7f687b48ec80 (8192) v 3975 dirty firstbyte=82] waiters = {} + bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (4096) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287502 7f6877699940 30 objectcacher bh_remove object[rbd_data.1a4d7d2eb141f2.00000000000002f8/head oset 0x7f68798a2920 wr 3975/3975] bh[ 0x7f6879a33000 2625536~4096 0x7f687b48ec80 (4096) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287508 7f6877699940 10 objectcacher.object(rbd_data.1a4d7d2eb141f2.00000000000002f8/head) merge_left result bh[ 0x7f6879a655a0 2617344~12288 0x7f687b48ec80 (12288) v 3975 dirty firstbyte=82] waiters = {}
2013-08-05 19:46:27.287519 7f6877699940 10 objectcacher trim  start: bytes: max 33554432  clean 33363968, objects: max 42 current 244
2013-08-05 19:46:27.287522 7f6877699940 10 objectcacher trim finish:  max 33554432  clean 33363968, objects: max 42 current 244
2013-08-05 19:46:27.287525 7f6877699940 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x7f687a96aec0 pending 1
2013-08-05 19:46:27.287604 7f6864c07700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f687a96aec0 complete_cb=0x7f6877788400 pending 1
2013-08-05 19:46:27.287613 7f6864c07700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f687a96aec0 rval 0 read_buf 0 read_bl 0
...some other requests that are normal, and then....
2013-08-05 19:46:27.313912 7f6877699940 20 librbd: aio_write 0x7f68798c2e80 off = 968330530816 len = 56832 buf = 0x7f654baf9bb8
2013-08-05 19:46:27.313917 7f6877699940 20 librbd: ictx_check 0x7f68798c2e80
2013-08-05 19:46:27.313919 7f6877699940 20 librbd:   parent overlap 0
2013-08-05 19:46:27.313931 7f6877699940 20 librbd:  oid rbd_data.1bfef52ae8944a.00000000000385d3 4149248~45056 from [0,45056]
2013-08-05 19:46:27.313937 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f687c3354d0 complete_cb=0x7f6877788400 pending 1
2013-08-05 19:46:27.313948 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f687c3354d0 rval 0 read_buf 0 read_bl 0
2013-08-05 19:46:27.313962 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) map_write oex rbd_data.1bfef52ae8944a.00000000000385d3 4149248~45056
2013-08-05 19:46:27.313967 7f6877699940 30 objectcacher bh_add object[rbd_data.1bfef52ae8944a.00000000000385d3/head oset 0x7f68798b2cf0 wr 4089/4089] bh[ 0x7f6879a23340 4149248~45056 0x7f687af0cd60 (0) v 0 missing] waiters = {}
2013-08-05 19:46:27.313973 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) map_write adding trailing bh bh[ 0x7f6879a23340 4149248~45056 0x7f687af0cd60 (0) v 0 missing] waiters = {}
2013-08-05 19:46:27.313977 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) map_write final is bh[ 0x7f6879a23340 4149248~45056 0x7f687af0cd60 (0) v 0 missing] waiters = {}
2013-08-05 19:46:27.313980 7f6877699940 10 objectcacher writex writing 0~45056 into bh[ 0x7f6879a23340 4149248~45056 0x7f687af0cd60 (0) v 0 missing] waiters = {} at 4149248
2013-08-05 19:46:27.313986 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d3/head) try_merge_bh bh[ 0x7f6879a23340 4149248~45056 0x7f687af0cd60 (45056) v 0 dirty firstbyte=29] waiters = {}
2013-08-05 19:46:27.313996 7f6877699940 10 objectcacher trim  start: bytes: max 33554432  clean 33533440, objects: max 42 current 42
2013-08-05 19:46:27.313999 7f6877699940 10 objectcacher trim finish:  max 33554432  clean 33533440, objects: max 42 current 42
2013-08-05 19:46:27.314002 7f6877699940 20 librbd:  oid rbd_data.1bfef52ae8944a.00000000000385d4 0~11776 from [45056,11776]
....wtf is this....
2013-08-05 19:46:27.314009 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f687a96aec0 complete_cb=0x7f6877788400 pending 1
2013-08-05 19:46:27.314055 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d4/head) map_write oex rbd_data.1bfef52ae8944a.00000000000385d4 0~11776
2013-08-05 19:46:27.314059 7f6877699940 30 objectcacher bh_add object[rbd_data.1bfef52ae8944a.00000000000385d4/head oset 0x7f68798b2cf0 wr 0/0] bh[ 0x7f687afaa7c0 0~11776 0x7f687b409040 (0) v 0 missing] waiters = {}
2013-08-05 19:46:27.314082 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d4/head) map_write adding trailing bh bh[ 0x7f687afaa7c0 0~11776 0x7f687b409040 (0) v 0 missing] waiters = {}
2013-08-05 19:46:27.314086 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d4/head) map_write final is bh[ 0x7f687afaa7c0 0~11776 0x7f687b409040 (0) v 0 missing] waiters = {}
2013-08-05 19:46:27.314089 7f6877699940 10 objectcacher writex writing 0~11776 into bh[ 0x7f687afaa7c0 0~11776 0x7f687b409040 (0) v 0 missing] waiters = {} at 0
2013-08-05 19:46:27.314095 7f6877699940 10 objectcacher.object(rbd_data.1bfef52ae8944a.00000000000385d4/head) try_merge_bh bh[ 0x7f687afaa7c0 0~11776 0x7f687b409040 (11776) v 0 dirty firstbyte=5] waiters = {}
2013-08-05 19:46:27.314131 7f6877699940 10 objectcacher trim  start: bytes: max 33554432  clean 33533440, objects: max 42 current 43
2013-08-05 19:46:27.314135 7f6877699940 10 objectcacher trim trimming object[rbd_data.1bfef52ae8944a.0000000000038340/head oset 0x7f68798b2cf0 wr 185/185]
2013-08-05 19:46:27.314138 7f6877699940 10 objectcacher close_object object[rbd_data.1bfef52ae8944a.0000000000038340/head oset 0x7f68798b2cf0 wr 185/185]
2013-08-05 19:46:27.314142 7f6877699940 10 objectcacher trim finish:  max 33554432  clean 33533440, objects: max 42 current 42
2013-08-05 19:46:27.314145 7f6877699940 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x7f687a96aec0 pending 1
2013-08-05 19:46:27.314204 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f687a96aec0 complete_cb=0x7f6877788400 pending 1
2013-08-05 19:46:27.314214 7f6820ff9700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f687a96aec0 rval 0 read_buf 0 read_bl 0

Want to reproduce with logging enabled in the finisher.

Actions #2

Updated by Sage Weil over 10 years ago

  • Status changed from 12 to Need More Info
Actions #3

Updated by Mike Dawson over 10 years ago

This hang occurred frequently with qemu 1.4.0, but after a week of trying, I cannot reproduce this bug under qemu 1.5.2. I believe the major difference is joshd's async flush patch for qemu.

Actions #4

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High
Actions #5

Updated by Sage Weil over 10 years ago

  • Status changed from Need More Info to In Progress
Actions #6

Updated by Mike Dawson over 10 years ago

Sage,

This issue should be closed, as it seems to be resolved by upgrading qemu to include joshd's async flush patch, which landed in qemu 1.4.2. With the patch, we have not seen this issue.

- Mike
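
For context, the distinction that patch addresses can be sketched with the public librbd C API: rbd_flush() blocks the calling thread until the writeback cache drains, while rbd_aio_flush() returns immediately and completes through librbd's finisher thread. A hedged illustration, not the actual qemu patch:

#include <rbd/librbd.h>

// Synchronous flush: the caller blocks until the cache drains. Issued
// from qemu's main loop, this can wedge the guest if draining depends
// on that same thread making progress.
int sync_flush(rbd_image_t image) {
  return rbd_flush(image);
}

// Asynchronous flush: returns immediately; librbd invokes flush_done()
// from its finisher thread once the flush completes.
static void flush_done(rbd_completion_t c, void *arg) {
  rbd_aio_release(c);
}

int async_flush(rbd_image_t image) {
  rbd_completion_t c;
  int r = rbd_aio_create_completion(NULL, flush_done, &c);
  if (r < 0)
    return r;
  return rbd_aio_flush(image, c);
}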

Actions #7

Updated by Sage Weil over 10 years ago

  • Status changed from In Progress to Resolved