Bug #5919

qemu-1.4.0 and onwards, Linux kernel 3.2.x, Ceph RBD: heavy I/O leads to hung_task_timeout_secs messages and an unresponsive qemu process

Added by Oliver Francke over 10 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

we have had a number of tickets raised where users reported problems with the latest Debian 7.0/7.1 with kernel 3.2.x, or Ubuntu 12.04 LTS with 3.2.0-51-amd, in their VMs.
The problem is currently observed with qemu 1.4.0 and onwards, including the latest qemu-1.6.0-rc2.
No problem with an upgraded kernel, e.g. 3.8.
No problem with qemu-1.2.2.
No problem with qcow2.
The problem occurs with rbd_cache=false/true, aio=native/none, and cache=writeback/none.

A bold assumption: something broke with the RBD cache aio/async patch, triggered by the guest kernel 3.2's broken virtio handling?

Reproducible with high load in the VM; effect: the "120 seconds" hung_task_timeout message appears on the console after a loop like:
"while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done"
executed in parallel with:
"spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat"

The session running the loop gets stuck. The spew test is still executable, though?!
Attached is a logfile with some debug options enabled.

Some timestamps from observations:

14:18:29 => start loop
14:19:00 => start spew
~ 14:19:50 => loop stuck/no output
14:20:24 => spew stopped
14:23:05 => "120 sec" message on console
14:23:31 => tried to kill dpkg/apt
14:25:00 => "halt -p" -> the qemu session is stuck; had to kill the process with SIGKILL

Reproducible in the lab with ceph-0.56.6-26... the latest bobtail.

Hopefully I have not forgotten anything.

Best regards,

Oliver.

760_root.log.xz (4.79 MB) Oliver Francke, 08/09/2013 02:07 AM

760_wo_rbd_cache.log.xz (3.02 MB) Oliver Francke, 08/13/2013 02:01 AM

Associated revisions

Revision 7a52e2ff (diff)
Added by Josh Durgin over 10 years ago

librados: fix locking for AioCompletionImpl refcounting

Add an already-locked helper so that C_Aio{Safe,Complete} can
increment the reference count when their caller holds the
lock. C_AioCompleteAndSafe's caller is not holding the lock, so call
regular get() to ensure no racing updates can occur.

This eliminates all direct manipulations of AioCompletionImpl->ref,
and makes the necessary locking clear.

The only place C_AioCompleteAndSafe is used is in handling
aio_flush_async(). This could cause a missing completion.

Refs: #5919
Signed-off-by: Josh Durgin <>
Reviewed-by: Sage Weil <>
Tested-by: Oliver Francke <>
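
To illustrate the pattern this commit describes, here is a minimal, self-contained sketch of the locking around the reference count; the names and structure are simplified stand-ins, not the actual librados implementation:

    #include <mutex>

    // Illustrative sketch only: a refcounted completion object whose count
    // must be bumped both by callers that already hold its lock and by
    // callers that do not (as with C_AioCompleteAndSafe above).
    struct CompletionSketch {
      std::mutex lock;
      int ref = 1;

      // Already-locked helper: caller is known to hold 'lock'.
      void _get() {
        ref++;
      }

      // Regular get(): caller does not hold 'lock', so take it to avoid
      // racing updates to 'ref'.
      void get() {
        std::lock_guard<std::mutex> guard(lock);
        _get();
      }

      void put() {
        lock.lock();
        int remaining = --ref;
        lock.unlock();
        if (remaining == 0)
          delete this;
      }
    };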

Revision 16ed0b9a (diff)
Added by Sage Weil over 10 years ago

librados: fix async aio completion wakeup

For aio flush, we register a wait on the most recent write. The write
completion code, however, was only waking the waiter if they were waiting
on that write, without regard to previous writes (completed or not).
For example, we might have 6 and 7 outstanding and wait on 7. If they
finish in order all is well, but if 7 finishes first we do the flush
completion early. Similarly, if we

- start 6
- start 7
- finish 7
- flush; wait on 7
- finish 6

we can hang forever.

Fix by doing any completions that are prior to the oldest pending write in
the aio write completion handler.

Refs: #5919

Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>
Tested-by: Oliver Francke <>
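
As a rough illustration of the wakeup rule this commit describes, the sketch below fires flush callbacks only once every write prior to the oldest still-pending write has completed; the types and names are hypothetical, not the actual librados code:

    #include <cstdint>
    #include <functional>
    #include <limits>
    #include <map>
    #include <set>

    // Illustrative sketch only: track in-flight write tids and flush waiters.
    struct FlushWakeupSketch {
      std::set<uint64_t> pending_writes;                        // tids still in flight
      std::multimap<uint64_t, std::function<void()>> waiters;   // flush target tid -> callback

      void start_write(uint64_t tid) {
        pending_writes.insert(tid);
      }

      // aio_flush_async(): register a wait on the most recent write tid.
      void flush_async(uint64_t last_write_tid, std::function<void()> cb) {
        if (pending_writes.empty())
          cb();                               // nothing outstanding, flush is trivially done
        else
          waiters.emplace(last_write_tid, std::move(cb));
      }

      // Write completion handler: complete every flush whose target is prior
      // to the oldest pending write, not just flushes waiting on this tid.
      void finish_write(uint64_t tid) {
        pending_writes.erase(tid);
        const uint64_t oldest_pending = pending_writes.empty()
            ? std::numeric_limits<uint64_t>::max()
            : *pending_writes.begin();
        auto it = waiters.begin();
        while (it != waiters.end() && it->first < oldest_pending) {
          it->second();                       // all writes up to this tid are done
          it = waiters.erase(it);
        }
      }
    };

With this rule, the start 6 / start 7 / finish 7 / flush; wait on 7 / finish 6 sequence above completes when write 6 finishes instead of hanging.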

Revision 29ae033b (diff)
Added by Josh Durgin over 10 years ago

librados: fix locking for AioCompletionImpl refcounting

Add an already-locked helper so that C_Aio{Safe,Complete} can
increment the reference count when their caller holds the
lock. C_AioCompleteAndSafe's caller is not holding the lock, so call
regular get() to ensure no racing updates can occur.

This eliminates all direct manipulations of AioCompletionImpl->ref,
and makes the necessary locking clear.

The only place C_AioCompleteAndSafe is used is in handling
aio_flush_async(). This could cause a missing completion.

Refs: #5919
Signed-off-by: Josh Durgin <>
Reviewed-by: Sage Weil <>
Tested-by: Oliver Francke <>
(cherry picked from commit 7a52e2ff5025754f3040eff3fc52d4893cafc389)

Revision b9d1bf51 (diff)
Added by Sage Weil over 10 years ago

librados: fix async aio completion wakeup

For aio flush, we register a wait on the most recent write. The write
completion code, however, was only waking the waiter if they were waiting
on that write, without regard to previous writes (completed or not).
For example, we might have 6 and 7 outstanding and wait on 7. If they
finish in order all is well, but if 7 finishes first we do the flush
completion early. Similarly, if we

- start 6
- start 7
- finish 7
- flush; wait on 7
- finish 6

we can hang forever.

Fix by doing any completions that are prior to the oldest pending write in
the aio write completion handler.

Refs: #5919

Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>
Tested-by: Oliver Francke <>
(cherry picked from commit 16ed0b9af8bc08c7dabead1c1a7c1a22b1fb02fb)

Revision ef731dfc (diff)
Added by Josh Durgin over 10 years ago

librados: fix locking for AioCompletionImpl refcounting

Add an already-locked helper so that C_Aio{Safe,Complete} can
increment the reference count when their caller holds the
lock. C_AioCompleteAndSafe's caller is not holding the lock, so call
regular get() to ensure no racing updates can occur.

This eliminates all direct manipulations of AioCompletionImpl->ref,
and makes the necessary locking clear.

The only place C_AioCompleteAndSafe is used is in handling
aio_flush_async(). This could cause a missing completion.

Refs: #5919
Signed-off-by: Josh Durgin <>
Reviewed-by: Sage Weil <>
Tested-by: Oliver Francke <>
(cherry picked from commit 7a52e2ff5025754f3040eff3fc52d4893cafc389)

Revision 50698d18 (diff)
Added by Sage Weil over 10 years ago

librados: fix async aio completion wakeup

For aio flush, we register a wait on the most recent write. The write
completion code, however, was only waking the waiter if they were waiting
on that write, without regard to previous writes (completed or not).
For example, we might have 6 and 7 outstanding and wait on 7. If they
finish in order all is well, but if 7 finishes first we do the flush
completion early. Similarly, if we

- start 6
- start 7
- finish 7
- flush; wait on 7
- finish 6

we can hang forever.

Fix by doing any completions that are prior to the oldest pending write in
the aio write completion handler.

Refs: #5919

Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>
Tested-by: Oliver Francke <>
(cherry picked from commit 16ed0b9af8bc08c7dabead1c1a7c1a22b1fb02fb)

Revision 6cf05375 (diff)
Added by Josh Durgin over 10 years ago

librados: fix locking for AioCompletionImpl refcounting

Add an already-locked helper so that C_Aio{Safe,Complete} can
increment the reference count when their caller holds the
lock. C_AioCompleteAndSafe's caller is not holding the lock, so call
regular get() to ensure no racing updates can occur.

This eliminates all direct manipulations of AioCompletionImpl->ref,
and makes the necessary locking clear.

The only place C_AioCompleteAndSafe is used is in handling
aio_flush_async(). This could cause a missing completion.

Refs: #5919
Signed-off-by: Josh Durgin <>
Reviewed-by: Sage Weil <>
Tested-by: Oliver Francke <>
(cherry picked from commit 7a52e2ff5025754f3040eff3fc52d4893cafc389)

Revision 1670a73b (diff)
Added by Sage Weil over 10 years ago

librados: fix async aio completion wakeup

For aio flush, we register a wait on the most recent write. The write
completion code, however, was only waking the waiter if they were waiting
on that write, without regard to previous writes (completed or not).
For example, we might have 6 and 7 outstanding and wait on 7. If they
finish in order all is well, but if 7 finishes first we do the flush
completion early. Similarly, if we

- start 6
- start 7
- finish 7
- flush; wait on 7
- finish 6

we can hang forever.

Fix by doing any completions that are prior to the oldest pending write in
the aio write completion handler.

Refs: #5919

Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>
Tested-by: Oliver Francke <>
(cherry picked from commit 16ed0b9af8bc08c7dabead1c1a7c1a22b1fb02fb)

History

#1 Updated by Sage Weil over 10 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Josh Durgin over 10 years ago

The log shows one aio_flush (that started at 2013-08-08 14:19:36.425555) never completing. This log is without caching, so it's probably a bug in librados' async flush handling. When caching is used, the code path is quite different, but there could be a similar bug in the caching module.

Could you get logs (debug rbd=20, debug objectcacher=30, debug ms=1) of this happening with caching enabled so we can see if it's the same kind of bug there?
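
For reference, one way to capture such logs is a [client] section in ceph.conf along these lines (the log path is only a placeholder); the same options can also be appended to the qemu rbd drive string, as shown in a later comment:

    [client]
        rbd cache = true
        debug rbd = 20
        debug objectcacher = 30
        debug ms = 1
        log file = /var/log/ceph/client.$pid.log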

#3 Updated by Oliver Francke over 10 years ago

Hi Josh,

even under high Ceph load I was not able to break things with rbd_cache=true after 2 hours, whereas it fails after a couple of minutes without it. So there is a strong indication... ;)

Anything else I could/should try?

Regards,

Oliver.

#4 Updated by Josh Durgin over 10 years ago

Thanks Oliver, that's good to hear. If you use cache=writethrough instead of no cache, it will have the same write properties (waiting for writes to reach the cluster) but should avoid this bug.

One more log would help narrow it down - with cache off, logs of it happening with debug ms = 1, debug rados = 20, debug rbd = 20. Thanks!

#5 Updated by Josh Durgin over 10 years ago

I found one bug - there's a fix on top of the current bobtail branch in wip-librados-aio-flush. Could you see if that fixes the hangs?

#6 Updated by Oliver Francke over 10 years ago

Hi Josh,

I updated/upgraded our cluster to ceph version 0.56.6-28-gf07447f (f07447f28c6e468c1dc76149f51d7958b10482c2), but no luck. I have attached the requested logfile with debug_rados enabled. Let me know if you need anything more...

The complete device/drive section looks like the following:

-device virtio-blk-pci,drive=virtio0 -drive format=raw,file=rbd:123/vm-760-disk-1.rbd:rbd_cache=false:rbd_cache_size=33554432:rbd_cache_max_dirty=16777216:rbd_cache_target_dirty=8388608:log_file=/home/fcms/760_wo_rbd_cache.log:admin_socket=/home/fcms/760_root.asok:debug_ms=1:debug_rbd=20:debug_rados=20:debug_objectcacher=0,cache=writeback,if=none,id=virtio0,media=disk,index=0

Thanks and regards,

Oliver.

#7 Updated by Sage Weil over 10 years ago

  • Status changed from New to 7
  • Assignee changed from Josh Durgin to Sage Weil
  • Source changed from other to Community (user)

Hi Oliver,

Okay, I think we spotted the (other) bug. I pushed another patch to the same branch; can you try again?

Thanks!

#8 Updated by Oliver Francke over 10 years ago

Hi Josh, Sage,

It looks very good right now; I could not get it to break, even with 4 more VMs producing I/O, deep scrubs, some OSD killing and remapping/backfilling (we are talking about a small 3-node lab cluster ;) ).

Well, does that mean we should be able to deploy this load into the production environment without any hassle?

Thanks very much for that fix...

Oliver.

#9 Updated by Sage Weil over 10 years ago

  • Status changed from 7 to Pending Backport

Oliver Francke wrote:

Hi Josh, Sage,

It looks very good right now; I could not get it to break, even with 4 more VMs producing I/O, deep scrubs, some OSD killing and remapping/backfilling (we are talking about a small 3-node lab cluster ;) ).

Well, does that mean we should be able to deploy this load into the production environment without any hassle?

Thanks very much for that fix...

Oliver.

Excellent! Yeah go ahead and use it. I'll get the fix in mainline and backported to bobtail proper.

#10 Updated by Sage Weil over 10 years ago

  • Status changed from Pending Backport to Resolved
