Bug #5919

closed

qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD: heavy I/O leads to hung_task_timeout_secs message and unresponsive qemu process

Added by Oliver Francke over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We have received a number of tickets in which users reported problems in their VMs running the latest debian-7.[01] with kernel 3.2.x, or Ubuntu 12 LTS with kernel 3.2.0-51-amd.
The problem is currently observed with qemu 1.4.0 and onwards, including the latest qemu-1.6.0-rc2.
No problem with an upgraded kernel, for example 3.8.
No problem with qemu-1.2.2.
No problem with qcow2.
The problem occurs with rbd_cache=false/true, aio=native/none, cache=writeback/none.

A bold guess: something broke with the RBD-cache-aio/async patch, triggered by the broken client kernel 3.2 handling of virtio?

Reproducible with high load in the VM. Effect: the 120-second hung_task_timeout message is seen on the console after running a loop such as:
"while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done"
while executing in parallel:
"spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat"
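For anyone who wants to repeat the test with timestamps logged automatically, here is a minimal sketch of a harness, not the script actually used for this report. It runs the install/remove loop and starts spew in parallel after a delay. The function names and the `spew_delay` parameter are hypothetical, and the sketch assumes the remove step is meant to uninstall the same two packages.

```python
import shlex
import subprocess
import time

# Commands taken from the report; spew must be installed in the guest.
APT_INSTALL = "apt-get install -y ntp libopts25"
APT_REMOVE = "apt-get remove -y ntp libopts25"
SPEW = "spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat"


def argv(command):
    """Split a shell-style command string into an argument list."""
    return shlex.split(command)


def stamp(message):
    """Print a message prefixed with the wall-clock time (HH:MM:SS)."""
    print(time.strftime("%H:%M:%S"), "=>", message, flush=True)


def run_repro(spew_delay=30.0):
    """Run the apt loop forever and start spew once after spew_delay seconds.

    spew_delay is a hypothetical parameter. The delay is only checked
    between loop iterations, so the actual start time is approximate.
    """
    stamp("start loop")
    started = time.monotonic()
    spew = None
    while True:
        if spew is None and time.monotonic() - started >= spew_delay:
            stamp("start spew")
            # Popen returns immediately, so spew runs in parallel.
            spew = subprocess.Popen(argv(SPEW))
        for command in (APT_INSTALL, APT_REMOVE):
            # This call blocks, so a hung dpkg/apt shows up as missing output.
            subprocess.run(argv(command), check=False)
            stamp("finished: " + command)
```

Calling `run_repro()` as root inside an affected VM should give a log comparable to the manual observations; once the loop hangs, the "finished" lines simply stop appearing.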

The session with the loop gets stuck. The spew-test is still executable, though?!
Attached is a log file with some debug output enabled.

Some timestamps from observations:

14:18:29 => start loop
14:19:00 => start spew
~ 14:19:50 => loop stuck/no output
14:20:24 => spew stopped
14:23:05 => "120 sec" message on console
14:23:31 => tried to kill dpkg/apt
14:25:00 => "halt -p" -> qemu-session is stuck, had to kill process with SIGKILL
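To make the intervals in this timeline easier to read, the following small sketch computes the elapsed seconds between any two observations. The event labels and the helper name `seconds_between` are made up for illustration, and the "loop stuck" time is only approximate in the observations above.

```python
from datetime import datetime

# Observation times copied from the list above (same day).
# "loop stuck" is approximate ("~ 14:19:50").
EVENTS = [
    ("start loop", "14:18:29"),
    ("start spew", "14:19:00"),
    ("loop stuck", "14:19:50"),
    ("spew stopped", "14:20:24"),
    ("120 sec message", "14:23:05"),
    ("tried to kill dpkg/apt", "14:23:31"),
    ("halt -p", "14:25:00"),
]


def seconds_between(first, second):
    """Return the number of seconds from event `first` to event `second`."""
    times = dict(EVENTS)
    fmt = "%H:%M:%S"
    start = datetime.strptime(times[first], fmt)
    end = datetime.strptime(times[second], fmt)
    return int((end - start).total_seconds())


print(seconds_between("start loop", "loop stuck"))       # 81
print(seconds_between("loop stuck", "120 sec message"))  # 195
```

So the loop stalled roughly 81 seconds after it was started, and the console message followed about 195 seconds after the stall was noticed, which is consistent with a task being blocked for longer than the 120-second hung task threshold.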

Reproducible in the lab with ceph-0.56.6-26... (latest bobtail).

Hopefully I have not forgotten something.

Best regards,

Oliver.


Files

760_root.log.xz (4.79 MB) Oliver Francke, 08/09/2013 02:07 AM
760_wo_rbd_cache.log.xz (3.02 MB) Oliver Francke, 08/13/2013 02:01 AM