Bug #5919: qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel.hung_task_timeout_secs message and unresponsive qemu-process - rbd - Ceph

closed

Added by Oliver Francke over 10 years ago. Updated over 10 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Sage Weil

Source:

Community (user)

Severity:

3 - minor

Description

Hi,

We have had a number of tickets raised in which users reported problems with the latest debian-7.[01] and kernel 3.2.x, or Ubuntu 12 LTS and kernel 3.2.0-51-amd, in their VMs.
The problem is currently observed with qemu 1.4.0 and onwards, including the latest qemu-1.6.0-rc2.
No problem with an upgraded kernel, 3.8 for example.
No problem with qemu-1.2.2.
No problem with qcow2.
The problem is there with rbd_cache=false/true, aio=native/none, cache=writeback/none.

A bold guess: did something break with the RBD-cache-aio/async patch, triggered by the client kernel 3.2's broken virtio handling?

Reproducible with high load in the VM. Effect: the 120-second hung_task_timeout message is seen on the console after a loop like:
"while true; do apt-get install -y ntp libopts25; apt-get remove -y ntp libopts25; done"
plus, executed in parallel:
"spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat"
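The two reproduction commands could be wrapped in shell functions so the load is easier to restart between tests. This is only a sketch: the function names and the `RUN` and `ROUNDS` variables are hypothetical helpers, not part of the report. The commands themselves are the ones quoted above, with `apt-get remove -y ntp libopts25` issued with a single `remove`, on the assumption that this is what was intended.

```shell
#!/bin/sh
# Sketch of the reproduction load; helper names are illustrative only.
# RUN   (assumed knob): set RUN=echo to print the commands instead of running them.
# ROUNDS (assumed knob): number of install/remove rounds; unset means loop forever,
#                        which matches the "while true" loop in the report.

install_remove_loop() {
    i=0
    while [ -z "${ROUNDS:-}" ] || [ "$i" -lt "$ROUNDS" ]; do
        # Each round makes dpkg unpack, sync and delete files inside the guest.
        $RUN apt-get install -y ntp libopts25
        $RUN apt-get remove -y ntp libopts25
        i=$((i + 1))
    done
}

random_io() {
    # Same spew invocation as in the report: random 4k I/O on a 1G file.
    $RUN spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
}
```

To follow the report, run `install_remove_loop` in one session inside the VM and `random_io` in a second session about half a minute later. Running with `RUN=echo ROUNDS=2` first shows the exact commands without touching the system.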

The session with the loop gets stuck. The spew-test is still executable, though?!
Attached is a logfile with some debug-stuff enabled.

Some timestamps from observations:

14:18:29 => start loop
14:19:00 => start spew
~ 14:19:50 => loop stuck/no output
14:20:24 => spew stopped
14:23:05 => "120 sec" message on console
14:23:31 => tried to kill dpkg/apt
14:25:00 => "halt -p" -> qemu-session is stuck, had to kill process with SIGKILL
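
The intervals between these observations are easy to misread, so here is a small helper for working them out. It is a sketch with hypothetical function names (`to_secs`, `elapsed`), and it assumes all timestamps fall on the same day. Note that the 14:19:50 entry is only approximate in the report.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
# awk is used so that fields such as "08" are read as decimal, not octal.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the number of seconds from the first timestamp to the second.
elapsed() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}
```

With the values above, `elapsed 14:18:29 14:19:50` gives 81 (loop start to loop stuck), `elapsed 14:19:00 14:19:50` gives 50 (spew start to loop stuck), and `elapsed 14:19:50 14:23:05` gives 195 (loop stuck to the "120 sec" message), so the message appeared roughly three minutes after the loop stopped producing output.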

Reproducible in the lab with ceph-0.56.6-26... latest bobtail.

Hopefully I have not forgotten anything.

Best regards,

Oliver.

Files

760_root.log.xz (4.79 MB), Oliver Francke, 08/09/2013 02:07 AM
760_wo_rbd_cache.log.xz (3.02 MB), Oliver Francke, 08/13/2013 02:01 AM

Updated by Sage Weil over 10 years ago

Updated by Josh Durgin over 10 years ago

Updated by Oliver Francke over 10 years ago

Updated by Josh Durgin over 10 years ago

Updated by Josh Durgin over 10 years ago

Updated by Oliver Francke over 10 years ago

Updated by Sage Weil over 10 years ago

Updated by Oliver Francke over 10 years ago

Updated by Sage Weil over 10 years ago

Updated by Sage Weil over 10 years ago