Bug #5812

qemu-kvm guest hangs on disk write with rbd storage

Added by Kevin Fibich over 10 years ago. Updated over 10 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A qemu-kvm VM with rbd-backed storage works until the guest tries to write data; then hung-task messages appear in the guest kernel log.

INFO: task yum:1582 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
yum D 0000000000000000 0 1582 1566 0x00000080
ffff88007ae3bcc8 0000000000000082 000000017ae3bc28 ffff88003755ae68
ffff88003755ae68 ffff88003755ae68 0000000000000cc1 0000000000000005
ffff88007ae27058 ffff88007ae3bfd8 000000000000fb88 ffff88007ae27058
Call Trace:
[<ffffffff8125e808>] ? blk_queue_bio+0x168/0x5a0
[<ffffffff8150e7d5>] schedule_timeout+0x215/0x2e0
[<ffffffff8125cdde>] ? generic_make_request+0x25e/0x530
[<ffffffff8111c575>] ? mempool_alloc_slab+0x15/0x20
[<ffffffff8150e453>] wait_for_common+0x123/0x180
[<ffffffff81063310>] ? default_wake_function+0x0/0x20
[<ffffffff8125d13d>] ? submit_bio+0x8d/0x120
[<ffffffff8150e56d>] wait_for_completion+0x1d/0x20
[<ffffffff812607bd>] __blkdev_issue_flush+0xad/0xe0
[<ffffffff81260806>] blkdev_issue_flush+0x16/0x20
[<ffffffffa008c2c1>] ext4_sync_file+0x191/0x260 [ext4]
[<ffffffff811b1b11>] vfs_fsync_range+0xa1/0xe0
[<ffffffff811b1bbd>] vfs_fsync+0x1d/0x20
[<ffffffff811b1bfe>] do_fsync+0x3e/0x60
[<ffffffff811b1c33>] sys_fdatasync+0x13/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task jbd2/dm18:376 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm18 D 0000000000000000 0 376 2 0x00000000
ffff880037617c20 0000000000000046 ffff880000000000 0000004611119ae4
ffff880037617b90 ffff880037401750 0000000000011374 ffffffffae07f155
ffff88003757e638 ffff880037617fd8 000000000000fb88 ffff88003757e638
Call Trace:
[<ffffffff810a1ac9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff811b6040>] ? sync_buffer+0x0/0x50
[<ffffffff8150e0f3>] io_schedule+0x73/0xc0
[<ffffffff811b6080>] sync_buffer+0x40/0x50
[<ffffffff8150eaaf>] __wait_on_bit+0x5f/0x90
[<ffffffff811b6040>] ? sync_buffer+0x0/0x50
[<ffffffff8150eb58>] out_of_line_wait_on_bit+0x78/0x90
[<ffffffff81096ce0>] ? wake_bit_function+0x0/0x50
[<ffffffff811b6036>] __wait_on_buffer+0x26/0x30
[<ffffffffa0065761>] jbd2_journal_commit_transaction+0x1131/0x14b0 [jbd2]
[<ffffffff810096f0>] ? __switch_to+0xd0/0x320
[<ffffffff81081a5b>] ? try_to_del_timer_sync+0x7b/0xe0
[<ffffffffa006af38>] kjournald2+0xb8/0x220 [jbd2]
[<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa006ae80>] ? kjournald2+0x0/0x220 [jbd2]
[<ffffffff81096936>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810968a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20

The KVM config as well as the full VM console log and the rbd debug log are attached.

kvm_instance_config.xml - KVM instance config (3.37 KB) Kevin Fibich, 07/31/2013 08:02 AM

qemu.log - qemu rbd debug log (7.08 MB) Kevin Fibich, 07/31/2013 08:02 AM

vm_console.log - VM kernel console log (33 KB) Kevin Fibich, 07/31/2013 08:02 AM

qemu.7z (5.75 MB) Kevin Fibich, 08/02/2013 02:10 PM

History

#1 Updated by Ian Colle over 10 years ago

  • Assignee set to Josh Durgin

#2 Updated by Josh Durgin over 10 years ago

Was your cluster healthy and showing no slow requests? It looks like the vm didn't have rbd caching enabled, so unless librbd lost a callback somehow, osds being overloaded is probably the cause of these warnings. It looks like the qemu log is from a different run, or the guest clock is wrong, since the log shows < 120 seconds of activity after boot.
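
For reference, librbd caching is normally enabled from the hypervisor-side ceph.conf, along these lines (a minimal sketch; the option names are the standard librbd ones, and enabling the cache is only a workaround if overloaded OSDs turn out to be the cause):

[client]
    rbd cache = true
    rbd cache writethrough until flush = true

With recent qemu the drive's cache= mode also influences whether the rbd cache is actually used; cache=writeback is the setting usually paired with it.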

#3 Updated by Kevin Fibich over 10 years ago

The cluster is healthy and I could not find slow requests in the OSD logs (the checks are sketched after the osd tree output below):

ceph -s
health HEALTH_OK
monmap e1: 1 mons at {0001=192.168.13.2:6789/0}, election epoch 1, quorum 0 0001
osdmap e247: 12 osds: 12 up, 12 in
pgmap v7816: 448 pgs: 448 active+clean; 23492 MB data, 97173 MB used, 11065 GB / 11159 GB avail
mdsmap e1: 0/0/1 up

ceph osd tree
# id  weight  type name        up/down  reweight
-1    16      root default
-2    4         host seed0001
0     1           osd.0        up       1
3     1           osd.3        up       1
1     1           osd.1        up       1
2     1           osd.2        up       1
-3    4         host seed9
6     1           osd.6        up       1
7     1           osd.7        up       1
8     1           osd.8        up       1
9     1           osd.9        up       1
-4    0         host seed10
-5    4         host seed11
5     1           osd.5        DNE
11    1           osd.11       DNE
4     1           osd.4        DNE
10    1           osd.10       DNE
-6    4         host seed13
13    1           osd.13       up       1
12    1           osd.12       up       1
15    1           osd.15       up       1
14    1           osd.14       up       1
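
(For reference, slow requests would normally show up in ceph health detail and as "slow request" lines in the OSD logs; the paths below assume the default log locations.)

ceph health detail
grep 'slow request' /var/log/ceph/ceph-osd.*.log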

Also, rbd bench-write shows acceptable performance:

rbd -p volumes bench-write test --io-size 4096000 --io-threads 8 --io-total 512000000 --io-pattern rand
bench-write io_size 4096000 io_threads 8 bytes 512000000 pattern rand
SEC OPS OPS/SEC BYTES/SEC
1 21 20.69 52470595.65
2 49 24.37 75388840.61
3 77 25.42 82482503.63
4 104 25.88 87656986.70
5 130 25.82 87853128.49
elapsed: 5 ops: 148 ops/sec: 26.71 bytes/sec: 92397657.86

Sorry about the log; I truncated it to 50k lines because it grew very large over time. You can find the complete log attached as a 7zip archive.

Also, I tried using qemu with cache='writeback', which resulted in a slow albeit stable guest. "yum update", which previously triggered the hung task messages, completed successfully but froze multiple times during execution (which does not happen when I use local file storage).
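
For context, the cache='writeback' attribute goes on the driver element of the rbd disk in the libvirt instance XML, roughly like this (the image name "guest-disk" is a placeholder; the pool and monitor address are taken from the cluster output above):

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source protocol='rbd' name='volumes/guest-disk'>
    <host name='192.168.13.2' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>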

#4 Updated by Josh Durgin over 10 years ago

Since this mostly goes away with caching enabled, I'm guessing this is the same as #5919 - does it still occur with 0.67.1 or 0.61.8?
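
A quick way to confirm which ceph/librbd build the hypervisor is actually running (package names assume an RPM-based host; use dpkg -l on Debian/Ubuntu):

ceph --version
rpm -q ceph librbd1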

#5 Updated by Josh Durgin over 10 years ago

  • Status changed from New to Need More Info

#6 Updated by Josh Durgin over 10 years ago

  • Status changed from Need More Info to Closed

Closing due to inactivity - please reopen if it's still an issue.
