Bug #6139

closed

kernel panic in vms during disk benchmarking

Added by Andrei Mikhailovsky over 10 years ago. Updated over 9 years ago.

Status: Closed
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi

I regularly get kernel panics in virtual machines that are running heavy disk I/O benchmarks.

My ceph setup:

Ceph version is 0.67.2 running on Ubuntu 12.04 with the 3.5.0-39-generic kernel. There are 5 mons and 2 OSD servers with 8 OSDs each. Each OSD server has 1 SSD that provides journalling for 4 of its OSDs; the other 4 OSDs have their journals on the disks. The VM host servers also run Ubuntu 12.04 with the same kernel. QEMU is 1.5.0+dfsg-3ubuntu2, backported from Ubuntu 13.04, and librbd1 is 0.67.2-1precise. Libvirt is version 1.0.2-0ubuntu11.13.04.1~precise1~ppa1, backported from Ubuntu 13.04.

I am using CloudStack with RBD primary storage. By default it uses virtio drivers and cache=none when starting virtual machines.
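
To confirm which bus and cache mode a guest disk actually received, the libvirt domain definition can be inspected. The following is a minimal sketch, assuming the usual libvirt domain XML layout; the XML fragment, pool name, and image name are hypothetical placeholders and are not taken from this setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical libvirt domain XML fragment. The pool and image names are
# placeholders, not values from the reported environment.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='rbd' name='examplepool/example-image'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_settings(domain_xml):
    """Return (target device, bus, cache mode, source protocol) for each <disk>."""
    root = ET.fromstring(domain_xml)
    rows = []
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        source = disk.find("source")
        rows.append((
            target.get("dev") if target is not None else None,
            target.get("bus") if target is not None else None,
            driver.get("cache") if driver is not None else None,
            source.get("protocol") if source is not None else None,
        ))
    return rows


print(disk_settings(SAMPLE_DOMAIN_XML))
```

On a real VM host, the XML to feed into disk_settings() could be taken from the output of "virsh dumpxml" for the affected domain.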

I am using phoronix-test-suite to benchmark VM disk throughput by running the "phoronix-test-suite benchmark pts/disk" command with the default settings. The test completed only once in the 5 runs I tried; the other four times the VM showed a kernel panic on the console. I can't find the kernel panic message in the logs; I only have two screenshots (attached).

During the benchmark test I am also seeing these messages (not sure if they are related to the kernel panic):

Aug 27 08:05:39 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.695992] ata2: lost interrupt (Status 0x58)
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.700047] ata2: drained 65536 bytes to clear DRQ
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.967579] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.985443] sr 1:0:0:0: CDB:
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.985453] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.985459] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31542.985459] res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31543.004340] ata2.00: status: { DRDY }
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31543.014190] ata2: soft resetting link
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31543.168974] ata2.00: configured for MWDMA2
Aug 27 08:05:40 Ubuntu-ceph-ubuntu-host-1 kernel: [31543.169208] ata2: EH complete
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31668.762971] ata2: lost interrupt (Status 0x58)
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31668.766914] ata2: drained 65536 bytes to clear DRQ
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.033942] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.056926] sr 1:0:0:0: CDB:
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.056926] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.056926] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.056926] res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.080817] ata2.00: status: { DRDY }
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.092873] ata2: soft resetting link
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.248961] ata2.00: configured for MWDMA2
Aug 27 08:07:46 Ubuntu-ceph-ubuntu-host-1 kernel: [31669.249195] ata2: EH complete
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.587971] ata2: lost interrupt (Status 0x58)
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.591971] ata2: drained 65536 bytes to clear DRQ
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.870048] ata2.00: limiting speed to MWDMA1:PIO4
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.870054] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.896136] sr 1:0:0:0: CDB:
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.896136] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.897054] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.897054] res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.924657] ata2.00: status: { DRDY }
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31804.940617] ata2: soft resetting link
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31805.097018] ata2.00: configured for MWDMA1
Aug 27 08:10:02 Ubuntu-ceph-ubuntu-host-1 kernel: [31805.097257] ata2: EH complete

Could you please let me know how to obtain more information for debugging this issue?

P.S. I have not seen kernel panics when running VMs on the 0.61.7 release using the same hardware. I have seen hung task messages, but the benchmarks would always complete without crashing the VM.


Files

ceph-vm-panic-0672.png (249 KB), Andrei Mikhailovsky, 08/28/2013 05:39 AM
ceph-kernel-panic-ubuntu-vm.png (263 KB), Andrei Mikhailovsky, 08/28/2013 05:39 AM
ceph-debian-crash-20141227.png (108 KB), Nico Schottelius, 12/27/2014 09:22 AM
ceph-redmine-20141227.png (112 KB), Nico Schottelius, 12/27/2014 09:22 AM