Bug #2420

closed

ceph crash while under iogen load

Added by Dan Mick almost 12 years ago. Updated almost 12 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I was running a cluster with three nodes, one OSD and one monitor per node, and one MDS.
CephFS was mounted on /tmp/osd (not an OSD data directory), and iogen was used to generate load.

Here's the kernel log oops backtrace, preceded by a bunch of libceph errors that may or may not be relevant:

May 11 11:47:55 plana06 kernel: [1202628.854975] libceph: get_reply unknown tid 446 from osd1
May 11 11:47:55 plana06 kernel: [1202628.855189] libceph: get_reply unknown tid 447 from osd0
May 11 11:47:55 plana06 kernel: [1202628.867212] libceph: get_reply unknown tid 432 from osd2
May 11 11:47:55 plana06 kernel: [1202628.867461] libceph: get_reply unknown tid 433 from osd2
May 11 11:47:55 plana06 kernel: [1202628.867535] libceph: get_reply unknown tid 434 from osd2
May 11 11:47:56 plana06 kernel: [1202630.759544] libceph: get_reply unknown tid 654 from osd2
May 11 11:47:58 plana06 kernel: [1202632.091376] libceph: get_reply unknown tid 1094 from osd0
May 11 11:47:58 plana06 kernel: [1202632.091689] libceph: get_reply unknown tid 1096 from osd0
May 11 11:47:58 plana06 kernel: [1202632.131226] libceph: get_reply unknown tid 1122 from osd0
May 11 11:47:58 plana06 kernel: [1202632.170940] libceph: get_reply unknown tid 1137 from osd0
May 11 11:47:59 plana06 kernel: [1202633.099649] libceph: get_reply unknown tid 1488 from osd1
May 11 11:47:59 plana06 kernel: [1202633.229280] libceph: get_reply unknown tid 1566 from osd1
May 11 11:47:59 plana06 kernel: [1202633.229326] libceph: get_reply unknown tid 1568 from osd1
May 11 11:47:59 plana06 kernel: [1202633.309442] libceph: get_reply unknown tid 1604 from osd1
May 11 11:48:00 plana06 kernel: [1202634.148096] libceph: get_reply unknown tid 1937 from osd2
May 11 11:48:00 plana06 kernel: [1202634.228965] libceph: get_reply unknown tid 1949 from osd2
May 11 11:48:00 plana06 kernel: [1202634.320608] libceph: get_reply unknown tid 1966 from osd1
May 11 11:48:03 plana06 kernel: [1202636.793645] libceph: get_reply unknown tid 2589 from osd2
May 11 11:48:03 plana06 kernel: [1202636.904600] libceph: get_reply unknown tid 2590 from osd2
May 11 11:48:03 plana06 kernel: [1202636.933111] libceph: get_reply unknown tid 2612 from osd1
May 11 11:48:03 plana06 kernel: [1202637.452354] libceph: get_reply unknown tid 2828 from osd2
May 11 11:48:04 plana06 kernel: [1202637.904853] libceph: get_reply unknown tid 2978 from osd2
May 11 11:48:04 plana06 kernel: [1202637.941512] libceph: get_reply unknown tid 2984 from osd2
May 11 11:48:04 plana06 kernel: [1202638.021499] libceph: get_reply unknown tid 3013 from osd2
May 11 11:48:04 plana06 kernel: [1202638.550525] libceph: get_reply unknown tid 3259 from osd2
May 11 11:48:05 plana06 kernel: [1202638.910088] libceph: get_reply unknown tid 3388 from osd2
May 11 11:48:05 plana06 kernel: [1202638.949971] libceph: get_reply unknown tid 3393 from osd2
May 11 11:48:05 plana06 kernel: [1202639.029795] libceph: get_reply unknown tid 3415 from osd2
May 11 11:48:05 plana06 kernel: [1202639.189385] libceph: get_reply unknown tid 3458 from osd2
May 11 11:48:05 plana06 kernel: [1202639.608490] libceph: get_reply unknown tid 3633 from osd1
May 11 11:48:06 plana06 kernel: [1202639.937816] libceph: get_reply unknown tid 3830 from osd1
May 11 11:48:06 plana06 kernel: [1202639.938033] libceph: get_reply unknown tid 3832 from osd1
May 11 11:48:06 plana06 kernel: [1202640.072900] libceph: get_reply unknown tid 3904 from osd0
May 11 11:48:06 plana06 kernel: [1202640.251197] libceph: get_reply unknown tid 3986 from osd1
May 11 11:48:06 plana06 kernel: [1202640.706979] libceph: get_reply unknown tid 4199 from osd2
May 11 11:48:07 plana06 kernel: [1202640.906596] libceph: get_reply unknown tid 4240 from osd2
May 11 11:48:07 plana06 kernel: [1202640.946611] libceph: get_reply unknown tid 4258 from osd2
May 11 11:48:07 plana06 kernel: [1202641.415333] libceph: get_reply unknown tid 4448 from osd1
May 11 11:48:08 plana06 kernel: [1202641.988371] libceph: get_reply unknown tid 4676 from osd1
May 11 11:48:08 plana06 kernel: [1202642.043649] libceph: get_reply unknown tid 4675 from osd1
May 11 11:48:16 plana06 kernel: [1202650.127421] libceph: get_reply unknown tid 4871 from osd2
May 11 11:48:16 plana06 kernel: [1202650.127628] libceph: get_reply unknown tid 4927 from osd2
May 11 11:48:16 plana06 kernel: [1202650.143849] libceph: get_reply unknown tid 4988 from osd2
May 11 11:48:16 plana06 kernel: [1202650.151325] libceph: get_reply unknown tid 5004 from osd2
May 11 11:48:17 plana06 kernel: [1202651.422062] libceph: get_reply unknown tid 5012 from osd2
May 11 11:48:18 plana06 kernel: [1202652.436495] libceph: get_reply unknown tid 5314 from osd2
May 11 11:48:18 plana06 kernel: [1202652.510486] libceph: get_reply unknown tid 5330 from osd2
May 11 11:48:18 plana06 kernel: [1202652.510819] libceph: get_reply unknown tid 5333 from osd2
May 11 11:48:18 plana06 kernel: [1202652.511366] libceph: get_reply unknown tid 5334 from osd2
May 11 11:48:19 plana06 kernel: [1202653.504535] libceph: get_reply unknown tid 5599 from osd2
May 11 11:48:19 plana06 kernel: [1202653.694330] libceph: get_reply unknown tid 5716 from osd2
May 11 11:48:20 plana06 kernel: [1202653.803445] libceph: get_reply unknown tid 5771 from osd0
May 11 11:48:21 plana06 kernel: [1202654.812336] libceph: get_reply unknown tid 6173 from osd2
May 11 11:48:21 plana06 kernel: [1202654.812962] libceph: get_reply unknown tid 6177 from osd2
May 11 11:48:21 plana06 kernel: [1202654.852295] libceph: get_reply unknown tid 6188 from osd2
May 11 11:48:21 plana06 kernel: [1202654.931985] libceph: get_reply unknown tid 6213 from osd2
May 11 11:48:21 plana06 kernel: [1202655.693355] libceph: get_reply unknown tid 6486 from osd2
May 11 11:48:22 plana06 kernel: [1202655.820641] libceph: get_reply unknown tid 6519 from osd2
May 11 11:48:22 plana06 kernel: [1202655.980366] libceph: get_reply unknown tid 6600 from osd2
May 11 11:48:22 plana06 kernel: [1202656.020189] libceph: get_reply unknown tid 6632 from osd2
May 11 11:48:23 plana06 kernel: [1202656.867354] libceph: get_reply unknown tid 6886 from osd2
May 11 11:48:23 plana06 kernel: [1202657.174115] libceph: get_reply unknown tid 6958 from osd0
May 11 11:48:28 plana06 kernel: [1202662.614339] libceph: get_reply unknown tid 7022 from osd2
May 11 11:48:28 plana06 kernel: [1202662.619470] libceph: get_reply unknown tid 7083 from osd2
May 11 11:48:28 plana06 kernel: [1202662.619494] libceph: get_reply unknown tid 7082 from osd2
May 11 11:48:30 plana06 dhclient: DHCPREQUEST of 10.214.131.34 on eth0 to 10.214.131.39 port 67
May 11 11:48:30 plana06 dhclient: DHCPACK of 10.214.131.34 from 10.214.131.39
May 11 11:48:30 plana06 dhclient: can't create /var/lib/dhcp3/dhclient.eth0.leases: No such file or directory
May 11 11:48:30 plana06 dhclient: bound to 10.214.131.34 -- renewal in 260 seconds.
May 11 11:48:32 plana06 kernel: [1202666.696725] libceph: get_reply unknown tid 7086 from osd2
May 11 11:48:33 plana06 kernel: [1202667.709348] libceph: get_reply unknown tid 7481 from osd0
May 11 11:48:34 plana06 kernel: [1202667.789187] libceph: get_reply unknown tid 7504 from osd0
May 11 11:48:34 plana06 kernel: [1202667.802531] libceph: get_reply unknown tid 7531 from osd2
May 11 11:48:34 plana06 kernel: [1202667.869043] libceph: get_reply unknown tid 7548 from osd0
May 11 11:48:35 plana06 kernel: [1202668.817948] libceph: get_reply unknown tid 7955 from osd2
May 11 11:48:35 plana06 kernel: [1202668.818219] libceph: get_reply unknown tid 7956 from osd2
May 11 11:48:35 plana06 kernel: [1202668.857967] libceph: get_reply unknown tid 7972 from osd2
May 11 11:48:35 plana06 kernel: [1202669.017703] libceph: get_reply unknown tid 8058 from osd2
May 11 11:48:36 plana06 kernel: [1202669.816135] libceph: get_reply unknown tid 8105 from osd0
May 11 11:48:36 plana06 kernel: [1202669.817917] libceph: get_reply unknown tid 8353 from osd0
May 11 11:48:36 plana06 kernel: [1202669.965787] libceph: get_reply unknown tid 8389 from osd1
May 11 11:48:37 plana06 kernel: [1202670.844001] libceph: get_reply unknown tid 8670 from osd0
May 11 11:48:37 plana06 kernel: [1202670.925189] libceph: get_reply unknown tid 8671 from osd0
May 11 11:48:37 plana06 kernel: [1202671.104009] libceph: get_reply unknown tid 8731 from osd2
May 11 11:48:37 plana06 kernel: [1202671.243152] libceph: get_reply unknown tid 8796 from osd0
May 11 11:48:38 plana06 kernel: [1202672.291555] libceph: get_reply unknown tid 9210 from osd0
May 11 11:48:39 plana06 kernel: [1202672.890882] libceph: get_reply unknown tid 9418 from osd2
May 11 11:48:39 plana06 kernel: [1202673.010788] libceph: get_reply unknown tid 9512 from osd2
May 11 11:48:39 plana06 kernel: [1202673.338364] libceph: get_reply unknown tid 9570 from osd2
May 11 11:48:40 plana06 kernel: [1202674.038288] libceph: get_reply unknown tid 9738 from osd0
May 11 11:48:40 plana06 kernel: [1202674.078249] libceph: get_reply unknown tid 9743 from osd0
May 11 11:48:40 plana06 kernel: [1202674.495131] libceph: get_reply unknown tid 9703 from osd2
May 11 11:48:41 plana06 kernel: [1202674.837658] libceph: get_reply unknown tid 9852 from osd2
May 11 11:48:41 plana06 kernel: [1202675.156994] libceph: get_reply unknown tid 9992 from osd2
May 11 11:48:41 plana06 kernel: [1202675.196514] libceph: get_reply unknown tid 10001 from osd0
May 11 11:48:41 plana06 kernel: [1202675.556326] libceph: get_reply unknown tid 10140 from osd2
May 11 11:48:41 plana06 kernel: [1202675.639509] libceph: get_reply unknown tid 10164 from osd2
May 11 11:48:42 plana06 kernel: [1202676.159824] libceph: get_reply unknown tid 10395 from osd0
May 11 11:48:42 plana06 kernel: [1202676.694107] libceph: get_reply unknown tid 10647 from osd1
May 11 11:48:42 plana06 kernel: [1202676.694147] libceph: get_reply unknown tid 10646 from osd1
May 11 11:48:43 plana06 kernel: [1202676.831489] libceph: get_reply unknown tid 10648 from osd1
May 11 11:48:43 plana06 kernel: [1202677.173292] libceph: get_reply unknown tid 10775 from osd2
May 11 11:48:43 plana06 kernel: [1202677.502927] libceph: get_reply unknown tid 10935 from osd2
May 11 11:48:44 plana06 kernel: [1202677.832359] libceph: get_reply unknown tid 11052 from osd2
May 11 11:48:44 plana06 kernel: [1202677.832958] libceph: get_reply unknown tid 11054 from osd2
May 11 11:48:44 plana06 kernel: [1202677.833558] libceph: get_reply unknown tid 11060 from osd2
May 11 11:48:46 plana06 kernel: [1202679.978562] libceph: get_reply unknown tid 11479 from osd2
May 11 11:48:46 plana06 kernel: [1202679.978623] libceph: get_reply unknown tid 11481 from osd2
May 11 11:48:46 plana06 kernel: [1202680.557665] libceph: get_reply unknown tid 11494 from osd2
May 11 11:48:46 plana06 kernel: [1202680.557719] libceph: get_reply unknown tid 11496 from osd2
May 11 11:48:47 plana06 kernel: [1202681.076804] libceph: get_reply unknown tid 11638 from osd2
May 11 11:48:47 plana06 kernel: [1202681.559239] libceph: get_reply unknown tid 11899 from osd1
May 11 11:48:47 plana06 kernel: [1202681.604339] libceph: get_reply unknown tid 11932 from osd0
May 11 11:48:47 plana06 kernel: [1202681.677475] libceph: get_reply unknown tid 11965 from osd1
May 11 11:48:48 plana06 kernel: [1202682.074852] libceph: get_reply unknown tid 12102 from osd1
May 11 11:48:48 plana06 kernel: [1202682.663727] libceph: get_reply unknown tid 12391 from osd1
May 11 11:48:49 plana06 kernel: [1202682.703597] libceph: get_reply unknown tid 12402 from osd1
May 11 11:48:49 plana06 kernel: [1202682.783473] libceph: get_reply unknown tid 12449 from osd1
May 11 11:48:49 plana06 kernel: [1202683.112870] libceph: get_reply unknown tid 12601 from osd1
May 11 11:48:50 plana06 kernel: [1202683.732186] libceph: get_reply unknown tid 12912 from osd2
May 11 11:48:50 plana06 kernel: [1202683.802015] libceph: get_reply unknown tid 12933 from osd2
May 11 11:48:50 plana06 kernel: [1202683.803329] libceph: get_reply unknown tid 12935 from osd2
May 11 11:48:50 plana06 kernel: [1202683.822254] libceph: get_reply unknown tid 12967 from osd0
May 11 11:48:52 plana06 kernel: [1202685.840869] libceph: get_reply unknown tid 13304 from osd2
May 11 11:48:52 plana06 kernel: [1202686.499793] libceph: get_reply unknown tid 13491 from osd2
May 11 11:48:53 plana06 kernel: [1202686.760300] libceph: get_reply unknown tid 13475 from osd1
May 11 11:48:53 plana06 kernel: [1202686.760542] libceph: get_reply unknown tid 13513 from osd1
May 11 11:48:53 plana06 kernel: [1202686.956345] libceph: get_reply unknown tid 13642 from osd1
May 11 11:48:53 plana06 kernel: [1202687.646732] libceph: get_reply unknown tid 13901 from osd0
May 11 11:48:53 plana06 kernel: [1202687.675033] libceph: get_reply unknown tid 13897 from osd1
May 11 11:48:54 plana06 kernel: [1202687.878564] libceph: get_reply unknown tid 14020 from osd1
May 11 11:48:54 plana06 kernel: [1202688.037115] libceph: get_reply unknown tid 14109 from osd2
May 11 11:48:54 plana06 kernel: [1202688.673234] libceph: get_reply unknown tid 14346 from osd1
May 11 11:48:55 plana06 kernel: [1202688.693499] libceph: get_reply unknown tid 14352 from osd2
May 11 11:48:55 plana06 kernel: [1202688.912763] libceph: get_reply unknown tid 14415 from osd1
May 11 11:48:56 plana06 kernel: [1202689.850927] libceph: get_reply unknown tid 14797 from osd0
May 11 11:48:56 plana06 kernel: [1202689.922891] libceph: get_reply unknown tid 14843 from osd1
May 11 11:48:56 plana06 kernel: [1202690.161857] libceph: get_reply unknown tid 14947 from osd1
May 11 11:48:56 plana06 kernel: [1202690.300418] libceph: get_reply unknown tid 14988 from osd1
May 11 11:48:57 plana06 kernel: [1202690.928027] libceph: get_reply unknown tid 15158 from osd1
May 11 11:48:57 plana06 kernel: [1202691.248766] libceph: get_reply unknown tid 15272 from osd1
May 11 11:48:58 plana06 kernel: [1202691.957797] libceph: get_reply unknown tid 15535 from osd2
May 11 11:48:58 plana06 kernel: [1202692.077626] libceph: get_reply unknown tid 15560 from osd2
May 11 11:48:58 plana06 kernel: [1202692.117509] libceph: get_reply unknown tid 15565 from osd2
May 11 11:48:58 plana06 kernel: [1202692.241731] libceph: get_reply unknown tid 15569 from osd2
May 11 11:48:59 plana06 kernel: [1202692.846204] libceph: get_reply unknown tid 15820 from osd2
May 11 11:48:59 plana06 kernel: [1202693.245627] libceph: get_reply unknown tid 15959 from osd2
May 11 11:48:59 plana06 kernel: [1202693.289140] libceph: get_reply unknown tid 15990 from osd2
May 11 11:49:00 plana06 kernel: [1202694.123983] libceph: get_reply unknown tid 16352 from osd2
May 11 11:49:00 plana06 kernel: [1202694.253208] libceph: get_reply unknown tid 16392 from osd0
May 11 11:49:00 plana06 kernel: [1202694.483463] libceph: get_reply unknown tid 16459 from osd2
May 11 11:49:00 plana06 kernel: [1202694.492905] libceph: get_reply unknown tid 16464 from osd0
May 11 11:49:01 plana06 kernel: [1202695.072337] libceph: get_reply unknown tid 16748 from osd2
May 11 11:49:01 plana06 kernel: [1202695.272020] libceph: get_reply unknown tid 16875 from osd2
May 11 11:49:01 plana06 kernel: [1202695.311898] libceph: get_reply unknown tid 16890 from osd2
May 11 11:49:01 plana06 kernel: [1202695.591594] libceph: get_reply unknown tid 16984 from osd2
May 11 11:49:02 plana06 kernel: [1202696.230380] libceph: get_reply unknown tid 17189 from osd2
May 11 11:49:02 plana06 kernel: [1202696.360296] libceph: get_reply unknown tid 17221 from osd2
May 11 11:49:02 plana06 kernel: [1202696.440078] libceph: get_reply unknown tid 17243 from osd2
May 11 11:49:02 plana06 kernel: [1202696.519883] libceph: get_reply unknown tid 17284 from osd2
May 11 11:49:03 plana06 kernel: [1202696.799462] libceph: get_reply unknown tid 17374 from osd2
May 11 11:49:03 plana06 kernel: [1202697.350455] libceph: get_reply unknown tid 17631 from osd0
May 11 11:49:03 plana06 kernel: [1202697.537469] libceph: get_reply unknown tid 17686 from osd0
May 11 11:49:03 plana06 kernel: [1202697.561804] libceph: get_reply unknown tid 17705 from osd2
May 11 11:49:04 plana06 kernel: [1202698.397226] libceph: get_reply unknown tid 17995 from osd1
May 11 11:49:04 plana06 kernel: [1202698.436435] libceph: get_reply unknown tid 18004 from osd2
May 11 11:49:04 plana06 kernel: [1202698.556368] libceph: get_reply unknown tid 18071 from osd2
May 11 11:49:04 plana06 kernel: [1202698.640221] libceph: get_reply unknown tid 18136 from osd2
May 11 11:49:05 plana06 kernel: [1202698.915694] libceph: get_reply unknown tid 18230 from osd2
May 11 11:49:05 plana06 kernel: [1202699.422034] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
May 11 11:49:05 plana06 kernel: [1202699.491734] IP: [<ffffffff81317953>] rb_erase+0xd3/0x320
May 11 11:49:05 plana06 kernel: [1202699.527946] PGD 210b0a067 PUD 21965f067 PMD 0
May 11 11:49:05 plana06 kernel: [1202699.562630] Oops: 0000 [#1] SMP
May 11 11:49:05 plana06 kernel: [1202699.562635] CPU 5
May 11 11:49:05 plana06 kernel: [1202699.562636] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs ceph libceph iptable_filter ip_tables x_tables i7core_edac edac_core psmouse joydev serio_raw dcdbas hed lp parport usbhid hid ixgbe mptsas dca mptscsih mdio mptbase scsi_transport_sas bnx2 btrfs zlib_deflate crc32c libcrc32c
May 11 11:49:05 plana06 kernel: [1202699.562670]
May 11 11:49:05 plana06 kernel: [1202699.562673] Pid: 21029, comm: iogen Not tainted 3.3.0-ceph-00077-g8290c9c #1 Dell Inc. PowerEdge R410/01V648
May 11 11:49:05 plana06 kernel: [1202699.562678] RIP: 0010:[<ffffffff81317953>] [<ffffffff81317953>] rb_erase+0xd3/0x320
May 11 11:49:05 plana06 kernel: [1202699.562685] RSP: 0018:ffff8802216fb9b8 EFLAGS: 00010282
May 11 11:49:05 plana06 kernel: [1202699.562687] RAX: ffff88020d8be808 RBX: ffff8802205d3408 RCX: ffff880220a45809
May 11 11:49:05 plana06 kernel: [1202699.562690] RDX: ffff88020d8be808 RSI: ffff880220bae470 RDI: 0000000000000000
May 11 11:49:05 plana06 kernel: [1202699.562693] RBP: ffff8802216fb9c8 R08: 0000000000000001 R09: 0000000000000001
May 11 11:49:05 plana06 kernel: [1202699.562695] R10: 0000000000000000 R11: 2222222222222222 R12: ffff880220bae470
May 11 11:49:05 plana06 kernel: [1202699.562698] R13: ffff880220bae2e8 R14: ffff880220bae3c0 R15: ffff8802216fbb30
May 11 11:49:05 plana06 kernel: [1202699.562701] FS: 00007fc10cbc1720(0000) GS:ffff8802272a0000(0000) knlGS:0000000000000000
May 11 11:49:05 plana06 kernel: [1202699.562704] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 11 11:49:05 plana06 kernel: [1202699.562706] CR2: 0000000000000010 CR3: 00000002215b4000 CR4: 00000000000006e0
May 11 11:49:05 plana06 kernel: [1202699.562709] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 11 11:49:05 plana06 kernel: [1202699.562712] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 11 11:49:05 plana06 kernel: [1202699.562715] Process iogen (pid: 21029, threadinfo ffff8802216fa000, task ffff880223cd5e80)
May 11 11:49:05 plana06 kernel: [1202699.562717] Stack:
May 11 11:49:05 plana06 kernel: [1202699.562719] ffff88020ca60400 ffff880220bae2e8 ffff8802216fb9e8 ffffffffa01ea640
May 11 11:49:05 plana06 kernel: [1202699.562723] ffff88020ca60400 00000000fffffe00 ffff8802216fba18 ffffffffa01ea861
May 11 11:49:05 plana06 kernel: [1202699.562727] ffff88020ca60400 0000000000000000 ffff88021c7a9068 ffff880220bae2e8
May 11 11:49:05 plana06 kernel: [1202699.562732] Call Trace:
May 11 11:49:05 plana06 kernel: [1202699.562746] [<ffffffffa01ea640>] __unregister_request+0x30/0x1a0 [libceph]
May 11 11:49:05 plana06 kernel: [1202699.562757] [<ffffffffa01ea861>] ceph_osdc_wait_request+0xb1/0x110 [libceph]
May 11 11:49:05 plana06 kernel: [1202699.562768] [<ffffffffa01eb6f3>] ceph_osdc_readpages+0x143/0x1c0 [libceph]
May 11 11:49:05 plana06 kernel: [1202699.562781] [<ffffffffa022c71e>] readpage_nounlock+0xae/0x180 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562787] [<ffffffff81615dc0>] ? _raw_spin_unlock_irq+0x30/0x40
May 11 11:49:05 plana06 kernel: [1202699.562798] [<ffffffffa022cf3e>] ceph_readpage+0x1e/0x40 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562804] [<ffffffff81121c2d>] generic_file_aio_read+0x21d/0x760
May 11 11:49:05 plana06 kernel: [1202699.562815] [<ffffffffa0228a43>] ceph_aio_read+0x633/0x880 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562820] [<ffffffff810aa05d>] ? mark_held_locks+0x7d/0x120
May 11 11:49:05 plana06 kernel: [1202699.562830] [<ffffffffa0228cd5>] ? ceph_llseek+0x45/0x170 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562835] [<ffffffff816130d2>] ? __mutex_lock_common+0x282/0x3d0
May 11 11:49:05 plana06 kernel: [1202699.562841] [<ffffffff8117ac62>] do_sync_read+0xe2/0x120
May 11 11:49:05 plana06 kernel: [1202699.562845] [<ffffffff81612bf9>] ? __mutex_unlock_slowpath+0xd9/0x180
May 11 11:49:05 plana06 kernel: [1202699.562852] [<ffffffff812a6fcb>] ? security_file_permission+0x8b/0x90
May 11 11:49:05 plana06 kernel: [1202699.562857] [<ffffffff8117b395>] vfs_read+0xc5/0x190
May 11 11:49:05 plana06 kernel: [1202699.562861] [<ffffffff8117b561>] sys_read+0x51/0x90
May 11 11:49:05 plana06 kernel: [1202699.562866] [<ffffffff8161e1a9>] system_call_fastpath+0x16/0x1b
May 11 11:49:05 plana06 kernel: [1202699.562868] Code: 48 09 d1 48 89 0e 41 83 f8 01 74 57 5b 41 5c c9 c3 48 83 c8 01 4c 89 e6 48 89 07 48 83 23 fe 48 89 df e8 61 fd ff ff 48 8b 7b 10 <48> 8b 47 10 48 85 c0 74 09 f6 00 01 0f 84 a9 01 00 00 48 8b 57
May 11 11:49:05 plana06 kernel: [1202699.562903] RIP [<ffffffff81317953>] rb_erase+0xd3/0x320
May 11 11:49:05 plana06 kernel: [1202699.562907] RSP <ffff8802216fb9b8>
May 11 11:49:05 plana06 kernel: [1202699.562909] CR2: 0000000000000010
May 11 11:49:05 plana06 kernel: [1202699.650141] ---[ end trace f92444a4ad0ef44b ]---
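
To help judge whether the libceph errors are relevant, the "get_reply unknown tid" messages can be tallied per OSD. The following is only a minimal sketch, not part of the original report: the function name tally_unknown_tids and the regular expression are assumptions based on the line format shown above, and the sample is just the first few lines of the log.

```python
import re
from collections import Counter

# Assumed line format, taken from the log above, e.g.:
#   "... libceph: get_reply unknown tid 446 from osd1"
UNKNOWN_TID = re.compile(r"libceph: get_reply unknown tid (\d+) from (osd\d+)")


def tally_unknown_tids(lines):
    """Count 'get_reply unknown tid' messages per OSD (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_TID.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# A few lines copied from the log above; the dhclient line is ignored.
sample = [
    "May 11 11:47:55 plana06 kernel: [1202628.854975] libceph: get_reply unknown tid 446 from osd1",
    "May 11 11:47:55 plana06 kernel: [1202628.855189] libceph: get_reply unknown tid 447 from osd0",
    "May 11 11:47:55 plana06 kernel: [1202628.867212] libceph: get_reply unknown tid 432 from osd2",
    "May 11 11:47:55 plana06 kernel: [1202628.867461] libceph: get_reply unknown tid 433 from osd2",
    "May 11 11:48:30 plana06 dhclient: DHCPACK of 10.214.131.34 from 10.214.131.39",
]

for osd, count in sorted(tally_unknown_tids(sample).items()):
    print(osd, count)
```

Feeding the full kernel log through the same function (for example, by passing an open file object) would show whether the unknown replies come from all three OSDs or cluster on one of them.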

Actions #1

Updated by Dan Mick almost 12 years ago

  • Assignee set to Alex Elder
Actions #2

Updated by Ken Franklin almost 12 years ago

The iogen command used was:
sudo iogen -s 2g -b 128k -t 1 -d /mnt/osd -n 5

Actions #3

Updated by Sage Weil almost 12 years ago

  • Status changed from New to Fix Under Review

I think fix-unregister-race will fix this. Alex or Yehuda, does that make sense?

Hopefully the crash is reproducible so we can verify the fix?

Actions #4

Updated by Ken Franklin almost 12 years ago

It should still be reproducible. I left the configuration up and was able to reproduce it a couple of days ago.

Actions #5

Updated by Ken Franklin almost 12 years ago

Updated the cluster to kernel 3.3.0-ceph-00110-g1d4a9bf and ran iogen for 2 hours, then cleared the mounted file store and ran it again for 2 hours. The original problem did not appear and the cluster stayed healthy. I'll leave it up in case someone wants to look through the logs.

Actions #6

Updated by Sage Weil almost 12 years ago

  • Status changed from Fix Under Review to Resolved

yay!
