Bug #2420
closed
ceph crash while under iogen load
0%
Description
ken.franklin@inktank.com was running a cluster with three nodes, one OSD and one monitor per node, and one MDS.
cephfs was mounted on /tmp/osd (not an OSD data directory), and iogen was used to generate load.
Here's the kernel log oops backtrace, preceded by a bunch of libceph errors that may or may not be relevant:
May 11 11:47:55 plana06 kernel: [1202628.854975] libceph: get_reply unknown tid 446 from osd1
May 11 11:47:55 plana06 kernel: [1202628.855189] libceph: get_reply unknown tid 447 from osd0
May 11 11:47:55 plana06 kernel: [1202628.867212] libceph: get_reply unknown tid 432 from osd2
May 11 11:47:55 plana06 kernel: [1202628.867461] libceph: get_reply unknown tid 433 from osd2
May 11 11:47:55 plana06 kernel: [1202628.867535] libceph: get_reply unknown tid 434 from osd2
May 11 11:47:56 plana06 kernel: [1202630.759544] libceph: get_reply unknown tid 654 from osd2
May 11 11:47:58 plana06 kernel: [1202632.091376] libceph: get_reply unknown tid 1094 from osd0
May 11 11:47:58 plana06 kernel: [1202632.091689] libceph: get_reply unknown tid 1096 from osd0
May 11 11:47:58 plana06 kernel: [1202632.131226] libceph: get_reply unknown tid 1122 from osd0
May 11 11:47:58 plana06 kernel: [1202632.170940] libceph: get_reply unknown tid 1137 from osd0
May 11 11:47:59 plana06 kernel: [1202633.099649] libceph: get_reply unknown tid 1488 from osd1
May 11 11:47:59 plana06 kernel: [1202633.229280] libceph: get_reply unknown tid 1566 from osd1
May 11 11:47:59 plana06 kernel: [1202633.229326] libceph: get_reply unknown tid 1568 from osd1
May 11 11:47:59 plana06 kernel: [1202633.309442] libceph: get_reply unknown tid 1604 from osd1
May 11 11:48:00 plana06 kernel: [1202634.148096] libceph: get_reply unknown tid 1937 from osd2
May 11 11:48:00 plana06 kernel: [1202634.228965] libceph: get_reply unknown tid 1949 from osd2
May 11 11:48:00 plana06 kernel: [1202634.320608] libceph: get_reply unknown tid 1966 from osd1
May 11 11:48:03 plana06 kernel: [1202636.793645] libceph: get_reply unknown tid 2589 from osd2
May 11 11:48:03 plana06 kernel: [1202636.904600] libceph: get_reply unknown tid 2590 from osd2
May 11 11:48:03 plana06 kernel: [1202636.933111] libceph: get_reply unknown tid 2612 from osd1
May 11 11:48:03 plana06 kernel: [1202637.452354] libceph: get_reply unknown tid 2828 from osd2
May 11 11:48:04 plana06 kernel: [1202637.904853] libceph: get_reply unknown tid 2978 from osd2
May 11 11:48:04 plana06 kernel: [1202637.941512] libceph: get_reply unknown tid 2984 from osd2
May 11 11:48:04 plana06 kernel: [1202638.021499] libceph: get_reply unknown tid 3013 from osd2
May 11 11:48:04 plana06 kernel: [1202638.550525] libceph: get_reply unknown tid 3259 from osd2
May 11 11:48:05 plana06 kernel: [1202638.910088] libceph: get_reply unknown tid 3388 from osd2
May 11 11:48:05 plana06 kernel: [1202638.949971] libceph: get_reply unknown tid 3393 from osd2
May 11 11:48:05 plana06 kernel: [1202639.029795] libceph: get_reply unknown tid 3415 from osd2
May 11 11:48:05 plana06 kernel: [1202639.189385] libceph: get_reply unknown tid 3458 from osd2
May 11 11:48:05 plana06 kernel: [1202639.608490] libceph: get_reply unknown tid 3633 from osd1
May 11 11:48:06 plana06 kernel: [1202639.937816] libceph: get_reply unknown tid 3830 from osd1
May 11 11:48:06 plana06 kernel: [1202639.938033] libceph: get_reply unknown tid 3832 from osd1
May 11 11:48:06 plana06 kernel: [1202640.072900] libceph: get_reply unknown tid 3904 from osd0
May 11 11:48:06 plana06 kernel: [1202640.251197] libceph: get_reply unknown tid 3986 from osd1
May 11 11:48:06 plana06 kernel: [1202640.706979] libceph: get_reply unknown tid 4199 from osd2
May 11 11:48:07 plana06 kernel: [1202640.906596] libceph: get_reply unknown tid 4240 from osd2
May 11 11:48:07 plana06 kernel: [1202640.946611] libceph: get_reply unknown tid 4258 from osd2
May 11 11:48:07 plana06 kernel: [1202641.415333] libceph: get_reply unknown tid 4448 from osd1
May 11 11:48:08 plana06 kernel: [1202641.988371] libceph: get_reply unknown tid 4676 from osd1
May 11 11:48:08 plana06 kernel: [1202642.043649] libceph: get_reply unknown tid 4675 from osd1
May 11 11:48:16 plana06 kernel: [1202650.127421] libceph: get_reply unknown tid 4871 from osd2
May 11 11:48:16 plana06 kernel: [1202650.127628] libceph: get_reply unknown tid 4927 from osd2
May 11 11:48:16 plana06 kernel: [1202650.143849] libceph: get_reply unknown tid 4988 from osd2
May 11 11:48:16 plana06 kernel: [1202650.151325] libceph: get_reply unknown tid 5004 from osd2
May 11 11:48:17 plana06 kernel: [1202651.422062] libceph: get_reply unknown tid 5012 from osd2
May 11 11:48:18 plana06 kernel: [1202652.436495] libceph: get_reply unknown tid 5314 from osd2
May 11 11:48:18 plana06 kernel: [1202652.510486] libceph: get_reply unknown tid 5330 from osd2
May 11 11:48:18 plana06 kernel: [1202652.510819] libceph: get_reply unknown tid 5333 from osd2
May 11 11:48:18 plana06 kernel: [1202652.511366] libceph: get_reply unknown tid 5334 from osd2
May 11 11:48:19 plana06 kernel: [1202653.504535] libceph: get_reply unknown tid 5599 from osd2
May 11 11:48:19 plana06 kernel: [1202653.694330] libceph: get_reply unknown tid 5716 from osd2
May 11 11:48:20 plana06 kernel: [1202653.803445] libceph: get_reply unknown tid 5771 from osd0
May 11 11:48:21 plana06 kernel: [1202654.812336] libceph: get_reply unknown tid 6173 from osd2
May 11 11:48:21 plana06 kernel: [1202654.812962] libceph: get_reply unknown tid 6177 from osd2
May 11 11:48:21 plana06 kernel: [1202654.852295] libceph: get_reply unknown tid 6188 from osd2
May 11 11:48:21 plana06 kernel: [1202654.931985] libceph: get_reply unknown tid 6213 from osd2
May 11 11:48:21 plana06 kernel: [1202655.693355] libceph: get_reply unknown tid 6486 from osd2
May 11 11:48:22 plana06 kernel: [1202655.820641] libceph: get_reply unknown tid 6519 from osd2
May 11 11:48:22 plana06 kernel: [1202655.980366] libceph: get_reply unknown tid 6600 from osd2
May 11 11:48:22 plana06 kernel: [1202656.020189] libceph: get_reply unknown tid 6632 from osd2
May 11 11:48:23 plana06 kernel: [1202656.867354] libceph: get_reply unknown tid 6886 from osd2
May 11 11:48:23 plana06 kernel: [1202657.174115] libceph: get_reply unknown tid 6958 from osd0
May 11 11:48:28 plana06 kernel: [1202662.614339] libceph: get_reply unknown tid 7022 from osd2
May 11 11:48:28 plana06 kernel: [1202662.619470] libceph: get_reply unknown tid 7083 from osd2
May 11 11:48:28 plana06 kernel: [1202662.619494] libceph: get_reply unknown tid 7082 from osd2
May 11 11:48:30 plana06 dhclient: DHCPREQUEST of 10.214.131.34 on eth0 to 10.214.131.39 port 67
May 11 11:48:30 plana06 dhclient: DHCPACK of 10.214.131.34 from 10.214.131.39
May 11 11:48:30 plana06 dhclient: can't create /var/lib/dhcp3/dhclient.eth0.leases: No such file or directory
May 11 11:48:30 plana06 dhclient: bound to 10.214.131.34 -- renewal in 260 seconds.
May 11 11:48:32 plana06 kernel: [1202666.696725] libceph: get_reply unknown tid 7086 from osd2
May 11 11:48:33 plana06 kernel: [1202667.709348] libceph: get_reply unknown tid 7481 from osd0
May 11 11:48:34 plana06 kernel: [1202667.789187] libceph: get_reply unknown tid 7504 from osd0
May 11 11:48:34 plana06 kernel: [1202667.802531] libceph: get_reply unknown tid 7531 from osd2
May 11 11:48:34 plana06 kernel: [1202667.869043] libceph: get_reply unknown tid 7548 from osd0
May 11 11:48:35 plana06 kernel: [1202668.817948] libceph: get_reply unknown tid 7955 from osd2
May 11 11:48:35 plana06 kernel: [1202668.818219] libceph: get_reply unknown tid 7956 from osd2
May 11 11:48:35 plana06 kernel: [1202668.857967] libceph: get_reply unknown tid 7972 from osd2
May 11 11:48:35 plana06 kernel: [1202669.017703] libceph: get_reply unknown tid 8058 from osd2
May 11 11:48:36 plana06 kernel: [1202669.816135] libceph: get_reply unknown tid 8105 from osd0
May 11 11:48:36 plana06 kernel: [1202669.817917] libceph: get_reply unknown tid 8353 from osd0
May 11 11:48:36 plana06 kernel: [1202669.965787] libceph: get_reply unknown tid 8389 from osd1
May 11 11:48:37 plana06 kernel: [1202670.844001] libceph: get_reply unknown tid 8670 from osd0
May 11 11:48:37 plana06 kernel: [1202670.925189] libceph: get_reply unknown tid 8671 from osd0
May 11 11:48:37 plana06 kernel: [1202671.104009] libceph: get_reply unknown tid 8731 from osd2
May 11 11:48:37 plana06 kernel: [1202671.243152] libceph: get_reply unknown tid 8796 from osd0
May 11 11:48:38 plana06 kernel: [1202672.291555] libceph: get_reply unknown tid 9210 from osd0
May 11 11:48:39 plana06 kernel: [1202672.890882] libceph: get_reply unknown tid 9418 from osd2
May 11 11:48:39 plana06 kernel: [1202673.010788] libceph: get_reply unknown tid 9512 from osd2
May 11 11:48:39 plana06 kernel: [1202673.338364] libceph: get_reply unknown tid 9570 from osd2
May 11 11:48:40 plana06 kernel: [1202674.038288] libceph: get_reply unknown tid 9738 from osd0
May 11 11:48:40 plana06 kernel: [1202674.078249] libceph: get_reply unknown tid 9743 from osd0
May 11 11:48:40 plana06 kernel: [1202674.495131] libceph: get_reply unknown tid 9703 from osd2
May 11 11:48:41 plana06 kernel: [1202674.837658] libceph: get_reply unknown tid 9852 from osd2
May 11 11:48:41 plana06 kernel: [1202675.156994] libceph: get_reply unknown tid 9992 from osd2
May 11 11:48:41 plana06 kernel: [1202675.196514] libceph: get_reply unknown tid 10001 from osd0
May 11 11:48:41 plana06 kernel: [1202675.556326] libceph: get_reply unknown tid 10140 from osd2
May 11 11:48:41 plana06 kernel: [1202675.639509] libceph: get_reply unknown tid 10164 from osd2
May 11 11:48:42 plana06 kernel: [1202676.159824] libceph: get_reply unknown tid 10395 from osd0
May 11 11:48:42 plana06 kernel: [1202676.694107] libceph: get_reply unknown tid 10647 from osd1
May 11 11:48:42 plana06 kernel: [1202676.694147] libceph: get_reply unknown tid 10646 from osd1
May 11 11:48:43 plana06 kernel: [1202676.831489] libceph: get_reply unknown tid 10648 from osd1
May 11 11:48:43 plana06 kernel: [1202677.173292] libceph: get_reply unknown tid 10775 from osd2
May 11 11:48:43 plana06 kernel: [1202677.502927] libceph: get_reply unknown tid 10935 from osd2
May 11 11:48:44 plana06 kernel: [1202677.832359] libceph: get_reply unknown tid 11052 from osd2
May 11 11:48:44 plana06 kernel: [1202677.832958] libceph: get_reply unknown tid 11054 from osd2
May 11 11:48:44 plana06 kernel: [1202677.833558] libceph: get_reply unknown tid 11060 from osd2
May 11 11:48:46 plana06 kernel: [1202679.978562] libceph: get_reply unknown tid 11479 from osd2
May 11 11:48:46 plana06 kernel: [1202679.978623] libceph: get_reply unknown tid 11481 from osd2
May 11 11:48:46 plana06 kernel: [1202680.557665] libceph: get_reply unknown tid 11494 from osd2
May 11 11:48:46 plana06 kernel: [1202680.557719] libceph: get_reply unknown tid 11496 from osd2
May 11 11:48:47 plana06 kernel: [1202681.076804] libceph: get_reply unknown tid 11638 from osd2
May 11 11:48:47 plana06 kernel: [1202681.559239] libceph: get_reply unknown tid 11899 from osd1
May 11 11:48:47 plana06 kernel: [1202681.604339] libceph: get_reply unknown tid 11932 from osd0
May 11 11:48:47 plana06 kernel: [1202681.677475] libceph: get_reply unknown tid 11965 from osd1
May 11 11:48:48 plana06 kernel: [1202682.074852] libceph: get_reply unknown tid 12102 from osd1
May 11 11:48:48 plana06 kernel: [1202682.663727] libceph: get_reply unknown tid 12391 from osd1
May 11 11:48:49 plana06 kernel: [1202682.703597] libceph: get_reply unknown tid 12402 from osd1
May 11 11:48:49 plana06 kernel: [1202682.783473] libceph: get_reply unknown tid 12449 from osd1
May 11 11:48:49 plana06 kernel: [1202683.112870] libceph: get_reply unknown tid 12601 from osd1
May 11 11:48:50 plana06 kernel: [1202683.732186] libceph: get_reply unknown tid 12912 from osd2
May 11 11:48:50 plana06 kernel: [1202683.802015] libceph: get_reply unknown tid 12933 from osd2
May 11 11:48:50 plana06 kernel: [1202683.803329] libceph: get_reply unknown tid 12935 from osd2
May 11 11:48:50 plana06 kernel: [1202683.822254] libceph: get_reply unknown tid 12967 from osd0
May 11 11:48:52 plana06 kernel: [1202685.840869] libceph: get_reply unknown tid 13304 from osd2
May 11 11:48:52 plana06 kernel: [1202686.499793] libceph: get_reply unknown tid 13491 from osd2
May 11 11:48:53 plana06 kernel: [1202686.760300] libceph: get_reply unknown tid 13475 from osd1
May 11 11:48:53 plana06 kernel: [1202686.760542] libceph: get_reply unknown tid 13513 from osd1
May 11 11:48:53 plana06 kernel: [1202686.956345] libceph: get_reply unknown tid 13642 from osd1
May 11 11:48:53 plana06 kernel: [1202687.646732] libceph: get_reply unknown tid 13901 from osd0
May 11 11:48:53 plana06 kernel: [1202687.675033] libceph: get_reply unknown tid 13897 from osd1
May 11 11:48:54 plana06 kernel: [1202687.878564] libceph: get_reply unknown tid 14020 from osd1
May 11 11:48:54 plana06 kernel: [1202688.037115] libceph: get_reply unknown tid 14109 from osd2
May 11 11:48:54 plana06 kernel: [1202688.673234] libceph: get_reply unknown tid 14346 from osd1
May 11 11:48:55 plana06 kernel: [1202688.693499] libceph: get_reply unknown tid 14352 from osd2
May 11 11:48:55 plana06 kernel: [1202688.912763] libceph: get_reply unknown tid 14415 from osd1
May 11 11:48:56 plana06 kernel: [1202689.850927] libceph: get_reply unknown tid 14797 from osd0
May 11 11:48:56 plana06 kernel: [1202689.922891] libceph: get_reply unknown tid 14843 from osd1
May 11 11:48:56 plana06 kernel: [1202690.161857] libceph: get_reply unknown tid 14947 from osd1
May 11 11:48:56 plana06 kernel: [1202690.300418] libceph: get_reply unknown tid 14988 from osd1
May 11 11:48:57 plana06 kernel: [1202690.928027] libceph: get_reply unknown tid 15158 from osd1
May 11 11:48:57 plana06 kernel: [1202691.248766] libceph: get_reply unknown tid 15272 from osd1
May 11 11:48:58 plana06 kernel: [1202691.957797] libceph: get_reply unknown tid 15535 from osd2
May 11 11:48:58 plana06 kernel: [1202692.077626] libceph: get_reply unknown tid 15560 from osd2
May 11 11:48:58 plana06 kernel: [1202692.117509] libceph: get_reply unknown tid 15565 from osd2
May 11 11:48:58 plana06 kernel: [1202692.241731] libceph: get_reply unknown tid 15569 from osd2
May 11 11:48:59 plana06 kernel: [1202692.846204] libceph: get_reply unknown tid 15820 from osd2
May 11 11:48:59 plana06 kernel: [1202693.245627] libceph: get_reply unknown tid 15959 from osd2
May 11 11:48:59 plana06 kernel: [1202693.289140] libceph: get_reply unknown tid 15990 from osd2
May 11 11:49:00 plana06 kernel: [1202694.123983] libceph: get_reply unknown tid 16352 from osd2
May 11 11:49:00 plana06 kernel: [1202694.253208] libceph: get_reply unknown tid 16392 from osd0
May 11 11:49:00 plana06 kernel: [1202694.483463] libceph: get_reply unknown tid 16459 from osd2
May 11 11:49:00 plana06 kernel: [1202694.492905] libceph: get_reply unknown tid 16464 from osd0
May 11 11:49:01 plana06 kernel: [1202695.072337] libceph: get_reply unknown tid 16748 from osd2
May 11 11:49:01 plana06 kernel: [1202695.272020] libceph: get_reply unknown tid 16875 from osd2
May 11 11:49:01 plana06 kernel: [1202695.311898] libceph: get_reply unknown tid 16890 from osd2
May 11 11:49:01 plana06 kernel: [1202695.591594] libceph: get_reply unknown tid 16984 from osd2
May 11 11:49:02 plana06 kernel: [1202696.230380] libceph: get_reply unknown tid 17189 from osd2
May 11 11:49:02 plana06 kernel: [1202696.360296] libceph: get_reply unknown tid 17221 from osd2
May 11 11:49:02 plana06 kernel: [1202696.440078] libceph: get_reply unknown tid 17243 from osd2
May 11 11:49:02 plana06 kernel: [1202696.519883] libceph: get_reply unknown tid 17284 from osd2
May 11 11:49:03 plana06 kernel: [1202696.799462] libceph: get_reply unknown tid 17374 from osd2
May 11 11:49:03 plana06 kernel: [1202697.350455] libceph: get_reply unknown tid 17631 from osd0
May 11 11:49:03 plana06 kernel: [1202697.537469] libceph: get_reply unknown tid 17686 from osd0
May 11 11:49:03 plana06 kernel: [1202697.561804] libceph: get_reply unknown tid 17705 from osd2
May 11 11:49:04 plana06 kernel: [1202698.397226] libceph: get_reply unknown tid 17995 from osd1
May 11 11:49:04 plana06 kernel: [1202698.436435] libceph: get_reply unknown tid 18004 from osd2
May 11 11:49:04 plana06 kernel: [1202698.556368] libceph: get_reply unknown tid 18071 from osd2
May 11 11:49:04 plana06 kernel: [1202698.640221] libceph: get_reply unknown tid 18136 from osd2
May 11 11:49:05 plana06 kernel: [1202698.915694] libceph: get_reply unknown tid 18230 from osd2
May 11 11:49:05 plana06 kernel: [1202699.422034] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
May 11 11:49:05 plana06 kernel: [1202699.491734] IP: [<ffffffff81317953>] rb_erase+0xd3/0x320
May 11 11:49:05 plana06 kernel: [1202699.527946] PGD 210b0a067 PUD 21965f067 PMD 0
May 11 11:49:05 plana06 kernel: [1202699.562630] Oops: 0000 [#1] SMP
May 11 11:49:05 plana06 kernel: [1202699.562635] CPU 5
May 11 11:49:05 plana06 kernel: [1202699.562636] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs ceph libceph iptable_filter ip_tables x_tables i7core_edac edac_core psmouse joydev serio_raw dcdbas hed lp parport usbhid hid ixgbe mptsas dca mptscsih mdio mptbase scsi_transport_sas bnx2 btrfs zlib_deflate crc32c libcrc32c
May 11 11:49:05 plana06 kernel: [1202699.562670]
May 11 11:49:05 plana06 kernel: [1202699.562673] Pid: 21029, comm: iogen Not tainted 3.3.0-ceph-00077-g8290c9c #1 Dell Inc. PowerEdge R410/01V648
May 11 11:49:05 plana06 kernel: [1202699.562678] RIP: 0010:[<ffffffff81317953>] [<ffffffff81317953>] rb_erase+0xd3/0x320
May 11 11:49:05 plana06 kernel: [1202699.562685] RSP: 0018:ffff8802216fb9b8 EFLAGS: 00010282
May 11 11:49:05 plana06 kernel: [1202699.562687] RAX: ffff88020d8be808 RBX: ffff8802205d3408 RCX: ffff880220a45809
May 11 11:49:05 plana06 kernel: [1202699.562690] RDX: ffff88020d8be808 RSI: ffff880220bae470 RDI: 0000000000000000
May 11 11:49:05 plana06 kernel: [1202699.562693] RBP: ffff8802216fb9c8 R08: 0000000000000001 R09: 0000000000000001
May 11 11:49:05 plana06 kernel: [1202699.562695] R10: 0000000000000000 R11: 2222222222222222 R12: ffff880220bae470
May 11 11:49:05 plana06 kernel: [1202699.562698] R13: ffff880220bae2e8 R14: ffff880220bae3c0 R15: ffff8802216fbb30
May 11 11:49:05 plana06 kernel: [1202699.562701] FS: 00007fc10cbc1720(0000) GS:ffff8802272a0000(0000) knlGS:0000000000000000
May 11 11:49:05 plana06 kernel: [1202699.562704] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 11 11:49:05 plana06 kernel: [1202699.562706] CR2: 0000000000000010 CR3: 00000002215b4000 CR4: 00000000000006e0
May 11 11:49:05 plana06 kernel: [1202699.562709] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 11 11:49:05 plana06 kernel: [1202699.562712] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 11 11:49:05 plana06 kernel: [1202699.562715] Process iogen (pid: 21029, threadinfo ffff8802216fa000, task ffff880223cd5e80)
May 11 11:49:05 plana06 kernel: [1202699.562717] Stack:
May 11 11:49:05 plana06 kernel: [1202699.562719] ffff88020ca60400 ffff880220bae2e8 ffff8802216fb9e8 ffffffffa01ea640
May 11 11:49:05 plana06 kernel: [1202699.562723] ffff88020ca60400 00000000fffffe00 ffff8802216fba18 ffffffffa01ea861
May 11 11:49:05 plana06 kernel: [1202699.562727] ffff88020ca60400 0000000000000000 ffff88021c7a9068 ffff880220bae2e8
May 11 11:49:05 plana06 kernel: [1202699.562732] Call Trace:
May 11 11:49:05 plana06 kernel: [1202699.562746] [<ffffffffa01ea640>] __unregister_request+0x30/0x1a0 [libceph]
May 11 11:49:05 plana06 kernel: [1202699.562757] [<ffffffffa01ea861>] ceph_osdc_wait_request+0xb1/0x110 [libceph]
May 11 11:49:05 plana06 kernel: [1202699.562768] [<ffffffffa01eb6f3>] ceph_osdc_readpages+0x143/0x1c0 [libceph]
May 11 11:49:05 plana06 kernel: [1202699.562781] [<ffffffffa022c71e>] readpage_nounlock+0xae/0x180 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562787] [<ffffffff81615dc0>] ? _raw_spin_unlock_irq+0x30/0x40
May 11 11:49:05 plana06 kernel: [1202699.562798] [<ffffffffa022cf3e>] ceph_readpage+0x1e/0x40 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562804] [<ffffffff81121c2d>] generic_file_aio_read+0x21d/0x760
May 11 11:49:05 plana06 kernel: [1202699.562815] [<ffffffffa0228a43>] ceph_aio_read+0x633/0x880 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562820] [<ffffffff810aa05d>] ? mark_held_locks+0x7d/0x120
May 11 11:49:05 plana06 kernel: [1202699.562830] [<ffffffffa0228cd5>] ? ceph_llseek+0x45/0x170 [ceph]
May 11 11:49:05 plana06 kernel: [1202699.562835] [<ffffffff816130d2>] ? __mutex_lock_common+0x282/0x3d0
May 11 11:49:05 plana06 kernel: [1202699.562841] [<ffffffff8117ac62>] do_sync_read+0xe2/0x120
May 11 11:49:05 plana06 kernel: [1202699.562845] [<ffffffff81612bf9>] ? __mutex_unlock_slowpath+0xd9/0x180
May 11 11:49:05 plana06 kernel: [1202699.562852] [<ffffffff812a6fcb>] ? security_file_permission+0x8b/0x90
May 11 11:49:05 plana06 kernel: [1202699.562857] [<ffffffff8117b395>] vfs_read+0xc5/0x190
May 11 11:49:05 plana06 kernel: [1202699.562861] [<ffffffff8117b561>] sys_read+0x51/0x90
May 11 11:49:05 plana06 kernel: [1202699.562866] [<ffffffff8161e1a9>] system_call_fastpath+0x16/0x1b
May 11 11:49:05 plana06 kernel: [1202699.562868] Code: 48 09 d1 48 89 0e 41 83 f8 01 74 57 5b 41 5c c9 c3 48 83 c8 01 4c 89 e6 48 89 07 48 83 23 fe 48 89 df e8 61 fd ff ff 48 8b 7b 10 <48> 8b 47 10 48 85 c0 74 09 f6 00 01 0f 84 a9 01 00 00 48 8b 57
May 11 11:49:05 plana06 kernel: [1202699.562903] RIP [<ffffffff81317953>] rb_erase+0xd3/0x320
May 11 11:49:05 plana06 kernel: [1202699.562907] RSP <ffff8802216fb9b8>
May 11 11:49:05 plana06 kernel: [1202699.562909] CR2: 0000000000000010
May 11 11:49:05 plana06 kernel: [1202699.650141] ---[ end trace f92444a4ad0ef44b ]---
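Since it is not clear whether the libceph errors are relevant, it may help to summarize them before reading the oops. The following is a minimal sketch in Python (not part of the original report) that tallies the "get_reply unknown tid" messages per OSD. The function name `tally_unknown_tids` and the regular expression are assumptions made for illustration; the pattern is based only on the message format visible in the log above.

```python
import re
from collections import Counter

# Assumed pattern, derived from the message format in the log above:
#   "libceph: get_reply unknown tid <N> from osd<M>"
UNKNOWN_TID = re.compile(r"libceph: get_reply unknown tid (\d+) from (osd\d+)")


def tally_unknown_tids(lines):
    """Return (Counter of messages per OSD, list of (tid, osd) in log order).

    Lines that do not contain the message (for example the dhclient
    entries) are ignored.
    """
    per_osd = Counter()
    seen = []
    for line in lines:
        match = UNKNOWN_TID.search(line)
        if match:
            tid, osd = int(match.group(1)), match.group(2)
            per_osd[osd] += 1
            seen.append((tid, osd))
    return per_osd, seen


if __name__ == "__main__":
    # Two lines copied from the log above, used as a small demo input.
    sample = [
        "May 11 11:47:55 plana06 kernel: [1202628.854975] libceph: get_reply unknown tid 446 from osd1",
        "May 11 11:47:55 plana06 kernel: [1202628.855189] libceph: get_reply unknown tid 447 from osd0",
    ]
    counts, _ = tally_unknown_tids(sample)
    for osd, count in sorted(counts.items()):
        print(osd, count)
```

Run against the full kernel log, this would show whether the unknown replies come from one OSD or are spread across all three. It does not by itself say whether they are related to the crash.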
Updated by Ken Franklin almost 12 years ago
The iogen command used was:
sudo iogen -s 2g -b 128k -t 1 -d /mnt/osd -n 5
Updated by Sage Weil almost 12 years ago
- Status changed from New to Fix Under Review
I think fix-unregister-race will fix this. Alex or Yehuda, does that make sense?
Hopefully the crash is reproducible so we can verify the fix?
Updated by Ken Franklin almost 12 years ago
It should still be reproducible. I left the configuration up and I was able to reproduce it a couple of days ago.
Updated by Ken Franklin almost 12 years ago
Updated the cluster to kernel 3.3.0-ceph-00110-g1d4a9bf and ran iogen for 2 hrs, then cleared the mounted file store and ran it again for 2 hrs. The original problem did not appear and the cluster stayed healthy. I'll leave it up if someone wants to look through the logs.