Bug #473
Kernel panic: ceph_pagelist_append (closed)
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: Clustered MDS
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
I was just running simultaneous rsyncs of the kernel.org, Debian and Ubuntu mirrors when my client got a kernel panic.
The dmesg:
Oct 9 13:26:01 client01 kernel: [366850.430253] ceph: mds0 reconnect start
Oct 9 13:26:01 client01 kernel: [366850.430715] general protection fault: 0000 [#1] SMP
Oct 9 13:26:01 client01 kernel: [366850.430817] last sysfs file: /sys/module/aes_generic/initstate
Oct 9 13:26:01 client01 kernel: [366850.430906] CPU 0
Oct 9 13:26:01 client01 kernel: [366850.430947] Modules linked in: cryptd aes_x86_64 aes_generic ceph libceph libcrc32c ip6table_filter ip6_tables crc32c ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bridge stp bonding radeon ttm drm_kms_helper ppdev lp drm i2c_algo_bit parport_pc i3000_edac parport led_class shpchp edac_core serio_raw usb_storage floppy e1000e [last unloaded: libceph]
Oct 9 13:26:01 client01 kernel: [366850.432179]
Oct 9 13:26:01 client01 kernel: [366850.432202] Pid: 4, comm: kworker/0:0 Not tainted 2.6.36-rc5-rbd-20014-g53f0521 #3 PDSMi+/PDSMi
Oct 9 13:26:01 client01 kernel: [366850.432334] RIP: 0010:[<ffffffff812c387b>] [<ffffffff812c387b>] memcpy+0xb/0xb0
Oct 9 13:26:01 client01 kernel: [366850.432468] RSP: 0018:ffff880216cdb968 EFLAGS: 00010202
Oct 9 13:26:01 client01 kernel: [366850.432552] RAX: 49081a8490e48000 RBX: ffff8800cffb1fc0 RCX: 0000000000000004
Oct 9 13:26:01 client01 kernel: [366850.432658] RDX: 0000000000000007 RSI: ffff8800cffb1e55 RDI: 49081a8490e48000
Oct 9 13:26:01 client01 kernel: [366850.432763] RBP: ffff880216cdb9a0 R08: 000000000000003d R09: 0000000000000050
Oct 9 13:26:01 client01 kernel: [366850.432868] R10: 0000000000000000 R11: dead000000200200 R12: 0000000000001000
Oct 9 13:26:01 client01 kernel: [366850.432974] R13: 0000000000000027 R14: ffff8800cffb1e55 R15: 6db6db6db6db6db7
Oct 9 13:26:01 client01 kernel: [366850.433080] FS:  0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
Oct 9 13:26:01 client01 kernel: [366850.433205] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 9 13:26:01 client01 kernel: [366850.433290] CR2: 00007fcf741be000 CR3: 0000000214fcc000 CR4: 00000000000006f0
Oct 9 13:26:01 client01 kernel: [366850.433395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 9 13:26:01 client01 kernel: [366850.433501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 9 13:26:01 client01 kernel: [366850.433606] Process kworker/0:0 (pid: 4, threadinfo ffff880216cda000, task ffff880216cc44a0)
Oct 9 13:26:01 client01 kernel: [366850.433726] Stack:
Oct 9 13:26:01 client01 kernel: [366850.433768]  ffffffffa03d7f37 ffff880216cdb980 ffff8801a9e84d28 ffff8800cffb1fc0
Oct 9 13:26:01 client01 kernel: [366850.433929] <0> ffff880216cdba38 ffff8801a9dd1380 ffff880216cdbb60 ffff880216cdba80
Oct 9 13:26:01 client01 kernel: [366850.434118] <0> ffffffffa041010a 00000000ffffffff 000000000000003c ffff8801a9e848bc
Oct 9 13:26:01 client01 kernel: [366850.434311] Call Trace:
Oct 9 13:26:01 client01 kernel: [366850.434362]  [<ffffffffa03d7f37>] ? ceph_pagelist_append+0x157/0x1d0 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.434477]  [<ffffffffa041010a>] encode_caps_cb+0x2aa/0x3d0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.434585]  [<ffffffffa040e73b>] iterate_session_caps+0xbb/0x1b0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.434692]  [<ffffffff81100227>] ? __page_cache_alloc+0x87/0x90
Oct 9 13:26:01 client01 kernel: [366850.434790]  [<ffffffffa040fe60>] ? encode_caps_cb+0x0/0x3d0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.434898]  [<ffffffffa04115bb>] send_mds_reconnect+0x2fb/0x4b0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.435003]  [<ffffffffa0411a5d>] ceph_mdsc_handle_map+0x29d/0x450 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.435114]  [<ffffffffa03fbae9>] extra_mon_dispatch+0x29/0x30 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.435215]  [<ffffffffa03d8ee0>] dispatch+0x50/0x590 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.435310]  [<ffffffffa03d7621>] con_work+0x1501/0x1690 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.435413]  [<ffffffff810591d1>] ? dequeue_entity+0x1b1/0x1f0
Oct 9 13:26:01 client01 kernel: [366850.435506]  [<ffffffffa03d6120>] ? con_work+0x0/0x1690 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.435605]  [<ffffffff81079b73>] process_one_work+0x123/0x440
Oct 9 13:26:01 client01 kernel: [366850.435695]  [<ffffffff8107c0c0>] worker_thread+0x170/0x400
Oct 9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff8107bf50>] ? worker_thread+0x0/0x400
Oct 9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff81080486>] kthread+0x96/0xa0
Oct 9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff8100be64>] kernel_thread_helper+0x4/0x10
Oct 9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff810803f0>] ? kthread+0x0/0xa0
Oct 9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff8100be60>] ? kernel_thread_helper+0x0/0x10
Oct 9 13:26:01 client01 kernel: [366850.440206] Code: 49 89 70 50 19 c0 49 89 70 58 41 c6 40 4c 04 83 e0 fc 83 c0 08 41 88 40 4d c9 c3 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 8b 1e 4c 8b 46 08 4c 89 1f 4c 89 47 08
Oct 9 13:26:01 client01 kernel: [366850.440206] RIP [<ffffffff812c387b>] memcpy+0xb/0xb0
Oct 9 13:26:01 client01 kernel: [366850.440206] RSP <ffff880216cdb968>
Oct 9 13:26:01 client01 kernel: [366850.508857] ---[ end trace 1e7299bcbc67da97 ]---
Oct 9 13:26:01 client01 kernel: [366850.514683] Kernel panic - not syncing: Fatal exception
Oct 9 13:26:01 client01 kernel: [366850.520220] Pid: 4, comm: kworker/0:0 Tainted: G      D     2.6.36-rc5-rbd-20014-g53f0521 #3
Oct 9 13:26:01 client01 kernel: [366850.525839] Call Trace:
Oct 9 13:26:01 client01 kernel: [366850.531249]  [<ffffffff815943f6>] panic+0x91/0x1a1
Oct 9 13:26:01 client01 kernel: [366850.536552]  [<ffffffff810610e5>] ? kmsg_dump+0x145/0x160
Oct 9 13:26:01 client01 kernel: [366850.541804]  [<ffffffff81598a8c>] oops_end+0xdc/0xf0
Oct 9 13:26:01 client01 kernel: [366850.546965]  [<ffffffff8100ed4b>] die+0x5b/0x90
Oct 9 13:26:01 client01 kernel: [366850.552091]  [<ffffffff815985c2>] do_general_protection+0x162/0x170
Oct 9 13:26:01 client01 kernel: [366850.557173]  [<ffffffff81597d25>] general_protection+0x25/0x30
Oct 9 13:26:01 client01 kernel: [366850.562249]  [<ffffffff812c387b>] ? memcpy+0xb/0xb0
Oct 9 13:26:01 client01 kernel: [366850.567245]  [<ffffffffa03d7f37>] ? ceph_pagelist_append+0x157/0x1d0 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.572290]  [<ffffffffa041010a>] encode_caps_cb+0x2aa/0x3d0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.577325]  [<ffffffffa040e73b>] iterate_session_caps+0xbb/0x1b0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.582383]  [<ffffffff81100227>] ? __page_cache_alloc+0x87/0x90
Oct 9 13:26:01 client01 kernel: [366850.587360]  [<ffffffffa040fe60>] ? encode_caps_cb+0x0/0x3d0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.592337]  [<ffffffffa04115bb>] send_mds_reconnect+0x2fb/0x4b0 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.597339]  [<ffffffffa0411a5d>] ceph_mdsc_handle_map+0x29d/0x450 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.602385]  [<ffffffffa03fbae9>] extra_mon_dispatch+0x29/0x30 [ceph]
Oct 9 13:26:01 client01 kernel: [366850.607380]  [<ffffffffa03d8ee0>] dispatch+0x50/0x590 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.612317]  [<ffffffffa03d7621>] con_work+0x1501/0x1690 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.617211]  [<ffffffff810591d1>] ? dequeue_entity+0x1b1/0x1f0
Oct 9 13:26:01 client01 kernel: [366850.622055]  [<ffffffffa03d6120>] ? con_work+0x0/0x1690 [libceph]
Oct 9 13:26:01 client01 kernel: [366850.626811]  [<ffffffff81079b73>] process_one_work+0x123/0x440
Oct 9 13:26:01 client01 kernel: [366850.631525]  [<ffffffff8107c0c0>] worker_thread+0x170/0x400
Oct 9 13:26:01 client01 kernel: [366850.636214]  [<ffffffff8107bf50>] ? worker_thread+0x0/0x400
Oct 9 13:26:01 client01 kernel: [366850.640878]  [<ffffffff81080486>] kthread+0x96/0xa0
Oct 9 13:26:01 client01 kernel: [366850.645498]  [<ffffffff8100be64>] kernel_thread_helper+0x4/0x10
Oct 9 13:26:01 client01 kernel: [366850.650123]  [<ffffffff810803f0>] ? kthread+0x0/0xa0
This client is running the master branch from a few days ago.
It looks like one of the MDS daemons crashed or restarted; the process listings suggest so:
root@node13:~# ps aux|grep cmds
root     10659  9.3 23.6 1079792 958204 ?     Ssl  13:01   7:28 /usr/bin/cmds -i 0 -c /etc/ceph/ceph.conf
root     11023  0.0  0.0    7672    820 pts/0 S+   14:21   0:00 grep --color=auto cmds
root@node13:~#
root@node14:~# ps aux|grep cmds
root       630  3.0  0.1   97136   8048 ?     Ssl  13:28   1:38 /usr/bin/cmds -i 1 -c /etc/ceph/ceph.conf
root       680  0.0  0.0    7672    820 pts/0 S+   14:22   0:00 grep --color=auto cmds
root@node14:~#
Since mds.1 seems to have restarted around the time of this crash, it might be involved.
I did not manually restart it; could the MDS have restarted by itself?