Bug #473

closed

Kernel panic: ceph_pagelist_append

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
Clustered MDS
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I was just doing an rsync of kernel.org, Debian and Ubuntu (simultaneously) when my client got a kernel panic.

The dmesg:

Oct  9 13:26:01 client01 kernel: [366850.430253] ceph: mds0 reconnect start
Oct  9 13:26:01 client01 kernel: [366850.430715] general protection fault: 0000 [#1] SMP 
Oct  9 13:26:01 client01 kernel: [366850.430817] last sysfs file: /sys/module/aes_generic/initstate
Oct  9 13:26:01 client01 kernel: [366850.430906] CPU 0 
Oct  9 13:26:01 client01 kernel: [366850.430947] Modules linked in: cryptd aes_x86_64 aes_generic ceph libceph libcrc32c ip6table_filter ip6_tables crc32c ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bridge stp bonding radeon ttm drm_kms_helper ppdev lp drm i2c_algo_bit parport_pc i3000_edac parport led_class shpchp edac_core serio_raw usb_storage floppy e1000e [last unloaded: libceph]
Oct  9 13:26:01 client01 kernel: [366850.432179] 
Oct  9 13:26:01 client01 kernel: [366850.432202] Pid: 4, comm: kworker/0:0 Not tainted 2.6.36-rc5-rbd-20014-g53f0521 #3 PDSMi+/PDSMi
Oct  9 13:26:01 client01 kernel: [366850.432334] RIP: 0010:[<ffffffff812c387b>]  [<ffffffff812c387b>] memcpy+0xb/0xb0
Oct  9 13:26:01 client01 kernel: [366850.432468] RSP: 0018:ffff880216cdb968  EFLAGS: 00010202
Oct  9 13:26:01 client01 kernel: [366850.432552] RAX: 49081a8490e48000 RBX: ffff8800cffb1fc0 RCX: 0000000000000004
Oct  9 13:26:01 client01 kernel: [366850.432658] RDX: 0000000000000007 RSI: ffff8800cffb1e55 RDI: 49081a8490e48000
Oct  9 13:26:01 client01 kernel: [366850.432763] RBP: ffff880216cdb9a0 R08: 000000000000003d R09: 0000000000000050
Oct  9 13:26:01 client01 kernel: [366850.432868] R10: 0000000000000000 R11: dead000000200200 R12: 0000000000001000
Oct  9 13:26:01 client01 kernel: [366850.432974] R13: 0000000000000027 R14: ffff8800cffb1e55 R15: 6db6db6db6db6db7
Oct  9 13:26:01 client01 kernel: [366850.433080] FS:  0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
Oct  9 13:26:01 client01 kernel: [366850.433205] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct  9 13:26:01 client01 kernel: [366850.433290] CR2: 00007fcf741be000 CR3: 0000000214fcc000 CR4: 00000000000006f0
Oct  9 13:26:01 client01 kernel: [366850.433395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct  9 13:26:01 client01 kernel: [366850.433501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct  9 13:26:01 client01 kernel: [366850.433606] Process kworker/0:0 (pid: 4, threadinfo ffff880216cda000, task ffff880216cc44a0)
Oct  9 13:26:01 client01 kernel: [366850.433726] Stack:
Oct  9 13:26:01 client01 kernel: [366850.433768]  ffffffffa03d7f37 ffff880216cdb980 ffff8801a9e84d28 ffff8800cffb1fc0
Oct  9 13:26:01 client01 kernel: [366850.433929] <0> ffff880216cdba38 ffff8801a9dd1380 ffff880216cdbb60 ffff880216cdba80
Oct  9 13:26:01 client01 kernel: [366850.434118] <0> ffffffffa041010a 00000000ffffffff 000000000000003c ffff8801a9e848bc
Oct  9 13:26:01 client01 kernel: [366850.434311] Call Trace:
Oct  9 13:26:01 client01 kernel: [366850.434362]  [<ffffffffa03d7f37>] ? ceph_pagelist_append+0x157/0x1d0 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.434477]  [<ffffffffa041010a>] encode_caps_cb+0x2aa/0x3d0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.434585]  [<ffffffffa040e73b>] iterate_session_caps+0xbb/0x1b0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.434692]  [<ffffffff81100227>] ? __page_cache_alloc+0x87/0x90
Oct  9 13:26:01 client01 kernel: [366850.434790]  [<ffffffffa040fe60>] ? encode_caps_cb+0x0/0x3d0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.434898]  [<ffffffffa04115bb>] send_mds_reconnect+0x2fb/0x4b0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.435003]  [<ffffffffa0411a5d>] ceph_mdsc_handle_map+0x29d/0x450 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.435114]  [<ffffffffa03fbae9>] extra_mon_dispatch+0x29/0x30 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.435215]  [<ffffffffa03d8ee0>] dispatch+0x50/0x590 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.435310]  [<ffffffffa03d7621>] con_work+0x1501/0x1690 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.435413]  [<ffffffff810591d1>] ? dequeue_entity+0x1b1/0x1f0
Oct  9 13:26:01 client01 kernel: [366850.435506]  [<ffffffffa03d6120>] ? con_work+0x0/0x1690 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.435605]  [<ffffffff81079b73>] process_one_work+0x123/0x440
Oct  9 13:26:01 client01 kernel: [366850.435695]  [<ffffffff8107c0c0>] worker_thread+0x170/0x400
Oct  9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff8107bf50>] ? worker_thread+0x0/0x400
Oct  9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff81080486>] kthread+0x96/0xa0
Oct  9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff8100be64>] kernel_thread_helper+0x4/0x10
Oct  9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff810803f0>] ? kthread+0x0/0xa0
Oct  9 13:26:01 client01 kernel: [366850.440206]  [<ffffffff8100be60>] ? kernel_thread_helper+0x0/0x10
Oct  9 13:26:01 client01 kernel: [366850.440206] Code: 49 89 70 50 19 c0 49 89 70 58 41 c6 40 4c 04 83 e0 fc 83 c0 08 41 88 40 4d c9 c3 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 8b 1e 4c 8b 46 08 4c 89 1f 4c 89 47 08 
Oct  9 13:26:01 client01 kernel: [366850.440206] RIP  [<ffffffff812c387b>] memcpy+0xb/0xb0
Oct  9 13:26:01 client01 kernel: [366850.440206]  RSP <ffff880216cdb968>
Oct  9 13:26:01 client01 kernel: [366850.508857] ---[ end trace 1e7299bcbc67da97 ]---
Oct  9 13:26:01 client01 kernel: [366850.514683] Kernel panic - not syncing: Fatal exception
Oct  9 13:26:01 client01 kernel: [366850.520220] Pid: 4, comm: kworker/0:0 Tainted: G      D     2.6.36-rc5-rbd-20014-g53f0521 #3
Oct  9 13:26:01 client01 kernel: [366850.525839] Call Trace:
Oct  9 13:26:01 client01 kernel: [366850.531249]  [<ffffffff815943f6>] panic+0x91/0x1a1
Oct  9 13:26:01 client01 kernel: [366850.536552]  [<ffffffff810610e5>] ? kmsg_dump+0x145/0x160
Oct  9 13:26:01 client01 kernel: [366850.541804]  [<ffffffff81598a8c>] oops_end+0xdc/0xf0
Oct  9 13:26:01 client01 kernel: [366850.546965]  [<ffffffff8100ed4b>] die+0x5b/0x90
Oct  9 13:26:01 client01 kernel: [366850.552091]  [<ffffffff815985c2>] do_general_protection+0x162/0x170
Oct  9 13:26:01 client01 kernel: [366850.557173]  [<ffffffff81597d25>] general_protection+0x25/0x30
Oct  9 13:26:01 client01 kernel: [366850.562249]  [<ffffffff812c387b>] ? memcpy+0xb/0xb0
Oct  9 13:26:01 client01 kernel: [366850.567245]  [<ffffffffa03d7f37>] ? ceph_pagelist_append+0x157/0x1d0 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.572290]  [<ffffffffa041010a>] encode_caps_cb+0x2aa/0x3d0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.577325]  [<ffffffffa040e73b>] iterate_session_caps+0xbb/0x1b0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.582383]  [<ffffffff81100227>] ? __page_cache_alloc+0x87/0x90
Oct  9 13:26:01 client01 kernel: [366850.587360]  [<ffffffffa040fe60>] ? encode_caps_cb+0x0/0x3d0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.592337]  [<ffffffffa04115bb>] send_mds_reconnect+0x2fb/0x4b0 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.597339]  [<ffffffffa0411a5d>] ceph_mdsc_handle_map+0x29d/0x450 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.602385]  [<ffffffffa03fbae9>] extra_mon_dispatch+0x29/0x30 [ceph]
Oct  9 13:26:01 client01 kernel: [366850.607380]  [<ffffffffa03d8ee0>] dispatch+0x50/0x590 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.612317]  [<ffffffffa03d7621>] con_work+0x1501/0x1690 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.617211]  [<ffffffff810591d1>] ? dequeue_entity+0x1b1/0x1f0
Oct  9 13:26:01 client01 kernel: [366850.622055]  [<ffffffffa03d6120>] ? con_work+0x0/0x1690 [libceph]
Oct  9 13:26:01 client01 kernel: [366850.626811]  [<ffffffff81079b73>] process_one_work+0x123/0x440
Oct  9 13:26:01 client01 kernel: [366850.631525]  [<ffffffff8107c0c0>] worker_thread+0x170/0x400
Oct  9 13:26:01 client01 kernel: [366850.636214]  [<ffffffff8107bf50>] ? worker_thread+0x0/0x400
Oct  9 13:26:01 client01 kernel: [366850.640878]  [<ffffffff81080486>] kthread+0x96/0xa0
Oct  9 13:26:01 client01 kernel: [366850.645498]  [<ffffffff8100be64>] kernel_thread_helper+0x4/0x10
Oct  9 13:26:01 client01 kernel: [366850.650123]  [<ffffffff810803f0>] ? kthread+0x0/0xa0

This client is running the master branch from a few days ago.

It seems that one mds crashed and restarted; the process listings support this:

root@node13:~# ps aux|grep cmds
root     10659  9.3 23.6 1079792 958204 ?      Ssl  13:01   7:28 /usr/bin/cmds -i 0 -c /etc/ceph/ceph.conf
root     11023  0.0  0.0   7672   820 pts/0    S+   14:21   0:00 grep --color=auto cmds
root@node13:~#
root@node14:~# ps aux|grep cmds
root       630  3.0  0.1  97136  8048 ?        Ssl  13:28   1:38 /usr/bin/cmds -i 1 -c /etc/ceph/ceph.conf
root       680  0.0  0.0   7672   820 pts/0    S+   14:22   0:00 grep --color=auto cmds
root@node14:~#
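The start times in the listings above already hint at a restart: mds.1 (pid 630 on node14) started at 13:28, well after mds.0 (pid 10659 on node13) at 13:01. One way to double-check this kind of thing is to ask ps for a process's full start timestamp and elapsed runtime; the sketch below demonstrates with the current shell's pid, since the pids above only exist on those nodes.

```shell
# Print start time and elapsed runtime for a pid.
# On the nodes above one would use pid 10659 (mds.0) or 630 (mds.1);
# here we demo with the current shell's own pid.
pid=$$
ps -o pid=,lstart=,etime= -p "$pid"
```

If the elapsed time (etime) is much shorter than the cluster's uptime, the daemon was restarted at some point.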

Since mds.1 seems to have restarted around the time of this crash, it might be involved?

I did not manually invoke the restart; could it be that the mds did this by itself?

