Bug #20998

closed

RHEL 7.4 GA kernel panicked on client node running smallfile tests with 3 active MDS

Added by Barry Marson over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

While running Ben England's small file test with 2 clients and 3 active MDS servers, one of the clients went down with:

[339141.117253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000530
[339141.117345] IP: [<ffffffff816abb3c>] _raw_spin_lock+0xc/0x30
[339141.117406] PGD 0
[339141.117430] Oops: 0002 [#1] SMP
[339141.117467] Modules linked in: ceph libceph dns_resolver ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg joydev iTCO_wdt iTCO_vendor_support ipmi_ssif dcdbas acpi_power_meter wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad mei_me mei lpc_ich
[339141.118223] shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci ixgbe libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel megaraid_sas i2c_core mdio dca ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[339141.118538] CPU: 14 PID: 10021 Comm: kworker/14:1 Not tainted 3.10.0-693.el7.x86_64 #1
[339141.118605] Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.3.6 09/11/2012
[339141.118687] Workqueue: events delayed_work [ceph]
[339141.118731] task: ffff88081db36eb0 ti: ffff8808163dc000 task.ti: ffff8808163dc000
[339141.118793] RIP: 0010:[<ffffffff816abb3c>] [<ffffffff816abb3c>] _raw_spin_lock+0xc/0x30
[339141.118866] RSP: 0018:ffff8808163dfbf8 EFLAGS: 00010246
[339141.118913] RAX: 0000000000000000 RBX: ffff880258dc7f78 RCX: 0000000000000000
[339141.118973] RDX: 0000000000000001 RSI: ffff8808163dfd4c RDI: 0000000000000530
[339141.119033] RBP: ffff8808163dfc20 R08: 0000000000000000 R09: 0000000000000000
[339141.119092] R10: dfbf0dc47ab2e8f8 R11: 7fffffffffffffff R12: ffff8808163dfd4c
[339141.119152] R13: 0000000000000000 R14: ffff88080cf126f0 R15: ffff880258dc7f80
[339141.119212] FS: 0000000000000000(0000) GS:ffff88081fbc0000(0000) knlGS:0000000000000000
[339141.119280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[339141.119329] CR2: 0000000000000530 CR3: 0000000816e0e000 CR4: 00000000000407e0
[339141.119390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[339141.119450] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[339141.119508] Stack:
[339141.119529] ffffffffc0672c15 0000000000000000 ffff880258dc7f78 ffff8808163dfd4c
[339141.119600] 0000000000000000 ffff8808163dfc60 ffffffffc067416c ffff88080cf12a30
[339141.119670] 0000000000000000 ffff88101e92e990 ffff88080cf126f0 ffff88080cf126f0
[339141.119741] Call Trace:
[339141.119779] [<ffffffffc0672c15>] ? __cap_is_valid+0x25/0xb0 [ceph]
[339141.119845] [<ffffffffc067416c>] __ceph_caps_issued+0x5c/0xe0 [ceph]
[339141.119911] [<ffffffffc067620f>] ceph_check_caps+0x12f/0xba0 [ceph]
[339141.119977] [<ffffffffc067a3d6>] ceph_check_delayed_caps+0x86/0xf0 [ceph]
[339141.120047] [<ffffffffc0681705>] delayed_work+0x35/0x260 [ceph]
[339141.120103] [<ffffffff810a881a>] process_one_work+0x17a/0x440
[339141.120156] [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
[339141.120207] [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
[339141.120264] [<ffffffff810b098f>] kthread+0xcf/0xe0
[339141.120309] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[339141.122548] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[339141.124773] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[339141.126999] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 66 66 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 a4 2a ff ff 5d
[339141.131737] RIP [<ffffffff816abb3c>] _raw_spin_lock+0xc/0x30
[339141.134001] RSP <ffff8808163dfbf8>
[339141.136236] CR2: 0000000000000530

I do have a vmcore file. I just need to know if someone wants it and where to place it.

We are running the Luminous ceph-*-12.1.2-0.el7 packages.

Barry
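
For triage, the key fields of the oops above can be pulled out mechanically. The following is a minimal Python sketch and not part of the original report: the function name summarize_oops, the regular expressions, and the returned dictionary layout are all illustrative assumptions, and the embedded text is an abbreviated excerpt of the trace above. It extracts the faulting address, the symbol at the instruction pointer, and the call-trace frames, and checks whether the faulting address equals RDI. On x86-64 the first function argument is passed in RDI, so a match would be consistent with _raw_spin_lock being handed a lock located at offset 0x530 from a NULL pointer; the sketch does not identify which structure that is.

```python
import re

# Abbreviated excerpt of the oops above; only the lines the parser needs.
OOPS = """\
[339141.117253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000530
[339141.117345] IP: [<ffffffff816abb3c>] _raw_spin_lock+0xc/0x30
[339141.118973] RDX: 0000000000000001 RSI: ffff8808163dfd4c RDI: 0000000000000530
[339141.119741] Call Trace:
[339141.119779] [<ffffffffc0672c15>] ? __cap_is_valid+0x25/0xb0 [ceph]
[339141.119845] [<ffffffffc067416c>] __ceph_caps_issued+0x5c/0xe0 [ceph]
[339141.119911] [<ffffffffc067620f>] ceph_check_caps+0x12f/0xba0 [ceph]
[339141.119977] [<ffffffffc067a3d6>] ceph_check_delayed_caps+0x86/0xf0 [ceph]
[339141.120047] [<ffffffffc0681705>] delayed_work+0x35/0x260 [ceph]
[339141.120103] [<ffffffff810a881a>] process_one_work+0x17a/0x440
"""

# Hypothetical patterns written for this RHEL 7 style oops layout.
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"\bIP: \[<[0-9a-f]+>\] (\S+?)\+0x")
RDI_RE = re.compile(r"\bRDI: ([0-9a-f]+)")
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_oops(text):
    """Return the fault address, IP symbol, RDI check, and call-trace frames."""
    fault = int(FAULT_RE.search(text).group(1), 16)
    ip_symbol = IP_RE.search(text).group(1)
    rdi = int(RDI_RE.search(text).group(1), 16)

    frames = []
    in_trace = False
    for line in text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line ends the trace
        frames.append({
            "symbol": match.group(2),
            "module": match.group(3),           # None for core kernel symbols
            "reliable": match.group(1) is None  # "?" marks an uncertain frame
        })

    return {
        "fault_address": fault,
        "ip_symbol": ip_symbol,
        "fault_matches_rdi": fault == rdi,
        "frames": frames,
    }


if __name__ == "__main__":
    summary = summarize_oops(OOPS)
    print(hex(summary["fault_address"]), summary["ip_symbol"])
    for frame in summary["frames"]:
        if frame["module"] == "ceph" and frame["reliable"]:
            print(frame["symbol"])
```

Filtering on the module name and skipping frames marked with "?" leaves the ceph code path that led to the fault: delayed_work, ceph_check_delayed_caps, ceph_check_caps, and __ceph_caps_issued, listed innermost first in the trace.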
