Bug #18041

closed

Periodic kernel crashes with CephFS

Added by Donatas Abraitis over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Crash signature (v1):
Crash signature (v2):

Description

Hi,

Over the last few days we have been seeing ~5-15 kernel crashes, each with the following trace:

Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: ------------[ cut here ]------------
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: kernel BUG at fs/ceph/inode.c:1272!
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: invalid opcode: 0000 [#1] SMP
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Modules linked in: veth ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables tun rpcsec_gss_krb5 nfsv4 ceph dns_resolver l
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc32c_intel ast drm_kms_helper syscopyarea sysfillrect sysim
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: CPU: 3 PID: 41021 Comm: kworker/3:2 Not tainted 4.8.4-1.el7.elrepo.x86_64 #1
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Hardware name: Supermicro SYS-2028TR-H72R/X10DRT-H, BIOS 2.0a 03/08/2016
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: task: ffff881e33dada00 task.stack: ffff881a1100c000
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RIP: 0010:[<ffffffffa07cb729>]  [<ffffffffa07cb729>] ceph_fill_trace+0x889/0x890 [ceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RSP: 0018:ffff881a1100fb78  EFLAGS: 00010283
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RAX: ffff883b4816b8c0 RBX: 0000000000000000 RCX: 0000000100220002
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RDX: ffff881fc0ee6800 RSI: ffffea00734e7500 RDI: ffff883fef953a10
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RBP: ffff881a1100fbe8 R08: ffff881cd39d4ca8 R09: 0000000100220002
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: R10: 00000000d39d4b01 R11: ffff881cd39d4ca8 R12: ffff881c7ed95128
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: R13: ffff8817ad2f8348 R14: ffff88171ecbbb00 R15: ffff881fe80e2c00
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: FS:  0000000000000000(0000) GS:ffff881fff8c0000(0000) knlGS:0000000000000000
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: CR2: 00007fd19f407000 CR3: 0000000001c06000 CR4: 00000000003406e0
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Stack:
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  ffff881fe80e2f68 ffff881a1100fc48 ffff881fc0ee6a69 ffff883ff11df100
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  ffff883ff13b9000 ffff883ff0712000 ffff881a1100fba8 ffff881a1100fba8
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  00000000a21363b4 ffff883fef953800 ffff8818a9f83500 ffff881fe80e2c90
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Call Trace:
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffffa07eb2af>] handle_reply+0x46f/0xbb0 [ceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffffa07ed4e8>] dispatch+0xd8/0xa60 [ceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffffa072ee3e>] try_read+0x9be/0x11a0 [libceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff810b936d>] ? set_next_entity+0x4d/0x7c0
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff810ba865>] ? put_prev_entity+0x35/0x380
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff810b8b25>] ? pick_next_entity+0xa5/0x160
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffffa072f6ca>] ceph_con_workfn+0xaa/0x5d0 [libceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff8109ad42>] process_one_work+0x152/0x400
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff8109b635>] worker_thread+0x125/0x4b0
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff8109b510>] ? rescuer_thread+0x380/0x380
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff810a1128>] kthread+0xd8/0xf0
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff8173ad3f>] ret_from_fork+0x1f/0x40
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  [<ffffffff810a1050>] ? kthread_park+0x60/0x60
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Code: 00 00 48 c7 c7 ed f7 7f a0 c6 05 d3 71 04 00 01 e8 7d 61 8b e0 49 8b 97 60 01 00 00 31 db 0f b6 42 0e e9 3b f8 ff ff 0f 0b 0f 0b <0f> 0b 0f
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RIP  [<ffffffffa07cb729>] ceph_fill_trace+0x889/0x890 [ceph]
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel:  RSP <ffff881a1100fb78>
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: ---[ end trace 6c925a3b3290a20f ]---
Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: Kernel panic - not syncing: Fatal exception
-- Reboot --

Kernel version is: 4.8.4-1.el7.elrepo.x86_64
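
For anyone who wants to tally how often this particular BUG site fires across several nodes, here is a minimal sketch in Python. It is not part of the original report; the function name `summarize_oopses` and the regular expressions are illustrative assumptions written against the log format quoted above, and they may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log lines quoted in this report.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Matches the summary line "RIP  [<addr>] symbol+0xoff/0xlen", not the
# earlier "RIP: 0010:[<addr>]" register line, so each oops counts once.
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(?P<sym>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_oopses(lines):
    """Count BUG sites (file, line) and faulting symbols in kernel log lines."""
    bugs = Counter()
    symbols = Counter()
    for text in lines:
        m = BUG_RE.search(text)
        if m:
            bugs[(m.group("file"), int(m.group("line")))] += 1
        m = RIP_RE.search(text)
        if m:
            symbols[m.group("sym")] += 1
    return bugs, symbols


# Three lines copied from the trace above, used as sample input.
SAMPLE = [
    "Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: kernel BUG at fs/ceph/inode.c:1272!",
    "Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RIP: 0010:[<ffffffffa07cb729>]  [<ffffffffa07cb729>] ceph_fill_trace+0x889/0x890 [ceph]",
    "Nov 25 12:14:01 us-imm-node1a.000webhost.io kernel: RIP  [<ffffffffa07cb729>] ceph_fill_trace+0x889/0x890 [ceph]",
]

bugs, symbols = summarize_oopses(SAMPLE)
for (path, lineno), count in bugs.items():
    print(f"{path}:{lineno} hit {count} time(s)")
for sym, count in symbols.items():
    print(f"faulting symbol {sym}: {count}")
```

The same function could be fed the saved output of `journalctl -k` for each boot, one line per list element, to compare crash counts between nodes.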

/etc/ceph/ceph.conf (client side):

[root@us-imm-node1a ~]# cat /etc/ceph/ceph.conf
[global]
fsid = fc43f491-9693-48d3-91be-d5bb2b7a085e
mon initial members = v6.us-imm-cephmon1.000webhost.io,v6.us-imm-cephmon2.000webhost.io,v6.us-imm-cephmon3.000webhost.io
mon host = v6.us-imm-cephmon1.000webhost.io,v6.us-imm-cephmon2.000webhost.io,v6.us-imm-cephmon3.000webhost.io
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
ms bind ipv6 = true

[osd]
osd crush update on start = false

Any comments?

If any further information is needed, do not hesitate to ask!


Files

ceph-mds.v6.us-imm-cephmon1.log (9.35 KB), Aurimas Lapiene, 11/25/2016 02:28 PM
ceph-mds.v6.us-imm-cephmon2.log (12 KB), Aurimas Lapiene, 11/25/2016 02:28 PM
throughput.png (70.6 KB), Aurimas Lapiene, 11/28/2016 09:02 AM
iops.png (71.5 KB), Aurimas Lapiene, 11/28/2016 09:02 AM
node1b_dmesg.log (330 KB), Aurimas Lapiene, 11/29/2016 01:03 PM
d_reval.patch (530 Bytes), Zheng Yan, 11/30/2016 01:20 PM
0001-ceph-don-t-set-req-r_locked_dir-in-ceph_d_revalidate.patch (1.95 KB), updated patch, Jeff Layton, 11/30/2016 09:05 PM