Bug #36299

Kernel panic: kernel BUG at fs/ceph/mds_client.c:1279! on CentOS 7.5.1804

Added by Dmitry Isakov about 1 year ago. Updated 9 months ago.

Status: New
Priority: Normal
Assignee:
Category: fs/ceph
Target version: -
Start date: 10/03/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

Hello! We had two kernel panics while using the CephFS kernel client on CentOS 7.5, each on a different node. Both panics appear to have occurred while the file system was being unmounted.

backtrace - kdump backtrace (1.42 KB) Dmitry Isakov, 10/03/2018 11:44 AM

dmesg - kdump dmesg (3.92 KB) Dmitry Isakov, 10/03/2018 11:44 AM


Related issues

Related to Linux kernel client - Bug #37769: __ceph_remove_cap caused kernel crash New 12/28/2018

History

#1 Updated by Dmitry Isakov about 1 year ago

Kernel version: 3.10.0-862.11.6.el7.x86_64 and 3.10.0-862.el7.x86_64
libcephfs2-12.2.8-0.el7.x86_64
ceph-common-12.2.8-0.el7.x86_64

#3 Updated by Ilya Dryomov about 1 year ago

  • Category set to fs/ceph
  • Assignee set to Zheng Yan
kernel BUG at fs/ceph/mds_client.c:1279!
invalid opcode: 0000 [#1] SMP 
CPU: 3 PID: 38552 Comm: kworker/3:0 Kdump: loaded Tainted: G               ------------ T 3.10.0-862.11.6.el7.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 11/14/2013
Workqueue: ceph-msgr ceph_con_workfn [libceph]
task: ffff885e37bd8fd0 ti: ffff885609eb0000 task.ti: ffff885609eb0000
RIP: 0010:[<ffffffffc09f12ed>]  [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
RSP: 0018:ffff885609eb3c48  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88698c7b0d40 RCX: 0000000000000400
RDX: 000000000000001b RSI: ffff889126b48618 RDI: ffff885609eb3c08
RBP: ffff885609eb3c88 R08: ffff88a042bdd770 R09: 0000000000000001
R10: 00000000000003e2 R11: 0000000000000000 R12: ffff88698c7b0800
R13: ffff8870b32289d8 R14: ffff88698c7b0d48 R15: ffff889845322800
FS:  0000000000000000(0000) GS:ffff8871bf6c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f761ae1c808 CR3: 000000286be0e000 CR4: 00000000000607e0
Call Trace:
 [<ffffffffc09f8500>] dispatch+0x5e0/0xb90 [ceph]
 [<ffffffff987d155a>] ? kernel_recvmsg+0x3a/0x50
 [<ffffffffc0972ff4>] try_read+0x4e4/0x1210 [libceph]
 [<ffffffff98234909>] ? sched_clock+0x9/0x10
 [<ffffffff982d50d5>] ? sched_clock_cpu+0x85/0xc0
 [<ffffffff9822a59e>] ? __switch_to+0xce/0x580
 [<ffffffffc0973dd9>] ceph_con_workfn+0xb9/0x670 [libceph]
 [<ffffffff982b613f>] process_one_work+0x17f/0x440
 [<ffffffff982b71d6>] worker_thread+0x126/0x3c0
 [<ffffffff982b70b0>] ? manage_workers.isra.24+0x2a0/0x2a0
 [<ffffffff982bdf21>] kthread+0xd1/0xe0
 [<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
 [<ffffffff989255f7>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
Code: 5d 41 5e 41 5f 5d c3 48 89 fa 48 c7 c6 b0 7a a0 c0 48 c7 c7 18 4c a1 c0 31 c0 e8 cf 8b b8 d7 e9 d8 fe ff ff e8 45 30 8a d7 0f 0b <0f> 0b 90 66 66 66 66 90 48 8b 07 55 48 89 e5 48 89 02 44 8b 80 
RIP  [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
BUG_ON(session->s_nr_caps > 0);
BUG_ON(!list_empty(&session->s_cap_flushing));
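When comparing reports like this one, it can help to pull out the two fields that identify the crash site: the `kernel BUG at file:line` location and the function named on the `RIP` line. The following is a minimal sketch, not part of any Ceph or kernel tooling; the function name `summarize_oops` and the regular expressions are assumptions made for illustration, written against the line shapes visible in the trace above.

```python
import re

# Matches e.g. "kernel BUG at fs/ceph/mds_client.c:1279!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")

# Matches a "symbol+0xOFF/0xSIZE" token such as
# "remove_session_caps+0x16d/0x170" (hypothetical pattern, based only on
# the trace format quoted in this report).
SYM_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def summarize_oops(text):
    """Return the first BUG location and the first RIP symbol found in text."""
    summary = {}
    for line in text.splitlines():
        if "bug" not in summary:
            m = BUG_RE.search(line)
            if m:
                summary["bug"] = (m.group("file"), int(m.group("line")))
        # Only look for the faulting symbol on lines that mention RIP.
        if "rip" not in summary and "RIP" in line:
            m = SYM_RE.search(line)
            if m:
                summary["rip"] = (
                    m.group("func"),
                    int(m.group("off"), 16),
                    int(m.group("size"), 16),
                )
    return summary


if __name__ == "__main__":
    sample = (
        "kernel BUG at fs/ceph/mds_client.c:1279!\n"
        "RIP: 0010:[<ffffffffc09f12ed>]  [<ffffffffc09f12ed>] "
        "remove_session_caps+0x16d/0x170 [ceph]\n"
    )
    print(summarize_oops(sample))
```

Keying on the function name rather than the source line number is deliberate: the line number reported by `kernel BUG at` differs between kernel builds, while the faulting function stays the same.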

#4 Updated by dhacky du 11 months ago

We are hitting the same kernel panic.
Just before the kernel BUG, we see the following messages:

[5478081.176868] libceph: mds0 xx.xx.xx.xx:6800 socket closed (con state OPEN)
[5478085.807602] libceph: mds0 xx.xx.xx.xx:6800 connection reset
[5478085.807632] libceph: reset on mds0
[5478085.807633] ceph: mds0 closed our session
[5478085.807634] ceph: mds0 reconnect start
[5478085.807660] ceph: ffff8803404a2a30 auth cap           (null) not mds0 ???
[5478085.812112] ceph: mds0 reconnect denied
[5478085.812177] kernel BUG at fs/ceph/mds_client.c:1230!
[5478085.813028] task: ffff880f87e2bf40 ti: ffff880f35394000 task.ti: ffff880f35394000
[5478085.813053] RIP: 0010:[<ffffffffc07401e0>]  [<ffffffffc07401e0>] remove_session_caps+0x160/0x170 [ceph]
......

kernel 3.10.0-693.5.2.el7.x86_64
ceph version 12.2.5
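To check whether other affected nodes show the same lead-up (session closed, reconnect start, reconnect denied), the `libceph:` and `ceph:` messages that precede the BUG line can be collected from a saved dmesg. This is a rough sketch only; `ceph_events_before_bug` is a hypothetical helper, and it assumes each line carries a `[seconds.micros]` timestamp prefix as in the excerpt above.

```python
import re

# "[5478085.812112] ceph: mds0 reconnect denied" -> timestamp + message
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def ceph_events_before_bug(dmesg_text):
    """Collect libceph/ceph messages up to the first 'kernel BUG at' line.

    Returns (events, bug), where events is a list of (timestamp, message)
    tuples and bug is the (timestamp, message) of the BUG line, or None
    if no BUG line was found.
    """
    events = []
    for raw in dmesg_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a timestamp prefix
        ts, msg = float(m.group("ts")), m.group("msg")
        if msg.startswith("kernel BUG at"):
            return events, (ts, msg)
        if msg.startswith(("libceph:", "ceph:")):
            events.append((ts, msg))
    return events, None


if __name__ == "__main__":
    log = """\
[5478085.807633] ceph: mds0 closed our session
[5478085.807634] ceph: mds0 reconnect start
[5478085.812112] ceph: mds0 reconnect denied
[5478085.812177] kernel BUG at fs/ceph/mds_client.c:1230!
"""
    events, bug = ceph_events_before_bug(log)
    for ts, msg in events:
        print(ts, msg)
    print("BUG:", bug)
```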

#5 Updated by Zheng Yan 9 months ago

  • Related to Bug #37769: __ceph_remove_cap caused kernel crash added

#6 Updated by Zheng Yan 9 months ago

This one and #37769 could be the same issue.
