Bug #36299
Kernel panic: kernel BUG at fs/ceph/mds_client.c:1279! on CentOS 7.5.1804
Status:
New
Priority:
Normal
Assignee:
-
Category:
fs/ceph
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Hello! We had two kernel panics when using the CephFS kernel client on CentOS 7.5, on different nodes. Apparently, both panics occurred while unmounting the file system...
Updated by Dmitry Isakov over 5 years ago
Kernel version: 3.10.0-862.11.6.el7.x86_64 and 3.10.0-862.el7.x86_64
libcephfs2-12.2.8-0.el7.x86_64
ceph-common-12.2.8-0.el7.x86_64
Updated by Dmitry Isakov over 5 years ago
CentOS bug https://bugs.centos.org/view.php?id=15350
Updated by Ilya Dryomov over 5 years ago
- Category set to fs/ceph
- Assignee set to Zheng Yan
kernel BUG at fs/ceph/mds_client.c:1279!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 38552 Comm: kworker/3:0 Kdump: loaded Tainted: G ------------ T 3.10.0-862.11.6.el7.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 11/14/2013
Workqueue: ceph-msgr ceph_con_workfn [libceph]
task: ffff885e37bd8fd0 ti: ffff885609eb0000 task.ti: ffff885609eb0000
RIP: 0010:[<ffffffffc09f12ed>] [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
RSP: 0018:ffff885609eb3c48 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88698c7b0d40 RCX: 0000000000000400
RDX: 000000000000001b RSI: ffff889126b48618 RDI: ffff885609eb3c08
RBP: ffff885609eb3c88 R08: ffff88a042bdd770 R09: 0000000000000001
R10: 00000000000003e2 R11: 0000000000000000 R12: ffff88698c7b0800
R13: ffff8870b32289d8 R14: ffff88698c7b0d48 R15: ffff889845322800
FS: 0000000000000000(0000) GS:ffff8871bf6c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f761ae1c808 CR3: 000000286be0e000 CR4: 00000000000607e0
Call Trace:
[<ffffffffc09f8500>] dispatch+0x5e0/0xb90 [ceph]
[<ffffffff987d155a>] ? kernel_recvmsg+0x3a/0x50
[<ffffffffc0972ff4>] try_read+0x4e4/0x1210 [libceph]
[<ffffffff98234909>] ? sched_clock+0x9/0x10
[<ffffffff982d50d5>] ? sched_clock_cpu+0x85/0xc0
[<ffffffff9822a59e>] ? __switch_to+0xce/0x580
[<ffffffffc0973dd9>] ceph_con_workfn+0xb9/0x670 [libceph]
[<ffffffff982b613f>] process_one_work+0x17f/0x440
[<ffffffff982b71d6>] worker_thread+0x126/0x3c0
[<ffffffff982b70b0>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffff982bdf21>] kthread+0xd1/0xe0
[<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
[<ffffffff989255f7>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
Code: 5d 41 5e 41 5f 5d c3 48 89 fa 48 c7 c6 b0 7a a0 c0 48 c7 c7 18 4c a1 c0 31 c0 e8 cf 8b b8 d7 e9 d8 fe ff ff e8 45 30 8a d7 0f 0b <0f> 0b 90 66 66 66 66 90 48 8b 07 55 48 89 e5 48 89 02 44 8b 80
RIP [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]

BUG_ON(session->s_nr_caps > 0);
BUG_ON(!list_empty(&session->s_cap_flushing));
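When comparing oops reports from different kernel builds, it can help to pull the faulting symbol, offset, and module out of the RIP line rather than relying on the source line number. The following is a minimal Python sketch of that idea; the function name `parse_rip` and the regular expression are illustrative assumptions, not part of any kernel or Ceph tooling, and the pattern only targets RIP lines shaped like the one in the trace above.

```python
import re

# Hypothetical pattern for RIP lines such as:
#   RIP: 0010:[<ffffffffc09f12ed>] [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
RIP_RE = re.compile(
    r"RIP.*?\]\s+(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_rip(line):
    """Return symbol, offset, size and module from an oops RIP line, or None."""
    match = RIP_RE.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for symbols built into the kernel
    }


if __name__ == "__main__":
    sample = ("RIP: 0010:[<ffffffffc09f12ed>] [<ffffffffc09f12ed>] "
              "remove_session_caps+0x16d/0x170 [ceph]")
    print(parse_rip(sample))
```

Two reports that fault in the same function at different offsets may still correspond to the same source-level assertion if the kernels were built from different trees, so the parsed values are only a starting point for comparison.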
Updated by dhacky du over 5 years ago
We are also seeing the same kernel panic.
Before the kernel BUG, the following messages appear in the log:
[5478081.176868] libceph: mds0 xx.xx.xx.xx:6800 socket closed (con state OPEN)
[5478085.807602] libceph: mds0 xx.xx.xx.xx:6800 connection reset
[5478085.807632] libceph: reset on mds0
[5478085.807633] ceph: mds0 closed our session
[5478085.807634] ceph: mds0 reconnect start
[5478085.807660] ceph: ffff8803404a2a30 auth cap (null) not mds0 ???
[5478085.812112] ceph: mds0 reconnect denied
[5478085.812177] kernel BUG at fs/ceph/mds_client.c:1230!
[5478085.813028] task: ffff880f87e2bf40 ti: ffff880f35394000 task.ti: ffff880f35394000
[5478085.813053] RIP: 0010:[<ffffffffc07401e0>] [<ffffffffc07401e0>] remove_session_caps+0x160/0x170 [ceph]
......
kernel 3.10.0-693.5.2.el7.x86_64
ceph version 12.2.5
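To check whether other nodes hit the same sequence (an MDS denying the reconnect, immediately followed by the BUG in fs/ceph), saved kernel logs can be scanned mechanically. This is a rough Python sketch under stated assumptions: the function name `find_denied_then_bug` and the `window` parameter are hypothetical, and the patterns are derived only from the message formats quoted in this comment.

```python
import re

# Patterns derived from the log lines quoted in this report.
DENIED_RE = re.compile(r"ceph: (mds\d+) reconnect denied")
BUG_RE = re.compile(r"kernel BUG at (fs/ceph/\S+?:\d+)!")


def find_denied_then_bug(lines, window=5):
    """Return (mds, bug_location) pairs where a 'reconnect denied' message
    is followed by a fs/ceph kernel BUG within `window` lines."""
    hits = []
    pending = None  # (mds name, index of the 'reconnect denied' line)
    for index, line in enumerate(lines):
        denied = DENIED_RE.search(line)
        if denied:
            pending = (denied.group(1), index)
            continue
        bug = BUG_RE.search(line)
        if bug and pending and index - pending[1] <= window:
            hits.append((pending[0], bug.group(1)))
            pending = None
    return hits


if __name__ == "__main__":
    import sys
    # Usage (assumed): dmesg > dmesg.txt; python scan.py dmesg.txt
    with open(sys.argv[1], errors="replace") as log:
        for mds, location in find_denied_then_bug(log):
            print(f"{mds}: reconnect denied, then BUG at {location}")
```

A match only shows that the two messages appeared close together in the log; it does not by itself establish that the denied reconnect caused the BUG.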
Updated by Zheng Yan over 5 years ago
- Related to Bug #37769: __ceph_remove_cap caused kernel crash added
Updated by Zheng Yan over 5 years ago
This bug and #37769 could be the same issue.