Bug #36299
Kernel panic: kernel BUG at fs/ceph/mds_client.c:1279! on CentOS 7.5.1804
Status:
New
Priority:
Normal
Assignee:
-
Category:
fs/ceph
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:
Description
Hello! We hit two kernel panics on different nodes while using the CephFS kernel client on CentOS 7.5. In both cases, the panic apparently occurred while unmounting the file system...
Related issues
History
#1 Updated by Dmitry Isakov over 2 years ago
Kernel version: 3.10.0-862.11.6.el7.x86_64 and 3.10.0-862.el7.x86_64
libcephfs2-12.2.8-0.el7.x86_64
ceph-common-12.2.8-0.el7.x86_64
#2 Updated by Dmitry Isakov over 2 years ago
CentOS bug https://bugs.centos.org/view.php?id=15350
#3 Updated by Ilya Dryomov over 2 years ago
- Category set to fs/ceph
- Assignee set to Zheng Yan
kernel BUG at fs/ceph/mds_client.c:1279!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 38552 Comm: kworker/3:0 Kdump: loaded Tainted: G ------------ T 3.10.0-862.11.6.el7.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 11/14/2013
Workqueue: ceph-msgr ceph_con_workfn [libceph]
task: ffff885e37bd8fd0 ti: ffff885609eb0000 task.ti: ffff885609eb0000
RIP: 0010:[<ffffffffc09f12ed>] [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
RSP: 0018:ffff885609eb3c48 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88698c7b0d40 RCX: 0000000000000400
RDX: 000000000000001b RSI: ffff889126b48618 RDI: ffff885609eb3c08
RBP: ffff885609eb3c88 R08: ffff88a042bdd770 R09: 0000000000000001
R10: 00000000000003e2 R11: 0000000000000000 R12: ffff88698c7b0800
R13: ffff8870b32289d8 R14: ffff88698c7b0d48 R15: ffff889845322800
FS: 0000000000000000(0000) GS:ffff8871bf6c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f761ae1c808 CR3: 000000286be0e000 CR4: 00000000000607e0
Call Trace:
 [<ffffffffc09f8500>] dispatch+0x5e0/0xb90 [ceph]
 [<ffffffff987d155a>] ? kernel_recvmsg+0x3a/0x50
 [<ffffffffc0972ff4>] try_read+0x4e4/0x1210 [libceph]
 [<ffffffff98234909>] ? sched_clock+0x9/0x10
 [<ffffffff982d50d5>] ? sched_clock_cpu+0x85/0xc0
 [<ffffffff9822a59e>] ? __switch_to+0xce/0x580
 [<ffffffffc0973dd9>] ceph_con_workfn+0xb9/0x670 [libceph]
 [<ffffffff982b613f>] process_one_work+0x17f/0x440
 [<ffffffff982b71d6>] worker_thread+0x126/0x3c0
 [<ffffffff982b70b0>] ? manage_workers.isra.24+0x2a0/0x2a0
 [<ffffffff982bdf21>] kthread+0xd1/0xe0
 [<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
 [<ffffffff989255f7>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
Code: 5d 41 5e 41 5f 5d c3 48 89 fa 48 c7 c6 b0 7a a0 c0 48 c7 c7 18 4c a1 c0 31 c0 e8 cf 8b b8 d7 e9 d8 fe ff ff e8 45 30 8a d7 0f 0b <0f> 0b 90 66 66 66 66 90 48 8b 07 55 48 89 e5 48 89 02 44 8b 80
RIP [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
BUG_ON(session->s_nr_caps > 0);
BUG_ON(!list_empty(&session->s_cap_flushing));
#4 Updated by dhacky du about 2 years ago
We also hit the same kernel panic.
Before the kernel BUG, we see the following messages:
[5478081.176868] libceph: mds0 xx.xx.xx.xx:6800 socket closed (con state OPEN)
[5478085.807602] libceph: mds0 xx.xx.xx.xx:6800 connection reset
[5478085.807632] libceph: reset on mds0
[5478085.807633] ceph: mds0 closed our session
[5478085.807634] ceph: mds0 reconnect start
[5478085.807660] ceph: ffff8803404a2a30 auth cap (null) not mds0 ???
[5478085.812112] ceph: mds0 reconnect denied
[5478085.812177] kernel BUG at fs/ceph/mds_client.c:1230!
[5478085.813028] task: ffff880f87e2bf40 ti: ffff880f35394000 task.ti: ffff880f35394000
[5478085.813053] RIP: 0010:[<ffffffffc07401e0>] [<ffffffffc07401e0>] remove_session_caps+0x160/0x170 [ceph]
......
kernel 3.10.0-693.5.2.el7.x86_64
ceph version 12.2.5
#5 Updated by Zheng Yan about 2 years ago
- Related to Bug #37769: __ceph_remove_cap caused kernel crash added
#6 Updated by Zheng Yan about 2 years ago
This one and #37769 could be the same issue.
#7 Updated by Patrick Donnelly about 2 months ago
- Assignee deleted (Zheng Yan)