Bug #36299

open

Kernel panic: kernel BUG at fs/ceph/mds_client.c:1279! on CentOS 7.5.1804

Added by Dmitry Isakov over 5 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: fs/ceph
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hello! We had two kernel panics while using the CephFS client on CentOS 7.5, on different nodes. Apparently, both panics occurred when unmounting the file system...


Files

backtrace (1.42 KB): kdump backtrace, Dmitry Isakov, 10/03/2018 11:44 AM
dmesg (3.92 KB): kdump dmesg, Dmitry Isakov, 10/03/2018 11:44 AM

Related issues 1 (1 open, 0 closed)

Related to Linux kernel client - Bug #37769: __ceph_remove_cap caused kernel crash (New)

Actions #1

Updated by Dmitry Isakov over 5 years ago

Kernel version: 3.10.0-862.11.6.el7.x86_64 and 3.10.0-862.el7.x86_64
libcephfs2-12.2.8-0.el7.x86_64
ceph-common-12.2.8-0.el7.x86_64

Actions #3

Updated by Ilya Dryomov over 5 years ago

  • Category set to fs/ceph
  • Assignee set to Zheng Yan

kernel BUG at fs/ceph/mds_client.c:1279!
invalid opcode: 0000 [#1] SMP 
CPU: 3 PID: 38552 Comm: kworker/3:0 Kdump: loaded Tainted: G               ------------ T 3.10.0-862.11.6.el7.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 11/14/2013
Workqueue: ceph-msgr ceph_con_workfn [libceph]
task: ffff885e37bd8fd0 ti: ffff885609eb0000 task.ti: ffff885609eb0000
RIP: 0010:[<ffffffffc09f12ed>]  [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]
RSP: 0018:ffff885609eb3c48  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88698c7b0d40 RCX: 0000000000000400
RDX: 000000000000001b RSI: ffff889126b48618 RDI: ffff885609eb3c08
RBP: ffff885609eb3c88 R08: ffff88a042bdd770 R09: 0000000000000001
R10: 00000000000003e2 R11: 0000000000000000 R12: ffff88698c7b0800
R13: ffff8870b32289d8 R14: ffff88698c7b0d48 R15: ffff889845322800
FS:  0000000000000000(0000) GS:ffff8871bf6c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f761ae1c808 CR3: 000000286be0e000 CR4: 00000000000607e0
Call Trace:
 [<ffffffffc09f8500>] dispatch+0x5e0/0xb90 [ceph]
 [<ffffffff987d155a>] ? kernel_recvmsg+0x3a/0x50
 [<ffffffffc0972ff4>] try_read+0x4e4/0x1210 [libceph]
 [<ffffffff98234909>] ? sched_clock+0x9/0x10
 [<ffffffff982d50d5>] ? sched_clock_cpu+0x85/0xc0
 [<ffffffff9822a59e>] ? __switch_to+0xce/0x580
 [<ffffffffc0973dd9>] ceph_con_workfn+0xb9/0x670 [libceph]
 [<ffffffff982b613f>] process_one_work+0x17f/0x440
 [<ffffffff982b71d6>] worker_thread+0x126/0x3c0
 [<ffffffff982b70b0>] ? manage_workers.isra.24+0x2a0/0x2a0
 [<ffffffff982bdf21>] kthread+0xd1/0xe0
 [<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
 [<ffffffff989255f7>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff982bde50>] ? insert_kthread_work+0x40/0x40
Code: 5d 41 5e 41 5f 5d c3 48 89 fa 48 c7 c6 b0 7a a0 c0 48 c7 c7 18 4c a1 c0 31 c0 e8 cf 8b b8 d7 e9 d8 fe ff ff e8 45 30 8a d7 0f 0b <0f> 0b 90 66 66 66 66 90 48 8b 07 55 48 89 e5 48 89 02 44 8b 80 
RIP  [<ffffffffc09f12ed>] remove_session_caps+0x16d/0x170 [ceph]

BUG_ON(session->s_nr_caps > 0);
BUG_ON(!list_empty(&session->s_cap_flushing));
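
For readers unfamiliar with the two checks quoted above, here is a minimal user-space sketch of the invariant they express: once a session's caps have been removed, the cap count must not be positive and the cap-flushing list must be empty. The type and helper names (`list_node`, `session_model`, `session_teardown_ok`) are hypothetical stand-ins, not the kernel's definitions. Only the field names `s_nr_caps` and `s_cap_flushing` come from the quoted lines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a circular doubly linked list head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Make the head point at itself, which represents an empty list. */
static void list_node_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* An empty circular list is one whose head points back at itself. */
static bool list_node_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void list_node_add_tail(struct list_node *entry, struct list_node *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical, simplified model holding only the two fields named above. */
struct session_model {
    int s_nr_caps;                   /* number of caps still attached */
    struct list_node s_cap_flushing; /* caps with flushes still pending */
};

/* True when neither of the two quoted BUG_ON conditions would fire. */
static bool session_teardown_ok(const struct session_model *s)
{
    return !(s->s_nr_caps > 0) && list_node_empty(&s->s_cap_flushing);
}
```

In this model, a session left with a nonzero `s_nr_caps` or a non-empty `s_cap_flushing` list makes `session_teardown_ok` return false, which corresponds to the situation where the kernel would hit one of the two BUG_ON lines.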

Actions #4

Updated by dhacky du over 5 years ago

We also hit this same kernel panic.
Before the kernel BUG, we see the following messages:

[5478081.176868] libceph: mds0 xx.xx.xx.xx:6800 socket closed (con state OPEN)
[5478085.807602] libceph: mds0 xx.xx.xx.xx:6800 connection reset
[5478085.807632] libceph: reset on mds0
[5478085.807633] ceph: mds0 closed our session
[5478085.807634] ceph: mds0 reconnect start
[5478085.807660] ceph: ffff8803404a2a30 auth cap           (null) not mds0 ???
[5478085.812112] ceph: mds0 reconnect denied
[5478085.812177] kernel BUG at fs/ceph/mds_client.c:1230!
[5478085.813028] task: ffff880f87e2bf40 ti: ffff880f35394000 task.ti: ffff880f35394000
[5478085.813053] RIP: 0010:[<ffffffffc07401e0>]  [<ffffffffc07401e0>] remove_session_caps+0x160/0x170 [ceph]
......

kernel 3.10.0-693.5.2.el7.x86_64
ceph version 12.2.5
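
The message order reported above (a denied reconnect followed by the BUG) can be checked for in a saved log with a small helper. This is a rough sketch, not a diagnostic tool: the function name `reconnect_denied_before_bug` is hypothetical, and the two substrings it matches are taken from the dmesg excerpt in this comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan log lines in order and report whether a "reconnect denied" message
 * appears before the first "kernel BUG at fs/ceph/mds_client.c" line.
 * Returns false if no such BUG line is present at all.
 */
static bool reconnect_denied_before_bug(const char *const *lines, size_t n)
{
    bool denied = false;

    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "reconnect denied") != NULL)
            denied = true;
        else if (strstr(lines[i], "kernel BUG at fs/ceph/mds_client.c") != NULL)
            return denied; /* decide at the first BUG line */
    }
    return false;
}
```

The caller is assumed to have already split the log into one string per line. Matching on substrings rather than full lines keeps the helper independent of the timestamp prefix and of the source line number, which differs between the kernels reported in this issue (1230 versus 1279).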

Actions #5

Updated by Zheng Yan over 5 years ago

  • Related to Bug #37769: __ceph_remove_cap caused kernel crash added

Actions #6

Updated by Zheng Yan over 5 years ago

This one and #37769 could be the same issue.

Actions #7

Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Zheng Yan)