Bug #37769

open

__ceph_remove_cap caused kernel crash

Added by geng jichao over 5 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: fs/ceph
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Over the past year I have hit the same kernel crash many times. My kernel version is 4.14.0.

The backtraces are below. After several days of debugging I have not found the cause. My guess is that when a session is closed or rejected, the session is destroyed while __ceph_remove_cap is still using it.

Backtrace1:

[587113.629482] libceph: reset on mds9

[587113.629483] ceph: mds9 closed our session

[587113.629484] ceph: mds9 reconnect start

[587113.638602] ceph: mds9 reconnect denied

[587113.643652] libceph: mds9 192.168.18.105:6800 socket closed (con state NEGOTIATING)

[587113.644432] libceph: mds7 192.168.18.106:6800 socket closed (con state NEGOTIATING)

[587113.703887] ceph: mds3 reconnect denied

[587114.374099] ceph: mds2 reconnect denied

[587114.384801] ceph: mds5 reconnect denied

[587114.661227] ceph: mds7 rejected session

[587114.670391] libceph: mds7 192.168.18.106:6800 socket closed (con state NEGOTIATING)

[587114.706454] ceph: mds9 rejected session

[587114.728571] libceph: mds9 192.168.18.105:6800 socket closed (con state NEGOTIATING)

[587131.714914] list_del corruption. next->prev should be ffff88d59736bfa0, but was (null)

[587131.714929] ------------[ cut here ]------------

[587131.714940] WARNING: CPU: 29 PID: 3040112 at lib/list_debug.c:56 __list_del_entry_valid+0x6c/0xa0

[587131.714941] Modules linked in: sch_netem binfmt_misc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_vs nf_conntrack tcp_diag inet_diag iptable_filter ceph libceph dns_resolver nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc flashcache(OE) bonding dm_mirror dm_region_hash dm_log dm_mod vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich hpilo hpwdt ioatdma sg ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter wmi pcc_cpufreq ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea crc32c_intel sysfillrect sysimgblt fb_sys_fops

[587131.714985] ttm drm ixgbe igb mdio i2c_algo_bit ptp hpsa pps_core i2c_core dca scsi_transport_sas

[587131.714995] CPU: 29 PID: 3040112 Comm: exportfs Kdump: loaded Tainted: G W OE ------------ 4.14.0-49.el7.centos.x86_64 #1

[587131.714996] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018

[587131.714998] task: ffff8897701ac5c0 task.stack: ffff98c8cd8fc000

[587131.715001] RIP: 0010:__list_del_entry_valid+0x6c/0xa0

[587131.715002] RSP: 0018:ffff98c8cd8ff998 EFLAGS: 00010246

[587131.715004] RAX: 0000000000000054 RBX: ffff88d59736bf78 RCX: 0000000000000000

[587131.715005] RDX: 0000000000000000 RSI: ffff88977fcce078 RDI: ffff88977fcce078

[587131.715006] RBP: ffff98c8cd8ff998 R08: 0000000000000000 R09: 00000000000402ac

[587131.715006] R10: 0000000000000004 R11: 00000000000402ab R12: ffff885c7c1ac2c0

[587131.715007] R13: ffff8898f904d800 R14: 0000000000000001 R15: ffff88d59736bfa0

[587131.715009] FS: 00007fb0b20ab740(0000) GS:ffff88977fcc0000(0000) knlGS:0000000000000000

[587131.715010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[587131.715011] CR2: 00007f7a3cb5d000 CR3: 000000034c326004 CR4: 00000000003606e0

[587131.715013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[587131.715014] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[587131.715015] Call Trace:

[587131.715036] __ceph_remove_cap+0x6e/0x250 [ceph]

[587131.715042] ceph_queue_caps_release+0x50/0x70 [ceph]

[587131.715046] ceph_destroy_inode+0x2d/0x1c0 [ceph]

[587131.715050] destroy_inode+0x3b/0x60

[587131.715052] evict+0x142/0x1a0

[587131.715053] iput+0x17d/0x1d0

[587131.715056] dentry_unlink_inode+0xb9/0xf0

[587131.715057] __dentry_kill+0xc7/0x170

[587131.715059] shrink_dentry_list+0x122/0x280

[587131.715061] d_invalidate+0x66/0x110

[587131.715062] lookup_fast+0x26b/0x2e0

[587131.715065] ? __inode_permission+0x48/0xd0

[587131.715066] walk_component+0x49/0x250

[587131.715068] path_lookupat+0x79/0x210

[587131.715070] ? mntput+0x24/0x40

[587131.715071] filename_lookup+0xaf/0x190

[587131.715078] ? __ceph_do_getattr+0x6c/0x1d0 [ceph]

[587131.715081] ? kmem_cache_alloc+0x9c/0x1b0

[587131.715083] ? getname_flags+0x4f/0x1f0

[587131.715085] user_path_at_empty+0x36/0x40

[587131.715087] vfs_statx+0x77/0xe0

[587131.715089] SYSC_newlstat+0x72/0x90

[587131.715093] ? __audit_syscall_entry+0xaf/0x100

[587131.715097] ? syscall_trace_enter+0x1d0/0x2b0

[587131.715099] ? __audit_syscall_exit+0x209/0x290

[587131.715101] SyS_newlstat+0xe/0x10

[587131.715103] do_syscall_64+0x67/0x1b0

[587131.715107] entry_SYSCALL64_slow_path+0x25/0x25

[587131.715109] RIP: 0033:0x7fb0b19af635

[587131.715110] RSP: 002b:00007ffc6b779368 EFLAGS: 00000246 ORIG_RAX: 0000000000000006

[587131.715112] RAX: ffffffffffffffda RBX: 00007ffc6b779670 RCX: 00007fb0b19af635

[587131.715113] RDX: 00007ffc6b7793a0 RSI: 00007ffc6b7793a0 RDI: 00007ffc6b779670

[587131.715114] RBP: 00007ffc6b779460 R08: 00007fb0b1c87060 R09: 00007ffc6b778d82

[587131.715115] R10: 00007ffc6b778ee0 R11: 0000000000000246 R12: 00007ffc6b77a670

[587131.715115] R13: 00007ffc6b779683 R14: 0000000000626898 R15: 000000000062689b

[587131.715117] Code: 48 89 c2 48 89 fe 31 c0 48 c7 c7 00 d3 e9 95 e8 2e f1 d3 ff 0f ff 31 c0 5d c3 48 89 fe 31 c0 48 c7 c7 b0 d3 e9 95 e8 17 f1 d3 ff <0f> ff 31 c0 5d c3 48 89 fe 31 c0 48 c7 c7 70 d3 e9 95 e8 00 f1

[587131.715138] ---[ end trace 4daa860bbe4650a4 ]---

[587131.715147] BUG: unable to handle kernel NULL pointer dereference at (null)

[587131.716387] IP: __list_add_valid+0x10/0x80

[587131.717350] PGD 3f1db3e067 P4D 3f1db3e067 PUD 3f5ccef067 PMD 0

[587131.718297] Oops: 0000 [#1] SMP

[587131.719331] Modules linked in: sch_netem binfmt_misc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_vs nf_conntrack tcp_diag inet_diag iptable_filter ceph libceph dns_resolver nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc flashcache(OE) bonding dm_mirror dm_region_hash dm_log dm_mod vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich hpilo hpwdt ioatdma sg ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter wmi pcc_cpufreq ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea crc32c_intel sysfillrect sysimgblt fb_sys_fops

[587131.726117] ttm drm ixgbe igb mdio i2c_algo_bit ptp hpsa pps_core i2c_core dca scsi_transport_sas

[587131.727190] CPU: 29 PID: 3040112 Comm: exportfs Kdump: loaded Tainted: G W OE ------------ 4.14.0-49.el7.centos.x86_64 #1

[587131.729195] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018

[587131.730436] task: ffff8897701ac5c0 task.stack: ffff98c8cd8fc000

[587131.731520] RIP: 0010:__list_add_valid+0x10/0x80

[587131.732590] RSP: 0018:ffff98c8cd8ff998 EFLAGS: 00010246

[587131.733743] RAX: ffff8898f904dd40 RBX: ffff88d59736bf78 RCX: 0000000000000000

[587131.734939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88d59736bfa0

[587131.736067] RBP: ffff98c8cd8ff998 R08: 0000000000000000 R09: ffff8898f904dd40

[587131.737146] R10: 0000000000000004 R11: 00000000000402ab R12: ffff885c7c1ac2c0

[587131.738165] R13: ffff8898f904d800 R14: 0000000000000000 R15: ffff88d59736bfa0

[587131.739168] FS: 00007fb0b20ab740(0000) GS:ffff88977fcc0000(0000) knlGS:0000000000000000

[587131.740108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[587131.741176] CR2: 0000000000000000 CR3: 000000034c326004 CR4: 00000000003606e0

[587131.742197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[587131.743167] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[587131.744185] Call Trace:

[587131.745387] __ceph_remove_cap+0xe6/0x250 [ceph]

[587131.746411] ceph_queue_caps_release+0x50/0x70 [ceph]

[587131.747359] ceph_destroy_inode+0x2d/0x1c0 [ceph]

[587131.748362] destroy_inode+0x3b/0x60

[587131.749683] evict+0x142/0x1a0

[587131.751057] iput+0x17d/0x1d0

[587131.752398] dentry_unlink_inode+0xb9/0xf0

[587131.753731] __dentry_kill+0xc7/0x170

[587131.755054] shrink_dentry_list+0x122/0x280

[587131.756407] d_invalidate+0x66/0x110

[587131.757723] lookup_fast+0x26b/0x2e0

[587131.759092] ? __inode_permission+0x48/0xd0

[587131.760462] walk_component+0x49/0x250

[587131.761757] path_lookupat+0x79/0x210

[587131.762931] ? mntput+0x24/0x40

[587131.763910] filename_lookup+0xaf/0x190

[587131.764919] ? __ceph_do_getattr+0x6c/0x1d0 [ceph]

[587131.765992] ? kmem_cache_alloc+0x9c/0x1b0

[587131.767000] ? getname_flags+0x4f/0x1f0

[587131.767954] user_path_at_empty+0x36/0x40

[587131.768857] vfs_statx+0x77/0xe0

[587131.769816] SYSC_newlstat+0x72/0x90

[587131.770716] ? __audit_syscall_entry+0xaf/0x100

[587131.771654] ? syscall_trace_enter+0x1d0/0x2b0

[587131.772518] ? __audit_syscall_exit+0x209/0x290

[587131.773343] SyS_newlstat+0xe/0x10

[587131.774188] do_syscall_64+0x67/0x1b0

[587131.774974] entry_SYSCALL64_slow_path+0x25/0x25

[587131.775953] RIP: 0033:0x7fb0b19af635

[587131.776711] RSP: 002b:00007ffc6b779368 EFLAGS: 00000246 ORIG_RAX: 0000000000000006

[587131.777533] RAX: ffffffffffffffda RBX: 00007ffc6b779670 RCX: 00007fb0b19af635

[587131.778268] RDX: 00007ffc6b7793a0 RSI: 00007ffc6b7793a0 RDI: 00007ffc6b779670

[587131.778988] RBP: 00007ffc6b779460 R08: 00007fb0b1c87060 R09: 00007ffc6b778d82

[587131.779712] R10: 00007ffc6b778ee0 R11: 0000000000000246 R12: 00007ffc6b77a670

[587131.780443] R13: 00007ffc6b779683 R14: 0000000000626898 R15: 000000000062689b

[587131.781177] Code: b8 f4 ff ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 90 90 90 90 90 90 90 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 <48> 8b 32 48 39 f0 75 42 48 39 c7 74 23 48 39 fa 74 1e b8 01 00

[587131.782732] RIP: __list_add_valid+0x10/0x80 RSP: ffff98c8cd8ff998

[587131.783513] CR2: 0000000000000000

Backtrace2:

[68807.855974] libceph: reset on mds1

[68807.855976] ceph: mds1 closed our session

[68807.855977] ceph: mds1 reconnect start

[68808.005046] ceph: mds1 reconnect denied

[68812.848738] libceph: mon2 192.168.18.104:6789 session lost, hunting for new mon

[68812.849612] libceph: mon5 192.168.18.107:6789 session established

[68843.055776] libceph: mon5 192.168.18.107:6789 session lost, hunting for new mon

[68843.056629] libceph: mon3 192.168.18.105:6789 session established

[68873.777804] libceph: mon3 192.168.18.105:6789 session lost, hunting for new mon

[68873.778650] libceph: mon1 192.168.18.103:6789 session established

[68903.983867] libceph: mon1 192.168.18.103:6789 session lost, hunting for new mon

[68903.990689] libceph: mon0 192.168.18.102:6789 session established

[68907.686612] ceph: mds0 rejected session

[68907.686914] ceph: get_quota_statfs err=-13

[68907.687012] ceph: get_quota_statfs err=-13

[68907.687032] ceph: get_quota_statfs err=-13

[68907.687265] BUG: unable to handle kernel paging request at ffffffffbd28be40

[68907.687276] IP: native_queued_spin_lock_slowpath+0x10a/0x1a0

[68907.687277] PGD 6ad180c067 P4D 6ad180c067 PUD 6ad180d063 PMD 6ad0e001e1

[68907.687280] Oops: 0003 [#1] SMP

[68907.687281] Modules linked in: rpcsec_gss_krb5 tcp_diag inet_diag iptable_filter ceph libceph dns_resolver nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip_vs nf_conntrack flashcache(OE) bonding dm_mirror dm_region_hash dm_log dm_mod vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf iTCO_wdt iTCO_vendor_support pcspkr joydev sg hpilo hpwdt lpc_ich ioatdma i2c_i801 acpi_power_meter shpchp pcc_cpufreq wmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables xfs libcrc32c sd_mod mgag200 crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm igb mdio ptp hpsa i2c_algo_bit pps_core i2c_core

[68907.687320] dca scsi_transport_sas

[68907.687324] CPU: 2 PID: 41999 Comm: nfsd Kdump: loaded Tainted: G W OE ------------ 4.14.0-49.el7.centos.x86_64 #1

[68907.687324] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018

[68907.687326] task: ffff941fb8c30000 task.stack: ffffa04aa66f0000

[68907.687328] RIP: 0010:native_queued_spin_lock_slowpath+0x10a/0x1a0 // instruction pointer

[68907.687329] RSP: 0018:ffffa04aa66f3a38 EFLAGS: 00010282 // maskable interrupts can be serviced

[68907.687330] RAX: 0000000000003ffe RBX: ffff9425048ea708 RCX: 00000000000c0000

[68907.687330] RDX: ffffffffbd28be40 RSI: 00000000ffffff00 RDI: ffff93ddb8d55d10

[68907.687331] RBP: ffffa04aa66f3a38 R08: ffff93e5ff89c740 R09: 0000000000000000

[68907.687332] R10: ffff93dc8351c468 R11: 0000000000000000 R12: ffff93dc68322160

[68907.687332] R13: ffff93ddb8d55800 R14: 0000000000000001 R15: ffff93ddcc4c29c0

[68907.687333] FS: 0000000000000000(0000) GS:ffff93e5ff880000(0000) knlGS:0000000000000000

[68907.687334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[68907.687334] CR2: ffffffffbd28be40 CR3: 0000006ad1809002 CR4: 00000000003606e0

[68907.687335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[68907.687335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[68907.687336] Call Trace:

[68907.687340] queued_spin_lock_slowpath+0xb/0x13

[68907.687345] _raw_spin_lock+0x20/0x30

[68907.687360] __ceph_remove_cap+0x52/0x250 [ceph]

[68907.687366] ceph_queue_caps_release+0x50/0x70 [ceph]

[68907.687370] ceph_destroy_inode+0x2d/0x1c0 [ceph]

[68907.687373] destroy_inode+0x3b/0x60

[68907.687374] evict+0x142/0x1a0

[68907.687375] iput+0x17d/0x1d0

[68907.687377] dentry_unlink_inode+0xb9/0xf0

[68907.687378] __dentry_kill+0xc7/0x170

[68907.687380] shrink_dentry_list+0x122/0x280

[68907.687382] d_invalidate+0x66/0x110

[68907.687384] lookup_dcache+0x57/0x70

[68907.687385] __lookup_hash+0x20/0xa0

[68907.687386] lookup_one_len+0xef/0x130

[68907.687394] nfsd_lookup_dentry+0x123/0x420 [nfsd]

[68907.687399] nfsd_lookup+0x82/0x120 [nfsd]

[68907.687404] nfsd4_lookup+0x1a/0x20 [nfsd]

[68907.687411] nfsd4_proc_compound+0x3e0/0x810 [nfsd]

[68907.687416] nfsd_dispatch+0xc9/0x2f0 [nfsd]

[68907.687430] svc_process_common+0x385/0x710 [sunrpc]

[68907.687436] svc_process+0xfd/0x1c0 [sunrpc]

[68907.687440] nfsd+0xf3/0x190 [nfsd]

[68907.687442] kthread+0x109/0x140

[68907.687446] ? nfsd_destroy+0x60/0x60 [nfsd]

[68907.687447] ? kthread_park+0x60/0x60

[68907.687449] ret_from_fork+0x25/0x30

[68907.687450] Code: c1 e0 10 45 31 c9 85 c0 74 46 48 89 c2 c1 e8 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 c7 01 00 48 03 14 c5 20 b4 b0 bd <4c> 89 02 41 8b 40 08 85 c0 75 0a f3 90 41 8b 40 08 85 c0 74 f6

[68907.687469] RIP: native_queued_spin_lock_slowpath+0x10a/0x1a0 RSP: ffffa04aa66f3a38

[68907.687470] CR2: ffffffffbd28be40

Backtrace3:

[586315.216695] nfsd: REMOVE client: 172.16.18.169, port=1007 36: 01070001 00000002 00000100 94923470 9250dec5 9d420d99 /data/nfs/import/host_10/thread_1/mass_1/day50/vdb_control.file

[586317.730533] libceph: mon2 192.168.18.104:6789 session lost, hunting for new mon

[586317.733962] libceph: mon4 192.168.18.106:6789 session established

[586334.739638] nfsd: REMOVE client: 172.16.18.169, port=796 36: 01070001 00000002 00000100 94923470 9250dec5 9d420d99 /data/nfs/import/host_12/thread_3/mass_1/day50/vdb_control.file

[586347.938680] libceph: mon4 192.168.18.106:6789 session lost, hunting for new mon

[586347.940391] libceph: mon6 192.168.18.108:6789 session established

[586356.266132] list_del corruption. next->prev should be ffff9396f1abaf28, but was (null)

[586356.266144] ------------[ cut here ]------------

[586356.266153] WARNING: CPU: 14 PID: 319 at lib/list_debug.c:56 __list_del_entry_valid+0x6c/0xa0

[586356.266154] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_vs nf_conntrack tcp_diag inet_diag iptable_filter ceph libceph dns_resolver nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc flashcache(OE) bonding dm_mirror dm_region_hash dm_log dm_mod intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul vfat fat ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich sg hpwdt hpilo ioatdma acpi_power_meter wmi pcc_cpufreq shpchp ipmi_si ipmi_devintf ipmi_msghandler ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe igb

[586356.266206] mdio i2c_algo_bit ptp hpsa pps_core i2c_core dca scsi_transport_sas

[586356.266215] CPU: 14 PID: 319 Comm: kswapd1 Kdump: loaded Tainted: G W OE ------------ 4.14.0-49.el7.centos.x86_64 #1

[586356.266217] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018

[586356.266218] task: ffff93587d1b9740 task.stack: ffffa6eccd9f0000

[586356.266221] RIP: 0010:__list_del_entry_valid+0x6c/0xa0

[586356.266222] RSP: 0018:ffffa6eccd9f3a90 EFLAGS: 00010246

[586356.266224] RAX: 0000000000000054 RBX: ffff9396f1abaf00 RCX: 0000000000000000

[586356.266225] RDX: 0000000000000000 RSI: ffff93987f10e078 RDI: ffff93987f10e078

[586356.266226] RBP: ffffa6eccd9f3a90 R08: 0000000000000000 R09: 00000000000171ff

[586356.266227] R10: 0000000000000004 R11: 00000000000171fe R12: ffff931d8f479bd0

[586356.266228] R13: ffff93586e2f6000 R14: 0000000000000001 R15: ffff9396f1abaf28

[586356.266230] FS: 0000000000000000(0000) GS:ffff93987f100000(0000) knlGS:0000000000000000

[586356.266231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[586356.266232] CR2: 00007faf1b632170 CR3: 0000001b52209005 CR4: 00000000003606e0

[586356.266233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[586356.266234] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[586356.266235] Call Trace:

[586356.266266] __ceph_remove_cap+0x6e/0x250 [ceph]

[586356.266278] ceph_queue_caps_release+0x50/0x70 [ceph]

[586356.266286] ceph_destroy_inode+0x2d/0x1c0 [ceph]

[586356.266291] destroy_inode+0x3b/0x60

[586356.266293] evict+0x142/0x1a0

[586356.266294] iput+0x17d/0x1d0

[586356.266297] dentry_unlink_inode+0xb9/0xf0

[586356.266299] __dentry_kill+0xc7/0x170

[586356.266302] shrink_dentry_list+0x122/0x280

[586356.266304] prune_dcache_sb+0x5a/0x80

[586356.266307] super_cache_scan+0x107/0x190

[586356.266312] shrink_slab+0x26b/0x480

[586356.266315] shrink_node+0x2f7/0x310

[586356.266317] kswapd+0x2cf/0x730

[586356.266324] kthread+0x109/0x140

[586356.266327] ? mem_cgroup_shrink_node+0x180/0x180

[586356.266328] ? kthread_park+0x60/0x60

[586356.266333] ret_from_fork+0x25/0x30

[586356.266334] Code: 48 89 c2 48 89 fe 31 c0 48 c7 c7 00 d3 c9 86 e8 2e f1 d3 ff 0f ff 31 c0 5d c3 48 89 fe 31 c0 48 c7 c7 b0 d3 c9 86 e8 17 f1 d3 ff <0f> ff 31 c0 5d c3 48 89 fe 31 c0 48 c7 c7 70 d3 c9 86 e8 00 f1

[586356.266366] ---[ end trace 42fd686294455ca6 ]---

[586356.266374] BUG: unable to handle kernel NULL pointer dereference at (null)

[586356.268676] IP: __list_add_valid+0x10/0x80

[586356.270683] PGD 0 P4D 0

[586356.272776] Oops: 0000 [#1] SMP

[586356.274918] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_vs nf_conntrack tcp_diag inet_diag iptable_filter ceph libceph dns_resolver nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc flashcache(OE) bonding dm_mirror dm_region_hash dm_log dm_mod intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul vfat fat ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 lpc_ich sg hpwdt hpilo ioatdma acpi_power_meter wmi pcc_cpufreq shpchp ipmi_si ipmi_devintf ipmi_msghandler ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe igb

[586356.288962] mdio i2c_algo_bit ptp hpsa pps_core i2c_core dca scsi_transport_sas

[586356.290933] CPU: 14 PID: 319 Comm: kswapd1 Kdump: loaded Tainted: G W OE ------------ 4.14.0-49.el7.centos.x86_64 #1

[586356.294595] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018

[586356.296694] task: ffff93587d1b9740 task.stack: ffffa6eccd9f0000

[586356.298605] RIP: 0010:__list_add_valid+0x10/0x80

[586356.300697] RSP: 0018:ffffa6eccd9f3a90 EFLAGS: 00010246

[586356.302370] RAX: ffff93586e2f6540 RBX: ffff9396f1abaf00 RCX: 0000000000000000

[586356.303737] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9396f1abaf28

[586356.305068] RBP: ffffa6eccd9f3a90 R08: 0000000000000000 R09: ffff93586e2f6540

[586356.306385] R10: 0000000000000004 R11: 00000000000171fe R12: ffff931d8f479bd0

[586356.307829] R13: ffff93586e2f6000 R14: 0000000000000000 R15: ffff9396f1abaf28

[586356.309101] FS: 0000000000000000(0000) GS:ffff93987f100000(0000) knlGS:0000000000000000

[586356.311153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[586356.313562] CR2: 0000000000000000 CR3: 0000001b52209005 CR4: 00000000003606e0

[586356.316123] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[586356.318505] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[586356.320877] Call Trace:

[586356.323136] __ceph_remove_cap+0xe6/0x250 [ceph]

[586356.324393] ceph_queue_caps_release+0x50/0x70 [ceph]

[586356.325527] ceph_destroy_inode+0x2d/0x1c0 [ceph]

[586356.326633] destroy_inode+0x3b/0x60

[586356.327754] evict+0x142/0x1a0

[586356.329143] iput+0x17d/0x1d0

[586356.331218] dentry_unlink_inode+0xb9/0xf0

[586356.333253] __dentry_kill+0xc7/0x170

[586356.335177] shrink_dentry_list+0x122/0x280

[586356.336722] prune_dcache_sb+0x5a/0x80

[586356.337732] super_cache_scan+0x107/0x190

[586356.338673] shrink_slab+0x26b/0x480

[586356.339594] shrink_node+0x2f7/0x310

[586356.340513] kswapd+0x2cf/0x730

[586356.341377] kthread+0x109/0x140

[586356.342287] ? mem_cgroup_shrink_node+0x180/0x180

[586356.343108] ? kthread_park+0x60/0x60

[586356.343877] ret_from_fork+0x25/0x30

[586356.344626] Code: b8 f4 ff ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 90 90 90 90 90 90 90 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 <48> 8b 32 48 39 f0 75 42 48 39 c7 74 23 48 39 fa 74 1e b8 01 00

[586356.346208] RIP: __list_add_valid+0x10/0x80 RSP: ffffa6eccd9f3a90

[586356.346988] CR2: 0000000000000000

Backtrace4:

[30634.512779] PGD 0

[30634.512909] Oops: 0002 [#1] SMP

[30634.513116] Modules linked in: tcp_diag(E) inet_diag(E) iptable_filter(E) ip_tables(E) x_tables(E) ceph(E) libceph(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) ip_vs(E) xfs(E) nf_conntrack(E) libcrc32c(E) ib_iser(OE) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) ipmi_ssif(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) ipmi_si(E) glue_helper(E) ablk_helper(E) ipmi_msghandler(E) cryptd(E) sb_edac(E) ioatdma(E) hpilo(E) lpc_ich(E) 8250_fintek(E) edac_core(E) shpchp(E) mac_hid(E) acpi_power_meter(E) wmi(E) bonding(E) lp(E) parport(E) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs(E) ib_ipoib(OE)

[30634.517793] ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) mlx4_en(OE) mlx4_core(OE) mlx_compat(OE) nls_iso8859_1(E) btrfs(E) xor(E) raid6_pq(E) igb(E) ixgbe(E) i2c_algo_bit(E) vxlan(E) ip6_udp_tunnel(E) dca(E) udp_tunnel(E) ptp(E) hpsa(E) pps_core(E) mdio(E) scsi_transport_sas(E) fjes(E)

[30634.519821] CPU: 20 PID: 1912340 Comm: kworker/20:2 Tainted: G OE 4.4.0-46-generic #67

[30634.520404] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 01/22/2018

[30634.521004] Workqueue: ceph-msgr ceph_con_workfn [libceph]

[30634.521340] task: ffff881d629944c0 ti: ffff881c40da8000 task.ti: ffff881c40da8000

[30634.521805] RIP: 0010:[<ffffffffc093352c>] [<ffffffffc093352c>] __ceph_remove_cap+0xdc/0x210 [ceph]

[30634.522386] RSP: 0018:ffff881c40daba18 EFLAGS: 00010246

[30634.522738] RAX: 0000000000000000 RBX: ffff881c7c759078 RCX: ffff881c7c7590a0

[30634.523172] RDX: 0000000000000000 RSI: ffff883d022ebe00 RDI: ffff883d022ebdd0

[30634.538168] RBP: ffff881c40daba50 R08: 0000000000000000 R09: 00000001802a0015

[30634.553528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8820c14b5800

[30634.568744] R13: ffff883d022eb800 R14: ffff883d022ebdd0 R15: 0000000000000000

[30634.583876] FS: 0000000000000000(0000) GS:ffff881fffd00000(0000) knlGS:0000000000000000

[30634.599049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[30634.614257] CR2: 0000000000000000 CR3: 0000000002e0a000 CR4: 00000000003406e0

[30634.629281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[30634.644401] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[30634.659635] Stack:

[30634.674654] ffff881c42d47800 0000000100000000 0000000000000000 ffff881c7c759078

[30634.690433] ffff8820c14b5810 ffff8820c14b5800 ffffffff81f36a80 ffff881c40daba78

[30634.706376] ffffffffc09336b0 ffff8820c14b5b48 ffff8820c14b5bd0 ffff8820c14b5b48

[30634.722249] Call Trace:

[30634.738087] [<ffffffffc09336b0>] ceph_queue_caps_release+0x50/0x70 [ceph]

[30634.755052] [<ffffffffc091ea69>] ceph_destroy_inode+0x39/0x1b0 [ceph]

[30634.772022] [<ffffffff81218fc8>] destroy_inode+0x38/0x60

[30634.788385] [<ffffffff8121911d>] evict+0x12d/0x190

[30634.805154] [<ffffffff81219391>] iput+0x1c1/0x240

[30634.822031] [<ffffffff81214cad>] __dentry_kill+0x18d/0x1e0

[30634.839153] [<ffffffff81214e30>] dput+0x130/0x220

[30634.856331] [<ffffffffc093f7d4>] ceph_mdsc_release_request+0xc4/0x160 [ceph]

[30634.873463] [<ffffffffc093f96a>] __unregister_request+0xfa/0x220 [ceph]

[30634.891156] [<ffffffffc093fee6>] __do_request+0xd6/0x460 [ceph]

[30634.908762] [<ffffffffc09402c5>] __wake_requests+0x55/0xb0 [ceph]

[30634.926987] [<ffffffffc0943957>] ceph_mdsc_handle_mdsmap+0x4c7/0x700 [ceph]

[30634.945183] [<ffffffffc08d2fe0>] ? read_partial.isra.25+0x50/0x80 [libceph]

[30634.963432] [<ffffffffc091c940>] extra_mon_dispatch+0x40/0x60 [ceph]

[30634.981208] [<ffffffffc08d9fd9>] dispatch+0x369/0x720 [libceph]

[30634.998921] [<ffffffffc08d5df0>] ceph_con_workfn+0x700/0x1920 [libceph]

[30635.016957] [<ffffffff810b0e8d>] ? dequeue_task_fair+0x51d/0x8b0

[30635.034846] [<ffffffff810b2935>] ? put_prev_entity+0x35/0x7d0

[30635.052399] [<ffffffff8102b63d>] ? __switch_to+0x1cd/0x590

[30635.069263] [<ffffffff81094335>] process_one_work+0x165/0x480

[30635.085594] [<ffffffff8109469b>] worker_thread+0x4b/0x4c0

[30635.100901] [<ffffffff81094650>] ? process_one_work+0x480/0x480

[30635.116083] [<ffffffff81094650>] ? process_one_work+0x480/0x480

[30635.130932] [<ffffffff8109a5f9>] kthread+0xc9/0xe0

[30635.145121] [<ffffffff8109a530>] ? kthread_create_on_node+0x1c0/0x1c0

[30635.159340] [<ffffffff817ff10f>] ret_from_fork+0x3f/0x70

[30635.172561] [<ffffffff8109a530>] ? kthread_create_on_node+0x1c0/0x1c0

[30635.185679] Code: 48 01 00 00 00 74 44 49 8b 95 08 06 00 00 48 8d 4b 28 49 8d b5 00 06 00 00 45 31 ff 49 89 8d 08 06 00 00 48 89 73 28 48 89 53 30 <48> 89 0a 41 83 85 f0 05 00 00 01 eb 13 41 8b 85 c4 05 00 00 39

[30635.212987] RIP [<ffffffffc093352c>] __ceph_remove_cap+0xdc/0x210 [ceph]

[30635.226441] RSP <ffff881c40daba18>

[30635.239511] CR2: 0000000000000000
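
All four backtraces above reach the fault through ceph_destroy_inode and ceph_queue_caps_release, in or just under __ceph_remove_cap. One way to confirm that kind of overlap is to extract the reliable frames from each trace and intersect them. The sketch below is only an illustration: `reliable_frames`, `shared_frames` and the regular expressions are hypothetical helpers written for the two log layouts seen in this report (4.14 and 4.4 kernels), not part of any Ceph or kernel tooling.

```python
import re

# "[587131.715015] " style dmesg timestamp at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# "[<ffffffffc09336b0>] " style return address printed by older kernels.
ADDRESS = re.compile(r"^\[<[0-9a-f]+>\]\s*")
# "symbol+0xoff/0xsize", e.g. "__ceph_remove_cap+0x6e/0x250".
FRAME = re.compile(r"^([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(dmesg_text):
    """Return the function names listed under "Call Trace:", skipping "?" frames."""
    frames = []
    in_trace = False
    for raw in dmesg_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        if line.startswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace or not line:
            continue  # outside a trace, or a blank separator line
        line = ADDRESS.sub("", line)
        if line.startswith("?"):
            continue  # the kernel marks unreliable frames with "?"
        match = FRAME.match(line)
        if match is None:
            in_trace = False  # "RIP:", "Code:" and similar lines end the trace
            continue
        frames.append(match.group(1))
    return frames


def shared_frames(traces):
    """Frames present in every trace, in the order of the first trace."""
    common = set(traces[0]).intersection(*traces[1:])
    return [name for name in traces[0] if name in common]


# Short excerpts copied from Backtrace1 and Backtrace4 above.
TRACE_1 = """[587131.715015] Call Trace:
[587131.715036] __ceph_remove_cap+0x6e/0x250 [ceph]
[587131.715042] ceph_queue_caps_release+0x50/0x70 [ceph]
[587131.715046] ceph_destroy_inode+0x2d/0x1c0 [ceph]
[587131.715065] ? __inode_permission+0x48/0xd0
"""
TRACE_4 = """[30634.722249] Call Trace:
[30634.738087] [<ffffffffc09336b0>] ceph_queue_caps_release+0x50/0x70 [ceph]
[30634.755052] [<ffffffffc091ea69>] ceph_destroy_inode+0x39/0x1b0 [ceph]
[30634.945183] [<ffffffffc08d2fe0>] ? read_partial.isra.25+0x50/0x80 [libceph]
"""

print(shared_frames([reliable_frames(TRACE_1), reliable_frames(TRACE_4)]))
```

Frames prefixed with "?" are dropped because the kernel prints them only as possible stack contents, so they would add noise to the comparison.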


Related issues 1 (1 open, 0 closed)

Related to Linux kernel client - Bug #36299: Kernel panic: kernel BUG at fs/ceph/mds_client.c:1279! on CentOS 7.5.1804 (New)

Actions #1

Updated by Zheng Yan over 5 years ago

The following patch can explain this oops:

commit 0a07fc8cd01b6838d999a5eacaa99fe90b8f768b
Author: Yan, Zheng <zyan@redhat.com>
Date:   Wed Mar 29 15:30:24 2017 +0800

    ceph: fix potential use-after-free

    __unregister_session() free the session if it drops the last
    reference. We should grab an extra reference if we want to use
    session after __unregister_session().

    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
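
The pattern that commit message describes can be modeled with a small toy in Python. This is not kernel code: the `Session` class and the `unregister`, `close_unsafe` and `close_safe` functions are hypothetical names invented for this sketch. Only the idea of taking an extra reference before calling `__unregister_session()` comes from the commit message.

```python
class Session:
    """Toy stand-in for a refcounted MDS session (hypothetical, not kernel code)."""

    def __init__(self, mds):
        self.mds = mds
        self.refs = 1        # the reference held by the session table
        self.freed = False

    def get(self):
        assert not self.freed, "get on a freed session"
        self.refs += 1
        return self

    def put(self):
        assert not self.freed, "put on a freed session"
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # models the free; any later use is a use-after-free


def unregister(table, session):
    """Remove the session from the table and drop the table's reference."""
    del table[session.mds]
    session.put()


def close_unsafe(table, session):
    """Unregister without an extra reference: the session is freed underneath us."""
    unregister(table, session)
    return session.freed   # True means further access would touch freed memory


def close_safe(table, session):
    """Take an extra reference first so the session outlives unregister."""
    session.get()
    unregister(table, session)
    still_valid = not session.freed  # the session can still be used here
    session.put()                    # drop the extra reference; freed only now
    return still_valid
```

In the unsafe variant the table held the last reference, so the session is already freed when the caller next touches it, which would be consistent with the list corruption and bad spinlock address in the backtraces. In the safe variant the caller's own reference keeps it alive until the caller is done.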
Actions #2

Updated by Zheng Yan over 5 years ago

  • Related to Bug #36299: Kernel panic: kernel BUG at fs/ceph/mds_client.c:1279! on CentOS 7.5.1804 added
Actions #3

Updated by Jeff Layton about 5 years ago

  • Assignee set to Zheng Yan
Actions #4

Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Zheng Yan)