Bug #45563

open

__list_add_valid kernel NULL pointer in __ceph_remove_cap

Added by joe h almost 4 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: fs/ceph
Target version:
% Done: 50%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed: 05/15/2020
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

[Description]
Recently I have hit the same bug (unable to handle kernel NULL pointer dereference in __list_add_valid) many times, each time while the cluster had been running a data-deletion workload for about 12 hours.
After a long time spent analyzing and debugging, I have found neither the cause nor any faulty code. Because I had a similar problem before, we suspect that the session was closed or rejected and then destroyed while it was still in use in __ceph_remove_cap.

My cluster's kernel version is 4.14.0. The backtrace is as follows:

[Backtrace]
[145278.236178] BUG: unable to handle kernel NULL pointer dereference at (null)
[145278.236947] IP: __list_add_valid+0x10/0x80
[145278.237669] PGD 0 P4D 0
[145278.238340] Oops: 0000 [#1] SMP
[145278.238996] Modules linked in: rpcsec_gss_krb5(OE) iptable_filter tcp_diag inet_diag rpcrdma(OE) nfsd(OE) auth_rpcgss(OE) nfs_acl(OE) lockd(OE) grace(OE) fscache sunrpc(OE) ceph(OE) libceph(OE) dns_resolver dev_pmc_scsi(OE) flashcache(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) devlink mlx_compat(OE) ip_vs nf_conntrack sr_mod vfat fat cdrom dm_mirror dm_region_hash dm_log dm_mod intel_rapl x86_pkg_temp_thermal ext4 intel_powerclamp mbcache jbd2 coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel uas ses crypto_simd enclosure usb_storage glue_helper cryptd sg intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore
[145278.244064] intel_rapl_perf pcspkr joydev ioatdma mei_me i2c_i801 mei lpc_ich shpchp ipmi_si wmi ipmi_devintf ipmi_msghandler nfit acpi_power_meter libnvdimm acpi_pad ip_tables xfs libcrc32c sd_mod ast drm_kms_helper syscopyarea sysfillrect crc32c_intel sysimgblt fb_sys_fops ixgbe ttm igb ahci mdio smartpqi drm libahci ptp scsi_transport_sas i2c_algo_bit dca libata pps_core i2c_core [last unloaded: mlxfw]
[145278.247449] CPU: 32 PID: 291 Comm: kswapd1 Kdump: loaded Tainted: G W OEL ------------ 4.14.0-xxx #1
[145278.249190] task: ffff9ff33beb1e80 task.stack: ffffb2cf0eb44000
[145278.250057] RIP: 0010:__list_add_valid+0x10/0x80
[145278.250910] RSP: 0018:ffffb2cf0eb47a90 EFLAGS: 00010246
[145278.251758] RAX: ffffa00379383560 RBX: ffffa0036b2cd860 RCX: ffffa0036b2cd978
[145278.252625] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0036b2cd888
[145278.253497] RBP: ffffb2cf0eb47a90 R08: 0000000000000000 R09: ffffa00379383560
[145278.254381] R10: ffffffffffffff9c R11: ffffffffffffff83 R12: ffff9fd7c5092798
[145278.255275] R13: ffffa00379383000 R14: 0000000000000000 R15: ffffa0036b2cd888
[145278.256182] FS: 0000000000000000(0000) GS:ffffa0037d700000(0000) knlGS:0000000000000000
[145278.257094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[145278.257990] CR2: 0000000000000000 CR3: 0000002379a09006 CR4: 00000000007606e0
[145278.258881] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[145278.259754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[145278.260632] PKRU: 55555554
[145278.261430] Call Trace:
[145278.262238] __ceph_remove_cap+0xe6/0x250 [ceph]
[145278.263048] ceph_queue_caps_release+0x50/0x70 [ceph]
[145278.263864] ceph_destroy_inode+0x2d/0x1c0 [ceph]
[145278.264686] destroy_inode+0x3b/0x60
[145278.265506] evict+0x142/0x1a0
[145278.266331] iput+0x17d/0x1d0
[145278.267149] dentry_unlink_inode+0xb9/0xf0
[145278.267953] __dentry_kill+0xc7/0x170
[145278.268742] shrink_dentry_list+0x122/0x280
[145278.269515] prune_dcache_sb+0x5a/0x80
[145278.270275] super_cache_scan+0x107/0x190
[145278.271027] shrink_slab+0x26b/0x480
[145278.271769] shrink_node+0x2f7/0x310
[145278.272510] kswapd+0x2cf/0x730
[145278.273257] kthread+0x109/0x140
[145278.274008] ? mem_cgroup_shrink_node+0x180/0x180
[145278.274753] ? kthread_park+0x60/0x60
[145278.275487] ret_from_fork+0x2a/0x40
[145278.276220] Code: b8 f4 ff ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 90 90 90 90 90 90 90 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 <48> 8b 32 48 39 f0 75 42 48 39 c7 74 23 48 39 fa 74 1e b8 01 00
[145278.277774] RIP: __list_add_valid+0x10/0x80 RSP: ffffb2cf0eb47a90
[145278.278518] CR2: 0000000000000000

[Same issue]
There is a similar issue on www.tracker.ceph.com: tracker.ceph.com/issues/37769. Zheng Yan provided a patch that explains this Oops (commit: 0a07fc8cd01b6838d999a5eacaa99fe90b8f768b).
Note that my code already includes the changes from this commit.

Actions #1

Updated by joe h almost 4 years ago

While reviewing the code, I think something is wrong in the following:
A cap is stored in two data structures when ceph_add_cap executes: the cap rbtree and the session's cap list. When __ceph_remove_cap executes, it first removes the cap from the session list and then removes it from the cap rbtree.
However, when __unregister_session executes, nothing checks or handles the caps that belong to that session.
So if a session has been unregistered while the caps that belong to it are unaware of that, and __ceph_remove_cap then executes without any check that the session still exists, could a kernel panic (kernel NULL pointer dereference) happen?

Actions #2

Updated by Zheng Yan almost 4 years ago

joe h wrote:

While reviewing the code, I think something is wrong in the following:
A cap is stored in two data structures when ceph_add_cap executes: the cap rbtree and the session's cap list. When __ceph_remove_cap executes, it first removes the cap from the session list and then removes it from the cap rbtree.
However, when __unregister_session executes, nothing checks or handles the caps that belong to that session.
So if a session has been unregistered while the caps that belong to it are unaware of that, and __ceph_remove_cap then executes without any check that the session still exists, could a kernel panic (kernel NULL pointer dereference) happen?

remove_session_caps() is always called for __unregister_session() case
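
The invariant described here can be illustrated with a small userspace model. This is only a sketch and not the kernel code: the model_* types and functions, the cap_of macro, and the NULL-session guard in model_remove_cap are hypothetical simplifications. It shows that if every cap is detached before the session goes away, a later cap removal (for example from inode eviction) finds no session to touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void add_tail(struct list_head *n, struct list_head *head)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialize, in the spirit of list_del_init(). */
static void del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list(e);
}

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_session {
    struct list_head caps;          /* every cap linked to this session */
    int nr_caps;
};

struct model_cap {
    struct model_session *session;  /* NULL once detached */
    struct list_head session_caps;  /* link in session->caps */
};

#define cap_of(ptr) \
    ((struct model_cap *)((char *)(ptr) - offsetof(struct model_cap, session_caps)))

static void model_add_cap(struct model_session *s, struct model_cap *c)
{
    c->session = s;
    add_tail(&c->session_caps, &s->caps);
    s->nr_caps++;
}

/* Detach one cap. The NULL guard (an assumption of this model) makes a
 * second call on the same cap a no-op. */
static void model_remove_cap(struct model_cap *c)
{
    if (c->session == NULL)
        return;
    del_init(&c->session_caps);
    c->session->nr_caps--;
    c->session = NULL;
}

/* Drop every cap first, so no cap still points at the session afterwards. */
static void model_unregister_session(struct model_session *s)
{
    while (!list_is_empty(&s->caps))
        model_remove_cap(cap_of(s->caps.next));
    assert(s->nr_caps == 0);
}

/* Demo: tear the session down, then attempt a late cap removal. */
bool caps_detached_before_session_teardown(void)
{
    struct model_session s = { .nr_caps = 0 };
    struct model_cap c1, c2;

    init_list(&s.caps);
    model_add_cap(&s, &c1);
    model_add_cap(&s, &c2);

    model_unregister_session(&s);
    model_remove_cap(&c1);  /* late removal: nothing left to unlink */

    return c1.session == NULL && c2.session == NULL && s.nr_caps == 0;
}
```

Calling caps_detached_before_session_teardown() returns true in this model. Whether the 4.14 code path in the backtrace holds the same invariant on every path is exactly what the thread is trying to establish.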

Actions #3

Updated by joe h almost 4 years ago

Thanks, Zheng Yan. I have another question: have you tested deleting files when memory is full? Could a kernel panic (kernel NULL pointer dereference, or soft lockup stuck for 22s!) happen?

Actions #4

Updated by Jeff Layton over 3 years ago

I don't think there is much we can do with a kernel this old. Can you reproduce it on something newer?

Actions #5

Updated by joe h over 3 years ago

Jeff Layton wrote:

I don't think there is much we can do with a kernel this old. Can you reproduce it on something newer?

Not yet.

Actions #6

Updated by Jeff Layton over 2 years ago

It looks like this probably fell down in the list_add_tail call in __ceph_queue_cap_release, so most likely the session_caps list_head in there was bogus. The locking looks fine AFAICT, we generally have the correct spinlock when changing non-private lists.
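
A minimal sketch of why a bogus list_head would fail in that spot, under the assumption that "bogus" means a zeroed head. list_add_tail(new, head) inserts between head->prev and head, so the validity check has to read through head->prev; a head whose prev is NULL gives it nothing valid to dereference. The zero RSI and RDX in the register dump would be consistent with that, though this is an inference rather than something confirmed in the thread. The function names below are hypothetical, and the check here reports the condition instead of dereferencing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Hypothetical helper: true when adding at the tail of this head would
 * have to read through a NULL prev pointer. */
bool add_tail_would_fault(const struct list_head *head)
{
    assert(head != NULL);
    return head->prev == NULL;
}

/* A zeroed head, e.g. memory that was cleared or never initialized. */
bool zeroed_head_faults(void)
{
    struct list_head head;

    head.next = NULL;
    head.prev = NULL;
    return add_tail_would_fault(&head);
}

/* A properly initialized empty head points at itself in both directions. */
bool initialized_head_is_fine(void)
{
    struct list_head head = { &head, &head };

    return !add_tail_would_fault(&head);
}
```

Both zeroed_head_faults() and initialized_head_is_fine() return true in this model.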

One interesting thing: A couple of places that remove caps from the lists do it via list_del, and others use list_del_init. Is it possible for the refcount on those caps to go high again and have them double-removed from the list? If so, that might corrupt it.

Some of that depends on the cap object lifecycle, and it's not completely clear to me. It may be best to go ahead and turn those into list_del_init() calls just to ensure that that doesn't happen.
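
The difference can be sketched in userspace. This is a simplified model, not the kernel implementation: unlink_keep_stale is a hypothetical variant that leaves the removed entry's pointers untouched, whereas the kernel's list_del() poisons them, so a second list_del() would fault on the poison values rather than silently corrupt the list. The list_del_init() behaviour (entry points at itself after removal) is modelled directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the entry, then make it point at itself. */
static void list_del_init(struct list_head *entry)
{
    assert(entry->next != NULL && entry->prev != NULL);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

/* Hypothetical unlink that keeps the entry's stale next/prev pointers. */
static void unlink_keep_stale(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Removing the same entry twice with list_del_init() leaves the list intact,
 * because the second call only rewrites the entry's own pointers. */
bool double_remove_is_safe(void)
{
    struct list_head head, a, b;

    INIT_LIST_HEAD(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);

    list_del_init(&a);
    list_del_init(&a);

    return head.next == &b && head.prev == &b &&
           b.next == &head && b.prev == &head && list_empty(&a);
}

/* With stale pointers, a second removal of a relinks b after b was removed. */
bool double_unlink_corrupts(void)
{
    struct list_head head, a, b;

    INIT_LIST_HEAD(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);

    unlink_keep_stale(&a);  /* list is now: head <-> b */
    unlink_keep_stale(&b);  /* list is now empty */
    unlink_keep_stale(&a);  /* stale pointers put b back after head */

    return !list_empty(&head) && head.next == &b;
}
```

Both demo functions return true: the double list_del_init() is harmless, while the stale double unlink resurrects an entry that was already removed.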
