Bug #14232

closed

Kernel NULL pointer dereference in __dcache_readdir

Added by Markus Blank-Burian over 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I found this bug using the kernel client on Linux v4.1.15, with all ceph/libceph patches up to v4.4-rc8 applied.

[Mon Jan  4 09:14:08 2016] Key type ceph registered
[Mon Jan  4 09:14:08 2016] libceph: loaded (mon/osd proto 15/24)
[Mon Jan  4 09:14:08 2016] ceph: loaded (mds proto 32)
[Mon Jan  4 09:14:08 2016] libceph: client63344014 fsid 7900aaa3-1a32-4c6f-84fb-2ee08089198f
[Mon Jan  4 09:14:08 2016] libceph: mon0 192.168.1.253:6789 session established
[Mon Jan  4 13:04:40 2016] libceph: osd83 down
[Mon Jan  4 13:05:11 2016] libceph: osd83 up
[Mon Jan  4 13:08:05 2016] libceph: osd83 down
[Mon Jan  4 13:08:54 2016] libceph: osd83 up
[Mon Jan  4 13:10:47 2016] libceph: get_reply osd83 tid 10952 unknown, skipping
[Mon Jan  4 13:11:28 2016] libceph: get_reply osd83 tid 10978 unknown, skipping
[Mon Jan  4 14:34:44 2016] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[Mon Jan  4 14:34:44 2016] IP: [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan  4 14:34:44 2016] PGD 80a619067 PUD 80a53f067 PMD 0 
[Mon Jan  4 14:34:44 2016] Oops: 0000 [#1] SMP 
[Mon Jan  4 14:34:44 2016] Modules linked in: cbc ceph libceph ipmi_watchdog w83627ehf adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 cpufreq_ondemand xfs ipmi_si ipmi_devintf ipmi_msghandler mgag200 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper kvm_amd drm kvm amd64_edac_mod microcode psmouse evdev pcspkr sp5100_tco edac_mce_amd rtc_cmos i2c_piix4 k10temp edac_core button acpi_cpufreq processor rpcsec_gss_krb5 fuse nfsv4 nfs af_packet sr_mod cdrom hid_generic usbhid hid bonding usb_storage sd_mod ohci_pci ohci_hcd ehci_pci ehci_hcd ata_generic ahci usbcore pata_atiixp libahci libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
[Mon Jan  4 14:34:44 2016] CPU: 5 PID: 28145 Comm: python3 Tainted: P        W  O    4.1.15+ #16
[Mon Jan  4 14:34:44 2016] Hardware name: Supermicro H8DGU/H8DGU, BIOS 1.0b       09/02/10  
[Mon Jan  4 14:34:44 2016] task: ffff88080d821890 ti: ffff8806fabe8000 task.ti: ffff8806fabe8000
[Mon Jan  4 14:34:44 2016] RIP: 0010:[<ffffffffa0205761>]  [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan  4 14:34:44 2016] RSP: 0018:ffff8806fabebde8  EFLAGS: 00010246
[Mon Jan  4 14:34:44 2016] RAX: 000000000000002f RBX: ffff880721b6d120 RCX: 0000000000000001
[Mon Jan  4 14:34:44 2016] RDX: 0000000000000000 RSI: 0000000100020002 RDI: ffff8807243dfe58
[Mon Jan  4 14:34:44 2016] RBP: ffff8806fabebe98 R08: 0000000004300430 R09: 0000000000000004
[Mon Jan  4 14:34:44 2016] R10: ffff8806fabebd60 R11: 0000000000000000 R12: ffff880713e32348
[Mon Jan  4 14:34:44 2016] R13: ffff88034bac7ec0 R14: ffff8807243dfe00 R15: ffff880713e32348
[Mon Jan  4 14:34:44 2016] FS:  00007f59bc5b67c0(0000) GS:ffff88080fc40000(0000) knlGS:0000000000000000
[Mon Jan  4 14:34:44 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jan  4 14:34:44 2016] CR2: 000000000000000c CR3: 000000080c0f0000 CR4: 00000000000006e0
[Mon Jan  4 14:34:44 2016] Stack:
[Mon Jan  4 14:34:44 2016]  ffff8807218fa500 ffff8807243dfe58 ffff880713e32020 0000000000000000
[Mon Jan  4 14:34:44 2016]  0000002ffabebe48 ffff880721b6d120 ffff880c09c7e800 ffff8800a5104a80
[Mon Jan  4 14:34:44 2016]  0000000000000040 ffff88080d821890 ffff88080a785700 ffff8806fabebef8
[Mon Jan  4 14:34:44 2016] Call Trace:
[Mon Jan  4 14:34:44 2016]  [<ffffffff81112697>] iterate_dir+0x74/0xfb
[Mon Jan  4 14:34:45 2016]  [<ffffffff81112809>] SyS_getdents+0x78/0xc4
[Mon Jan  4 14:34:45 2016]  [<ffffffff81112462>] ? fillonedir+0xb6/0xb6
[Mon Jan  4 14:34:45 2016]  [<ffffffff81409917>] system_call_fastpath+0x12/0x6a
[Mon Jan  4 14:34:45 2016] Code: 46 58 4d 8b 5e 78 48 89 c7 48 89 85 58 ff ff ff 4c 89 9d 68 ff ff ff e8 7f 3e 20 e1 4c 8b 9d 68 ff ff ff 31 d2 8b 85 74 ff ff ff <41> 3b 43 0c 75 72 49 8b 46 30 48 85 c0 74 69 48 83 b8 d0 fc ff 
[Mon Jan  4 14:34:45 2016] RIP  [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan  4 14:34:45 2016]  RSP <ffff8806fabebde8>
[Mon Jan  4 14:34:45 2016] CR2: 000000000000000c
[Mon Jan  4 14:34:45 2016] ---[ end trace 8869dff4d5641722 ]---
[Mon Jan  4 16:43:37 2016] libceph: osd85 down

Using gdb, I traced the faulting location to "0x0000000000005785 <+1086>":

   0x0000000000005758 <+1041>:  lea    0x58(%r14),%rax
   0x000000000000575c <+1045>:  mov    0x78(%r14),%r11
   0x0000000000005760 <+1049>:  mov    %rax,%rdi
   0x0000000000005763 <+1052>:  mov    %rax,-0xa8(%rbp)
   0x000000000000576a <+1059>:  mov    %r11,-0x98(%rbp)
   0x0000000000005771 <+1066>:  callq  0x5776 <ceph_readdir+1071>
   0x0000000000005776 <+1071>:  mov    -0x98(%rbp),%r11
   0x000000000000577d <+1078>:  xor    %edx,%edx
   0x000000000000577f <+1080>:  mov    -0x8c(%rbp),%eax
   0x0000000000005785 <+1086>:  cmp    0xc(%r11),%eax
   0x0000000000005789 <+1090>:  jne    0x57fd <ceph_readdir+1206>
   0x000000000000578b <+1092>:  mov    0x30(%r14),%rax
   0x000000000000578f <+1096>:  test   %rax,%rax
   0x0000000000005792 <+1099>:  je     0x57fd <ceph_readdir+1206>
   0x0000000000005794 <+1101>:  cmpq   $0xffffffffffffffff,-0x330(%rax)
   0x000000000000579c <+1109>:  je     0x57fd <ceph_readdir+1206>
   0x000000000000579e <+1111>:  cmpq   $0x2,-0x338(%rax)
   0x00000000000057a6 <+1119>:  je     0x57fd <ceph_readdir+1206>

which corresponds to

0x5785 is in ceph_readdir (fs/ceph/dir.c:207).
202                             break;
203
204                     emit_dentry = false;
205                     di = ceph_dentry(dentry);
206                     spin_lock(&dentry->d_lock);
207                     if (di->lease_shared_gen == shared_gen &&
208                         d_really_is_positive(dentry) &&
209                         ceph_snap(d_inode(dentry)) != CEPH_SNAPDIR &&
210                         ceph_ino(d_inode(dentry)) != CEPH_INO_CEPH &&
211                         fpos_cmp(ctx->pos, di->offset) <= 0) {

So di was NULL when it was dereferenced after the spin_lock(). FYI, my patch set includes "ceph: rework dcache readdir", which might be related to this issue. As a workaround, I will now mount with noasyncreaddir.


Files

readdir-cache.patch (2.3 KB) Zheng Yan, 02/26/2016 08:08 AM
#1

Updated by Zheng Yan over 8 years ago

Sorry for the delay. Please upload the patches you applied to the 4.1.15 kernel. If you can, please also give the 4.4 kernel a try.

#2

Updated by Markus Blank-Burian over 8 years ago

The patch list has grown rather long. If there have been any later changes or fixes to the dcache code, they are not included. Within the next 1-2 weeks I can try out 4.4, since it's now released. A bug in the NUMA migration code kept us from using 4.3.

fc0561cefc04e7803c0f6501ca4f310a502f65b8 xfs: optimise away log forces on timestamp updates for fdatasync
583d0fef756a7615e50f0f68ea0892a497d03971 libceph: clear msg->con in ceph_msg_release() only
a51983e4dd2d4d63912aab939f657c4cd476e21a libceph: add nocephx_sign_messages option
859bff51dc5e92ddfb5eb6f17b8040d9311095bb libceph: stop duplicating client fields in messenger
4199b8eec36405822619d4176bddfacf7b47eb44 libceph: drop authorizer check from cephx msg signing routines
79dbd1baa651cece408e68a1b445f3628c4b5bdc libceph: msg signing callouts don't need con argument
8a703a383dd3458753e0ad71860ed3a5097692b3 libceph: evaluate osd_req_op_data() arguments only once
68cd5b4b7612c2956d8553dfb39490b29f32566d ceph: make fsync() wait unsafe requests that created/modified inode
4c06ace81a60636dec358c288ef6aaf3aa6dc599 ceph: add request to i_unsafe_dirops when getting unsafe reply
cbf99a11fb14db0835acd79ecd7469d37e398660 libceph: introduce ceph_x_authorizer_cleanup()
5e804ac4824302efc3038e086cb21f2e93ab8900 ceph: don't invalidate page cache when inode is no longer used
343128ce91836d4131ead74b53d83b72e93d55b2 libceph: use local variable cursor instead of &msg->cursor
70cf052d0c4b60b6fbb981380660893306b9f172 libceph: remove con argument in handle_reply()
b5b98989dc7ed2093aeb76f2d0db79888582b0a2 ceph: combine as many iovec as possile into one OSD request
777d738a5e58ba3b6f3932ab1543ce93703f4873 ceph: fix message length computation
1291fb950f12005600eb410c206fffd7231dee6f ceph: fix a comment typo
335c25858218e76ef47f92ecb9d22e919d36140d libceph: advertise support for keepalive2
7f61f545657281a3a1b0faf68993165ebdecc51b libceph: don't access invalid memory in keepalive2 path
438386853d4c0c48fe73bf05a7d61c70ca5a3bfb ceph: improve readahead for file holes
55b0b31cbc09f80db384671e22cdc94b2aa26b29 ceph: get inode size for each append write
d15f9d694b77fe5e4ea12b3031ecaa13b5aa2b10 libceph: check data_len in ->alloc_msg()
8b9558aab853e98ba6e3fee0dd8545544966958c libceph: use keepalive2 to verify the mon session is alive
6dd74e44dc1df85f125982a8d6591bc4a76c9f5d libceph: set 'exists' flag for newly up osd
5fdb1389e1399d6801a8c5d10952ef4153039fb2 ceph: cleanup use of ceph_msg_get
e36d571d70c7f46b20c28d81025fd5fc044a8e22 ceph: no need to get parent inode in ceph_open
a43137f7b0f1467cf3005b6ff6574d978642d247 ceph: remove the useless judgement
1550d34e5626a20a2e12c73bdc1e6e217a0ba897 ceph: remove redundant test of head->safe and silence static analysis warnings
23078637e05460428f803be7d0f46908df8a970a ceph: fix queuing inode to mdsdir's snaprealm
6893162215d7bf08a4273247ec1fc7dedee5135c libceph: rename con_work() to ceph_con_workfn()
d920ff6fc7c1ec3d7bd80432bff5575c0ebe426c libceph: Avoid holding the zero page on ceph_msgr_slab_init errors
b79b23682a1649f30960fb5bd920ba46c89a1b14 libceph: remove the unused macro AES_KEY_SIZE
a341d4df87487ae68189e0be869c39a2b0cb9aaa ceph: invalidate dirty pages after forced umount
48fec5d0a504dfbb302cb1dd24ebb0b82a46cce9 ceph: EIO all operations after forced umount
fc927cd32feca2acefd90a4ac317fa4f0a2e5955 ceph: always re-send cap flushes when MDS recovers
f6762cb2ca48e9052b5233c338fa254fa58d8981 ceph: fix ceph_encode_locks_to_buffer()
c44bd69c0c8cfadf0239437635b2933efb1f6c4c libceph: treat sockaddr_storage with uninitialized family as blank
757856d2b9568a701df9ea6a4be68effbb9d6f44 libceph: enable ceph in a non-default network namespace
eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2 net: Add a struct net parameter to sock_create_kern
c2cfa19400979dc1a14bba75f83b451b0cd9507a libceph: Fix ceph_tcp_sendpage()'s more boolean usage
6ba8edc0bcbdf337293e60123ddac8fc1c895a3c libceph: Remove spurious kunmap() of the zero page
e1966b49446a43994c3f25a07d0eb4d05660b429 ceph: fix ceph_writepages_start()
fdd4e15838e59c394a1ec4963b57c22c12608685 ceph: rework dcache readdir
b459be739f97e2062b2ba77cfe8ea198dbd58904 crush: sync up with userspace
8f529795bace5d6263b134f4ff3adccfc0a0cce6 crush: fix crash from invalid 'take' argument
687265e5a885d6308f5d73e738efe3c2674fa218 ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
f66fd9f0952187d274c13c136b74548f792c1925 ceph: pre-allocate data structure that tracks caps flushing
e548e9b93d3e565e42b938a99804114565be1f81 ceph: re-send flushing caps (which are revoked) in reconnect stage
a2971c8ccb9bd7677a6c43cdbed9aacfef5e9f26 ceph: send TID of the oldest pending caps flush to MDS
8310b08913eca8aee98744c9aff1ec0d1f603b19 ceph: track pending caps flushing globally
553adfd941f8ca622965ef809553d918ea039929 ceph: track pending caps flushing accurately
6c13a6bb55df6666275b992ba76620324429d7cf libceph: fix wrong name "Ceph filesystem for Linux" 
da819c8150c5b6e6a6a21ee41135b88f6cd18c3e ceph: fix directory fsync
89b52fe14de4d703ba837a7418bb4cd286dcc87f ceph: fix flushing caps
41445999aeec1f0fdf196ab55b2c770473b2ea01 ceph: don't include used caps in cap_wanted
3e0708b990f7e46d87d47b3b06de322490f2f2ee ceph: ratelimit warn messages for MDS closes session
5be73034771c8f18b241f1974803865a4de2cad1 ceph: simplify two mount_timeout sites
216639dd5091de4f4d7ad19b0b8dde11fad18286 libceph: a couple tweaks for wait loops
a319bf56a617354e62cf5f774d2ca4e1a8a3bff3 libceph: store timeouts in jiffies, verify user input
d50c97b566c5bbf990eff472e9feaa58fdebdd33 libceph: nuke time_sub()
e8a7b8b12b13831467c6158c1e82801e25b5dd98 ceph: exclude setfilelock requests when calculating oldest tid
745a8e3bccbc6adae69a98ddc525e529aa44636e ceph: don't pre-allocate space for cap release messages
affbc19a68f9966ad65a773db405f78e2bafc07b ceph: make sure syncfs flushes all cap snaps
622f3e250f498976ad4cbae6f2be5cb359ded4f5 ceph: don't trim auth cap when there are cap snaps
604d1b0245b97738cde4341944ad93edff4b2827 ceph: take snap_rwsem when accessing snap realm's cached_context
860560904962d08fd38666207c910065fe53e074 ceph: avoid sending unnessesary FLUSHSNAP message
5dda377cf0a6bd43f64a3c1efb670d7c668e7b29 ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference
7b06a826e7c52d77ce801e5960ecf0338eafe886 ceph: use empty snap context for uninline_data and get_pool_perm
10183a69551f76702ac68bc74a437b25419c6de0 ceph: check OSD caps before read/write
144cba1493fdd6e3e1980e439a31df877831ebcd libceph: allow setting osd_req_op's flags
66ba609f7b96096116ca7bbc21ec6922ea41a992 libceph: properly release STAT request's raw_data_in

#3

Updated by Markus Blank-Burian about 8 years ago

It took a while, but I found a very similar crash on 4.4.1 with only a few patches applied:

Feb 25 20:57:51 kaa-101 kernel: general protection fault: 0000 [#1] SMP 
Feb 25 20:57:51 kaa-101 kernel: Modules linked in: cbc ceph libceph arc4 ecb md4 hmac nls_utf8 cifs 8021q garp mrp stp llc autofs4 binfmt_misc xfs ipmi_watchdog ipmi_si ipmi_devintf ipmi_msghandler mgag200 ttm drm_kms_helper syscopyarea sysfillrect sysimgblt input_leds fb_sys_fops led_class evdev acpi_cpufreq amd64_edac_mod drm edac_mce_amd psmouse pcspkr sp5100_tco edac_core fam15h_power k10temp i2c_piix4 rtc_cmos button processor rpcsec_gss_krb5 fuse nfsv4 nfs btrfs xor raid6_pq af_packet hid_generic usbhid zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) sd_mod ohci_pci bonding crc32c_intel ohci_hcd ehci_pci ehci_hcd ahci libahci usbcore libata usb_common dm_mirror dm_region_hash dm_log dm_mod unix
Feb 25 20:57:51 kaa-101 kernel: CPU: 22 PID: 2203 Comm: slurm_script Tainted: P        W  O    4.4.1+ #21
Feb 25 20:57:51 kaa-101 kernel: Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
Feb 25 20:57:51 kaa-101 kernel: task: ffff88136c663a00 ti: ffff882038300000 task.ti: ffff882038300000
Feb 25 20:57:51 kaa-101 kernel: RIP: 0010:[<ffffffff81292d76>]  [<ffffffff81292d76>] lockref_get_not_dead+0x5/0x81
Feb 25 20:57:51 kaa-101 kernel: RSP: 0018:ffff882038303da0  EFLAGS: 00010206
Feb 25 20:57:51 kaa-101 kernel: RAX: 303030303035368d RBX: ffff882038303ef0 RCX: 0000000100000000
Feb 25 20:57:51 kaa-101 kernel: RDX: ffff88057283b000 RSI: ffff880126dd8b89 RDI: 303030303035368d
Feb 25 20:57:51 kaa-101 kernel: RBP: ffff882038303da8 R08: 00000000008f1e8a R09: 0000000000000008
Feb 25 20:57:51 kaa-101 kernel: R10: ffff882038303d40 R11: 0000000000000202 R12: 0000000000000002
Feb 25 20:57:51 kaa-101 kernel: R13: 3030303030353635 R14: ffff88010f4518a0 R15: 0000000000002c80
Feb 25 20:57:51 kaa-101 kernel: FS:  00007f577d00d700(0000) GS:ffff881807d80000(0000) knlGS:0000000000000000
Feb 25 20:57:51 kaa-101 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 20:57:51 kaa-101 kernel: CR2: 00000000008f2000 CR3: 0000001031e9c000 CR4: 00000000000406e0
Feb 25 20:57:51 kaa-101 kernel: Stack:
Feb 25 20:57:51 kaa-101 kernel:  ffff882038303ef0 ffff882038303e90 ffffffffa0333a86 ffff882038303df8
Feb 25 20:57:51 kaa-101 kernel:  ffffffff81183ae6 ffff88200119e170 0000000000000591 0000000000000591
Feb 25 20:57:51 kaa-101 kernel:  ffff882a07faf1f8 ffff88010f451a00 ffff882038303e50 0000003f38303ec0
Feb 25 20:57:51 kaa-101 kernel: Call Trace:
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffffa0333a86>] ceph_readdir+0xad0/0xe4e [ceph]
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff81183ae6>] ? mem_cgroup_try_charge+0x66/0x1a0
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff810edf95>] ? acct_account_cputime+0x17/0x19
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff8119c4ec>] iterate_dir+0x7c/0x103
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff8119c94b>] SyS_getdents+0x89/0x10e
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff8100185a>] ? syscall_trace_enter_phase1+0xd3/0x13f
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff8119c573>] ? iterate_dir+0x103/0x103
Feb 25 20:57:51 kaa-101 kernel:  [<ffffffff815816ee>] entry_SYSCALL_64_fastpath+0x12/0x71
Feb 25 20:57:51 kaa-101 kernel: Code: 04 48 89 df c6 07 00 0f 1f 40 00 eb b8 31 c0 eb b9 55 48 89 e5 8b 07 85 c0 74 09 c7 47 04 80 ff ff ff 5d c3 0f 0b 55 48 89 e5 53 <48> 8b 17 85 d2 75 27 48 89 d0 48 c1 f8 20 8d 48 01 48 c1 e1 20 
Feb 25 20:57:51 kaa-101 kernel: RIP  [<ffffffff81292d76>] lockref_get_not_dead+0x5/0x81
Feb 25 20:57:51 kaa-101 kernel:  RSP <ffff882038303da0>
Feb 25 20:57:51 kaa-101 kernel: ---[ end trace c037e0a2aa8fe44c ]---

The only lockref_get_not_dead() call on this path is here, in __dcache_readdir():

                rcu_read_lock();
                spin_lock(&parent->d_lock);
                /* check i_size again here, because empty directory can be
                 * marked as complete while not holding the i_mutex. */
                if (ceph_dir_is_complete_ordered(dir) &&
                    ptr_pos < i_size_read(dir))
                        dentry = cache_ctl.dentries[cache_ctl.index % nsize];
                else
                        dentry = NULL;
                spin_unlock(&parent->d_lock);
                if (dentry && !lockref_get_not_dead(&dentry->d_lockref))
                        dentry = NULL;
                rcu_read_unlock();

I will fall back to noasyncreaddir again.

#4

Updated by Zheng Yan about 8 years ago

I found a bug in the cache fill code. Could you please try the attached patch (readdir-cache.patch)?

#5

Updated by Markus Blank-Burian about 8 years ago

Thanks for the patch! I will let you know if the bug hits again.

#6

Updated by Zheng Yan almost 8 years ago

  • Status changed from 12 to Resolved
