Bug #14232

closed

Kernel NULL pointer dereference in __dcache_readdir

Added by Markus Blank-Burian over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I hit this bug using the kernel client on Linux v4.1.15 with all ceph/libceph patches up to v4.4-rc8 applied.

[Mon Jan  4 09:14:08 2016] Key type ceph registered
[Mon Jan  4 09:14:08 2016] libceph: loaded (mon/osd proto 15/24)
[Mon Jan  4 09:14:08 2016] ceph: loaded (mds proto 32)
[Mon Jan  4 09:14:08 2016] libceph: client63344014 fsid 7900aaa3-1a32-4c6f-84fb-2ee08089198f
[Mon Jan  4 09:14:08 2016] libceph: mon0 192.168.1.253:6789 session established
[Mon Jan  4 13:04:40 2016] libceph: osd83 down
[Mon Jan  4 13:05:11 2016] libceph: osd83 up
[Mon Jan  4 13:08:05 2016] libceph: osd83 down
[Mon Jan  4 13:08:54 2016] libceph: osd83 up
[Mon Jan  4 13:10:47 2016] libceph: get_reply osd83 tid 10952 unknown, skipping
[Mon Jan  4 13:11:28 2016] libceph: get_reply osd83 tid 10978 unknown, skipping
[Mon Jan  4 14:34:44 2016] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[Mon Jan  4 14:34:44 2016] IP: [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan  4 14:34:44 2016] PGD 80a619067 PUD 80a53f067 PMD 0 
[Mon Jan  4 14:34:44 2016] Oops: 0000 [#1] SMP 
[Mon Jan  4 14:34:44 2016] Modules linked in: cbc ceph libceph ipmi_watchdog w83627ehf adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 cpufreq_ondemand xfs ipmi_si ipmi_devintf ipmi_msghandler mgag200 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper kvm_amd drm kvm amd64_edac_mod microcode psmouse evdev pcspkr sp5100_tco edac_mce_amd rtc_cmos i2c_piix4 k10temp edac_core button acpi_cpufreq processor rpcsec_gss_krb5 fuse nfsv4 nfs af_packet sr_mod cdrom hid_generic usbhid hid bonding usb_storage sd_mod ohci_pci ohci_hcd ehci_pci ehci_hcd ata_generic ahci usbcore pata_atiixp libahci libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
[Mon Jan  4 14:34:44 2016] CPU: 5 PID: 28145 Comm: python3 Tainted: P        W  O    4.1.15+ #16
[Mon Jan  4 14:34:44 2016] Hardware name: Supermicro H8DGU/H8DGU, BIOS 1.0b       09/02/10  
[Mon Jan  4 14:34:44 2016] task: ffff88080d821890 ti: ffff8806fabe8000 task.ti: ffff8806fabe8000
[Mon Jan  4 14:34:44 2016] RIP: 0010:[<ffffffffa0205761>]  [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan  4 14:34:44 2016] RSP: 0018:ffff8806fabebde8  EFLAGS: 00010246
[Mon Jan  4 14:34:44 2016] RAX: 000000000000002f RBX: ffff880721b6d120 RCX: 0000000000000001
[Mon Jan  4 14:34:44 2016] RDX: 0000000000000000 RSI: 0000000100020002 RDI: ffff8807243dfe58
[Mon Jan  4 14:34:44 2016] RBP: ffff8806fabebe98 R08: 0000000004300430 R09: 0000000000000004
[Mon Jan  4 14:34:44 2016] R10: ffff8806fabebd60 R11: 0000000000000000 R12: ffff880713e32348
[Mon Jan  4 14:34:44 2016] R13: ffff88034bac7ec0 R14: ffff8807243dfe00 R15: ffff880713e32348
[Mon Jan  4 14:34:44 2016] FS:  00007f59bc5b67c0(0000) GS:ffff88080fc40000(0000) knlGS:0000000000000000
[Mon Jan  4 14:34:44 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jan  4 14:34:44 2016] CR2: 000000000000000c CR3: 000000080c0f0000 CR4: 00000000000006e0
[Mon Jan  4 14:34:44 2016] Stack:
[Mon Jan  4 14:34:44 2016]  ffff8807218fa500 ffff8807243dfe58 ffff880713e32020 0000000000000000
[Mon Jan  4 14:34:44 2016]  0000002ffabebe48 ffff880721b6d120 ffff880c09c7e800 ffff8800a5104a80
[Mon Jan  4 14:34:44 2016]  0000000000000040 ffff88080d821890 ffff88080a785700 ffff8806fabebef8
[Mon Jan  4 14:34:44 2016] Call Trace:
[Mon Jan  4 14:34:44 2016]  [<ffffffff81112697>] iterate_dir+0x74/0xfb
[Mon Jan  4 14:34:45 2016]  [<ffffffff81112809>] SyS_getdents+0x78/0xc4
[Mon Jan  4 14:34:45 2016]  [<ffffffff81112462>] ? fillonedir+0xb6/0xb6
[Mon Jan  4 14:34:45 2016]  [<ffffffff81409917>] system_call_fastpath+0x12/0x6a
[Mon Jan  4 14:34:45 2016] Code: 46 58 4d 8b 5e 78 48 89 c7 48 89 85 58 ff ff ff 4c 89 9d 68 ff ff ff e8 7f 3e 20 e1 4c 8b 9d 68 ff ff ff 31 d2 8b 85 74 ff ff ff <41> 3b 43 0c 75 72 49 8b 46 30 48 85 c0 74 69 48 83 b8 d0 fc ff 
[Mon Jan  4 14:34:45 2016] RIP  [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan  4 14:34:45 2016]  RSP <ffff8806fabebde8>
[Mon Jan  4 14:34:45 2016] CR2: 000000000000000c
[Mon Jan  4 14:34:45 2016] ---[ end trace 8869dff4d5641722 ]---
[Mon Jan  4 16:43:37 2016] libceph: osd85 down

Using gdb, the faulting location (ceph_readdir+0x43e) resolves to "0x0000000000005785 <+1086>":

   0x0000000000005758 <+1041>:  lea    0x58(%r14),%rax
   0x000000000000575c <+1045>:  mov    0x78(%r14),%r11
   0x0000000000005760 <+1049>:  mov    %rax,%rdi
   0x0000000000005763 <+1052>:  mov    %rax,-0xa8(%rbp)
   0x000000000000576a <+1059>:  mov    %r11,-0x98(%rbp)
   0x0000000000005771 <+1066>:  callq  0x5776 <ceph_readdir+1071>
   0x0000000000005776 <+1071>:  mov    -0x98(%rbp),%r11
   0x000000000000577d <+1078>:  xor    %edx,%edx
   0x000000000000577f <+1080>:  mov    -0x8c(%rbp),%eax
   0x0000000000005785 <+1086>:  cmp    0xc(%r11),%eax
   0x0000000000005789 <+1090>:  jne    0x57fd <ceph_readdir+1206>
   0x000000000000578b <+1092>:  mov    0x30(%r14),%rax
   0x000000000000578f <+1096>:  test   %rax,%rax
   0x0000000000005792 <+1099>:  je     0x57fd <ceph_readdir+1206>
   0x0000000000005794 <+1101>:  cmpq   $0xffffffffffffffff,-0x330(%rax)
   0x000000000000579c <+1109>:  je     0x57fd <ceph_readdir+1206>
   0x000000000000579e <+1111>:  cmpq   $0x2,-0x338(%rax)
   0x00000000000057a6 <+1119>:  je     0x57fd <ceph_readdir+1206>

which corresponds to

0x5785 is in ceph_readdir (fs/ceph/dir.c:207).
202                             break;
203
204                     emit_dentry = false;
205                     di = ceph_dentry(dentry);
206                     spin_lock(&dentry->d_lock);
207                     if (di->lease_shared_gen == shared_gen &&
208                         d_really_is_positive(dentry) &&
209                         ceph_snap(d_inode(dentry)) != CEPH_SNAPDIR &&
210                         ceph_ino(d_inode(dentry)) != CEPH_INO_CEPH &&
211                         fpos_cmp(ctx->pos, di->offset) <= 0) {
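
As a cross-check, the offset reported in the oops (ceph_readdir+0x43e/0xbaa) can be compared numerically with the gdb location (0x5785 <+1086>). A minimal Python sketch using only values quoted in this report; the helper name parse_symbol_offset is made up for illustration:

```python
import re

def parse_symbol_offset(text):
    """Split 'sym+0xOFF/0xLEN' (kernel oops notation) into (sym, offset, length)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        raise ValueError(f"not a symbol+offset string: {text!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

# Taken from the "IP:" line of the oops.
sym, offset, length = parse_symbol_offset("ceph_readdir+0x43e/0xbaa")

# gdb prints the same offset in decimal: 0x5785 <+1086>
gdb_addr, gdb_offset = 0x5785, 1086

# Start of ceph_readdir inside the object file that gdb loaded.
func_start = gdb_addr - gdb_offset

# The first disassembled instruction, 0x5758 <+1041>, must agree with that start.
first_insn_ok = (0x5758 - func_start) == 1041

print(sym, offset == gdb_offset, hex(func_start), first_insn_ok)
```

Both notations describe the same instruction: 0x43e is 1086 in decimal, so the gdb listing is indeed the code at the faulting IP.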

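So di is NULL when it is dereferenced right after the spin_lock() at line 207: the faulting instruction is "cmp 0xc(%r11),%eax" with R11 = 0, which matches the fault address 0xc in CR2. FYI, my patch set includes "ceph: rework dcache readdir", which might be related to this issue. As a workaround, I will now mount with noasyncreaddir.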
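
The NULL-pointer conclusion can be checked against the register dump: the faulting instruction cmp 0xc(%r11),%eax reads from R11 + 0xc, and the oops reports R11 as 0 and CR2 as 0xc. A small Python sketch of that arithmetic, assuming only the simple disp(%reg) operand form seen in the disassembly; the function name effective_address is illustrative:

```python
import re

def effective_address(operand, regs):
    """Compute the address of an AT&T-syntax 'disp(%reg)' memory operand."""
    m = re.fullmatch(r"(-?0x[0-9a-f]+)?\(%(\w+)\)", operand)
    if m is None:
        raise ValueError(f"unsupported operand: {operand!r}")
    disp = int(m.group(1), 16) if m.group(1) else 0
    # Wrap to 64 bits, as the CPU would.
    return (regs[m.group(2)] + disp) & 0xFFFFFFFFFFFFFFFF

# Register values copied from the oops.
regs = {"r11": 0x0000000000000000, "r14": 0xffff8807243dfe00}
cr2 = 0x000000000000000c
rdi = 0xffff8807243dfe58

# Faulting load: di->lease_shared_gen with di held in %r11.
fault = effective_address("0xc(%r11)", regs)

# Lock address passed to spin_lock(): lea 0x58(%r14),%rax ; mov %rax,%rdi
lock_addr = effective_address("0x58(%r14)", regs)

print(hex(fault), fault == cr2, lock_addr == rdi)
```

The computed address equals CR2, so the pointer in %r11 (di) was NULL. The second check shows that %r14 + 0x58 equals the RDI value in the dump, consistent with %r14 holding the dentry whose d_lock was just taken.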


Files

readdir-cache.patch (2.3 KB), Zheng Yan, 02/26/2016 08:08 AM