Bug #14232
Closed
Kernel NULL pointer dereference in __dcache_readdir
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
I found this bug using the kernel client on Linux v4.1.15 with all ceph/libceph patches up to v4.4-rc8 applied.
[Mon Jan 4 09:14:08 2016] Key type ceph registered
[Mon Jan 4 09:14:08 2016] libceph: loaded (mon/osd proto 15/24)
[Mon Jan 4 09:14:08 2016] ceph: loaded (mds proto 32)
[Mon Jan 4 09:14:08 2016] libceph: client63344014 fsid 7900aaa3-1a32-4c6f-84fb-2ee08089198f
[Mon Jan 4 09:14:08 2016] libceph: mon0 192.168.1.253:6789 session established
[Mon Jan 4 13:04:40 2016] libceph: osd83 down
[Mon Jan 4 13:05:11 2016] libceph: osd83 up
[Mon Jan 4 13:08:05 2016] libceph: osd83 down
[Mon Jan 4 13:08:54 2016] libceph: osd83 up
[Mon Jan 4 13:10:47 2016] libceph: get_reply osd83 tid 10952 unknown, skipping
[Mon Jan 4 13:11:28 2016] libceph: get_reply osd83 tid 10978 unknown, skipping
[Mon Jan 4 14:34:44 2016] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[Mon Jan 4 14:34:44 2016] IP: [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan 4 14:34:44 2016] PGD 80a619067 PUD 80a53f067 PMD 0
[Mon Jan 4 14:34:44 2016] Oops: 0000 [#1] SMP
[Mon Jan 4 14:34:44 2016] Modules linked in: cbc ceph libceph ipmi_watchdog w83627ehf adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 cpufreq_ondemand xfs ipmi_si ipmi_devintf ipmi_msghandler mgag200 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper kvm_amd drm kvm amd64_edac_mod microcode psmouse evdev pcspkr sp5100_tco edac_mce_amd rtc_cmos i2c_piix4 k10temp edac_core button acpi_cpufreq processor rpcsec_gss_krb5 fuse nfsv4 nfs af_packet sr_mod cdrom hid_generic usbhid hid bonding usb_storage sd_mod ohci_pci ohci_hcd ehci_pci ehci_hcd ata_generic ahci usbcore pata_atiixp libahci libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
[Mon Jan 4 14:34:44 2016] CPU: 5 PID: 28145 Comm: python3 Tainted: P W O 4.1.15+ #16
[Mon Jan 4 14:34:44 2016] Hardware name: Supermicro H8DGU/H8DGU, BIOS 1.0b 09/02/10
[Mon Jan 4 14:34:44 2016] task: ffff88080d821890 ti: ffff8806fabe8000 task.ti: ffff8806fabe8000
[Mon Jan 4 14:34:44 2016] RIP: 0010:[<ffffffffa0205761>] [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan 4 14:34:44 2016] RSP: 0018:ffff8806fabebde8 EFLAGS: 00010246
[Mon Jan 4 14:34:44 2016] RAX: 000000000000002f RBX: ffff880721b6d120 RCX: 0000000000000001
[Mon Jan 4 14:34:44 2016] RDX: 0000000000000000 RSI: 0000000100020002 RDI: ffff8807243dfe58
[Mon Jan 4 14:34:44 2016] RBP: ffff8806fabebe98 R08: 0000000004300430 R09: 0000000000000004
[Mon Jan 4 14:34:44 2016] R10: ffff8806fabebd60 R11: 0000000000000000 R12: ffff880713e32348
[Mon Jan 4 14:34:44 2016] R13: ffff88034bac7ec0 R14: ffff8807243dfe00 R15: ffff880713e32348
[Mon Jan 4 14:34:44 2016] FS: 00007f59bc5b67c0(0000) GS:ffff88080fc40000(0000) knlGS:0000000000000000
[Mon Jan 4 14:34:44 2016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jan 4 14:34:44 2016] CR2: 000000000000000c CR3: 000000080c0f0000 CR4: 00000000000006e0
[Mon Jan 4 14:34:44 2016] Stack:
[Mon Jan 4 14:34:44 2016] ffff8807218fa500 ffff8807243dfe58 ffff880713e32020 0000000000000000
[Mon Jan 4 14:34:44 2016] 0000002ffabebe48 ffff880721b6d120 ffff880c09c7e800 ffff8800a5104a80
[Mon Jan 4 14:34:44 2016] 0000000000000040 ffff88080d821890 ffff88080a785700 ffff8806fabebef8
[Mon Jan 4 14:34:44 2016] Call Trace:
[Mon Jan 4 14:34:44 2016] [<ffffffff81112697>] iterate_dir+0x74/0xfb
[Mon Jan 4 14:34:45 2016] [<ffffffff81112809>] SyS_getdents+0x78/0xc4
[Mon Jan 4 14:34:45 2016] [<ffffffff81112462>] ? fillonedir+0xb6/0xb6
[Mon Jan 4 14:34:45 2016] [<ffffffff81409917>] system_call_fastpath+0x12/0x6a
[Mon Jan 4 14:34:45 2016] Code: 46 58 4d 8b 5e 78 48 89 c7 48 89 85 58 ff ff ff 4c 89 9d 68 ff ff ff e8 7f 3e 20 e1 4c 8b 9d 68 ff ff ff 31 d2 8b 85 74 ff ff ff <41> 3b 43 0c 75 72 49 8b 46 30 48 85 c0 74 69 48 83 b8 d0 fc ff
[Mon Jan 4 14:34:45 2016] RIP [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan 4 14:34:45 2016] RSP <ffff8806fabebde8>
[Mon Jan 4 14:34:45 2016] CR2: 000000000000000c
[Mon Jan 4 14:34:45 2016] ---[ end trace 8869dff4d5641722 ]---
[Mon Jan 4 16:43:37 2016] libceph: osd85 down
Using gdb, the faulting location (ceph_readdir+0x43e, i.e. +1086 in decimal) resolves to "0x0000000000005785 <+1086>":
0x0000000000005758 <+1041>: lea 0x58(%r14),%rax
0x000000000000575c <+1045>: mov 0x78(%r14),%r11
0x0000000000005760 <+1049>: mov %rax,%rdi
0x0000000000005763 <+1052>: mov %rax,-0xa8(%rbp)
0x000000000000576a <+1059>: mov %r11,-0x98(%rbp)
0x0000000000005771 <+1066>: callq 0x5776 <ceph_readdir+1071>
0x0000000000005776 <+1071>: mov -0x98(%rbp),%r11
0x000000000000577d <+1078>: xor %edx,%edx
0x000000000000577f <+1080>: mov -0x8c(%rbp),%eax
0x0000000000005785 <+1086>: cmp 0xc(%r11),%eax
0x0000000000005789 <+1090>: jne 0x57fd <ceph_readdir+1206>
0x000000000000578b <+1092>: mov 0x30(%r14),%rax
0x000000000000578f <+1096>: test %rax,%rax
0x0000000000005792 <+1099>: je 0x57fd <ceph_readdir+1206>
0x0000000000005794 <+1101>: cmpq $0xffffffffffffffff,-0x330(%rax)
0x000000000000579c <+1109>: je 0x57fd <ceph_readdir+1206>
0x000000000000579e <+1111>: cmpq $0x2,-0x338(%rax)
0x00000000000057a6 <+1119>: je 0x57fd <ceph_readdir+1206>
which corresponds to
0x5785 is in ceph_readdir (fs/ceph/dir.c:207).
202         break;
203
204     emit_dentry = false;
205     di = ceph_dentry(dentry);
206     spin_lock(&dentry->d_lock);
207     if (di->lease_shared_gen == shared_gen &&
208         d_really_is_positive(dentry) &&
209         ceph_snap(d_inode(dentry)) != CEPH_SNAPDIR &&
210         ceph_ino(d_inode(dentry)) != CEPH_INO_CEPH &&
211         fpos_cmp(ctx->pos, di->offset) <= 0) {
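So this means di is NULL when it is dereferenced on line 207, right after the spin_lock: R11, which holds di, is 0 in the register dump, and the faulting address 0xc (CR2) is the offset of lease_shared_gen used by the cmp instruction. FYI, my patch set includes "ceph: rework dcache readdir", which might be related to this issue. As a workaround, I will now mount with noasyncreaddir.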