Bug #14232
closed
Kernel NULL pointer dereference in __dcache_readdir
Description
I found this bug using the kernel client on Linux v4.1.15 with all ceph/libceph patches up to v4.4-rc8 applied.
[Mon Jan 4 09:14:08 2016] Key type ceph registered
[Mon Jan 4 09:14:08 2016] libceph: loaded (mon/osd proto 15/24)
[Mon Jan 4 09:14:08 2016] ceph: loaded (mds proto 32)
[Mon Jan 4 09:14:08 2016] libceph: client63344014 fsid 7900aaa3-1a32-4c6f-84fb-2ee08089198f
[Mon Jan 4 09:14:08 2016] libceph: mon0 192.168.1.253:6789 session established
[Mon Jan 4 13:04:40 2016] libceph: osd83 down
[Mon Jan 4 13:05:11 2016] libceph: osd83 up
[Mon Jan 4 13:08:05 2016] libceph: osd83 down
[Mon Jan 4 13:08:54 2016] libceph: osd83 up
[Mon Jan 4 13:10:47 2016] libceph: get_reply osd83 tid 10952 unknown, skipping
[Mon Jan 4 13:11:28 2016] libceph: get_reply osd83 tid 10978 unknown, skipping
[Mon Jan 4 14:34:44 2016] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[Mon Jan 4 14:34:44 2016] IP: [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan 4 14:34:44 2016] PGD 80a619067 PUD 80a53f067 PMD 0
[Mon Jan 4 14:34:44 2016] Oops: 0000 [#1] SMP
[Mon Jan 4 14:34:44 2016] Modules linked in: cbc ceph libceph ipmi_watchdog w83627ehf adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 cpufreq_ondemand xfs ipmi_si ipmi_devintf ipmi_msghandler mgag200 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper kvm_amd drm kvm amd64_edac_mod microcode psmouse evdev pcspkr sp5100_tco edac_mce_amd rtc_cmos i2c_piix4 k10temp edac_core button acpi_cpufreq processor rpcsec_gss_krb5 fuse nfsv4 nfs af_packet sr_mod cdrom hid_generic usbhid hid bonding usb_storage sd_mod ohci_pci ohci_hcd ehci_pci ehci_hcd ata_generic ahci usbcore pata_atiixp libahci libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
[Mon Jan 4 14:34:44 2016] CPU: 5 PID: 28145 Comm: python3 Tainted: P W O 4.1.15+ #16
[Mon Jan 4 14:34:44 2016] Hardware name: Supermicro H8DGU/H8DGU, BIOS 1.0b 09/02/10
[Mon Jan 4 14:34:44 2016] task: ffff88080d821890 ti: ffff8806fabe8000 task.ti: ffff8806fabe8000
[Mon Jan 4 14:34:44 2016] RIP: 0010:[<ffffffffa0205761>] [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan 4 14:34:44 2016] RSP: 0018:ffff8806fabebde8 EFLAGS: 00010246
[Mon Jan 4 14:34:44 2016] RAX: 000000000000002f RBX: ffff880721b6d120 RCX: 0000000000000001
[Mon Jan 4 14:34:44 2016] RDX: 0000000000000000 RSI: 0000000100020002 RDI: ffff8807243dfe58
[Mon Jan 4 14:34:44 2016] RBP: ffff8806fabebe98 R08: 0000000004300430 R09: 0000000000000004
[Mon Jan 4 14:34:44 2016] R10: ffff8806fabebd60 R11: 0000000000000000 R12: ffff880713e32348
[Mon Jan 4 14:34:44 2016] R13: ffff88034bac7ec0 R14: ffff8807243dfe00 R15: ffff880713e32348
[Mon Jan 4 14:34:44 2016] FS: 00007f59bc5b67c0(0000) GS:ffff88080fc40000(0000) knlGS:0000000000000000
[Mon Jan 4 14:34:44 2016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jan 4 14:34:44 2016] CR2: 000000000000000c CR3: 000000080c0f0000 CR4: 00000000000006e0
[Mon Jan 4 14:34:44 2016] Stack:
[Mon Jan 4 14:34:44 2016] ffff8807218fa500 ffff8807243dfe58 ffff880713e32020 0000000000000000
[Mon Jan 4 14:34:44 2016] 0000002ffabebe48 ffff880721b6d120 ffff880c09c7e800 ffff8800a5104a80
[Mon Jan 4 14:34:44 2016] 0000000000000040 ffff88080d821890 ffff88080a785700 ffff8806fabebef8
[Mon Jan 4 14:34:44 2016] Call Trace:
[Mon Jan 4 14:34:44 2016] [<ffffffff81112697>] iterate_dir+0x74/0xfb
[Mon Jan 4 14:34:45 2016] [<ffffffff81112809>] SyS_getdents+0x78/0xc4
[Mon Jan 4 14:34:45 2016] [<ffffffff81112462>] ? fillonedir+0xb6/0xb6
[Mon Jan 4 14:34:45 2016] [<ffffffff81409917>] system_call_fastpath+0x12/0x6a
[Mon Jan 4 14:34:45 2016] Code: 46 58 4d 8b 5e 78 48 89 c7 48 89 85 58 ff ff ff 4c 89 9d 68 ff ff ff e8 7f 3e 20 e1 4c 8b 9d 68 ff ff ff 31 d2 8b 85 74 ff ff ff <41> 3b 43 0c 75 72 49 8b 46 30 48 85 c0 74 69 48 83 b8 d0 fc ff
[Mon Jan 4 14:34:45 2016] RIP [<ffffffffa0205761>] ceph_readdir+0x43e/0xbaa [ceph]
[Mon Jan 4 14:34:45 2016] RSP <ffff8806fabebde8>
[Mon Jan 4 14:34:45 2016] CR2: 000000000000000c
[Mon Jan 4 14:34:45 2016] ---[ end trace 8869dff4d5641722 ]---
[Mon Jan 4 16:43:37 2016] libceph: osd85 down
Using gdb, the location checked out to be "0x0000000000005785 <+1086>"
0x0000000000005758 <+1041>: lea 0x58(%r14),%rax
0x000000000000575c <+1045>: mov 0x78(%r14),%r11
0x0000000000005760 <+1049>: mov %rax,%rdi
0x0000000000005763 <+1052>: mov %rax,-0xa8(%rbp)
0x000000000000576a <+1059>: mov %r11,-0x98(%rbp)
0x0000000000005771 <+1066>: callq 0x5776 <ceph_readdir+1071>
0x0000000000005776 <+1071>: mov -0x98(%rbp),%r11
0x000000000000577d <+1078>: xor %edx,%edx
0x000000000000577f <+1080>: mov -0x8c(%rbp),%eax
0x0000000000005785 <+1086>: cmp 0xc(%r11),%eax
0x0000000000005789 <+1090>: jne 0x57fd <ceph_readdir+1206>
0x000000000000578b <+1092>: mov 0x30(%r14),%rax
0x000000000000578f <+1096>: test %rax,%rax
0x0000000000005792 <+1099>: je 0x57fd <ceph_readdir+1206>
0x0000000000005794 <+1101>: cmpq $0xffffffffffffffff,-0x330(%rax)
0x000000000000579c <+1109>: je 0x57fd <ceph_readdir+1206>
0x000000000000579e <+1111>: cmpq $0x2,-0x338(%rax)
0x00000000000057a6 <+1119>: je 0x57fd <ceph_readdir+1206>
which corresponds to
0x5785 is in ceph_readdir (fs/ceph/dir.c:207).
202                 break;
203
204             emit_dentry = false;
205             di = ceph_dentry(dentry);
206             spin_lock(&dentry->d_lock);
207             if (di->lease_shared_gen == shared_gen &&
208                 d_really_is_positive(dentry) &&
209                 ceph_snap(d_inode(dentry)) != CEPH_SNAPDIR &&
210                 ceph_ino(d_inode(dentry)) != CEPH_INO_CEPH &&
211                 fpos_cmp(ctx->pos, di->offset) <= 0) {
So this means di == NULL when it is dereferenced right after the spin_lock. FYI, my patchset includes "ceph: rework dcache readdir", which might be related to this issue. As a workaround, I will now mount with noasyncreaddir.
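The fault address in the oops (CR2: 000000000000000c) fits this reading: the faulting instruction is cmp 0xc(%r11),%eax and R11 is 0 in the register dump, so the CPU reads address 0xc. The sketch below checks that arithmetic, and also that ceph_readdir+0x43e from the oops is the same instruction as <+1086> in the gdb listing. The struct demo_dentry_info layout and both helper names are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical stand-in for the head of struct ceph_dentry_info.
 * The field order is an assumption; uint64_t takes the place of a
 * 64-bit pointer so the layout is the same on any build host. */
struct demo_dentry_info {
	uint64_t lease_session;     /* pointer-sized slot */
	uint32_t lease_gen;
	uint32_t lease_shared_gen;  /* the field compared at dir.c:207 */
};

static_assert(offsetof(struct demo_dentry_info, lease_shared_gen) == 0xc,
	      "lease_shared_gen is expected at offset 0xc in this sketch");

/* Address the CPU touches when reading a field at `off` through `base`. */
uint64_t fault_address(uint64_t base, size_t off)
{
	return base + off;
}

/* Start of a function, given an instruction address and its +offset. */
uint64_t function_start(uint64_t insn_addr, uint64_t insn_off)
{
	return insn_addr - insn_off;
}
```

With base 0 and the offset of lease_shared_gen, fault_address() gives 0xc, matching CR2. function_start(0x5785, 0x43e) and function_start(0x5758, 1041) both give the same start address, so the oops offset and the gdb listing describe the same code.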
Files
Updated by Zheng Yan over 8 years ago
Sorry for the delay. Please upload the patches that you applied to the 4.1.15 kernel. If you can, please give the 4.4 kernel a try.
Updated by Markus Blank-Burian over 8 years ago
The patch list has grown rather long. If there have been any changes/fixes in dcache, I don't have them included. Within the next 1-2 weeks, I can try out 4.4, since it's now released. There was a bug in the NUMA migration code which stopped us from using 4.3.
fc0561cefc04e7803c0f6501ca4f310a502f65b8 xfs: optimise away log forces on timestamp updates for fdatasync
583d0fef756a7615e50f0f68ea0892a497d03971 libceph: clear msg->con in ceph_msg_release() only
a51983e4dd2d4d63912aab939f657c4cd476e21a libceph: add nocephx_sign_messages option
859bff51dc5e92ddfb5eb6f17b8040d9311095bb libceph: stop duplicating client fields in messenger
4199b8eec36405822619d4176bddfacf7b47eb44 libceph: drop authorizer check from cephx msg signing routines
79dbd1baa651cece408e68a1b445f3628c4b5bdc libceph: msg signing callouts don't need con argument
8a703a383dd3458753e0ad71860ed3a5097692b3 libceph: evaluate osd_req_op_data() arguments only once
68cd5b4b7612c2956d8553dfb39490b29f32566d ceph: make fsync() wait unsafe requests that created/modified inode
4c06ace81a60636dec358c288ef6aaf3aa6dc599 ceph: add request to i_unsafe_dirops when getting unsafe reply
cbf99a11fb14db0835acd79ecd7469d37e398660 libceph: introduce ceph_x_authorizer_cleanup()
5e804ac4824302efc3038e086cb21f2e93ab8900 ceph: don't invalidate page cache when inode is no longer used
343128ce91836d4131ead74b53d83b72e93d55b2 libceph: use local variable cursor instead of &msg->cursor
70cf052d0c4b60b6fbb981380660893306b9f172 libceph: remove con argument in handle_reply()
b5b98989dc7ed2093aeb76f2d0db79888582b0a2 ceph: combine as many iovec as possile into one OSD request
777d738a5e58ba3b6f3932ab1543ce93703f4873 ceph: fix message length computation
1291fb950f12005600eb410c206fffd7231dee6f ceph: fix a comment typo
335c25858218e76ef47f92ecb9d22e919d36140d libceph: advertise support for keepalive2
7f61f545657281a3a1b0faf68993165ebdecc51b libceph: don't access invalid memory in keepalive2 path
438386853d4c0c48fe73bf05a7d61c70ca5a3bfb ceph: improve readahead for file holes
55b0b31cbc09f80db384671e22cdc94b2aa26b29 ceph: get inode size for each append write
d15f9d694b77fe5e4ea12b3031ecaa13b5aa2b10 libceph: check data_len in ->alloc_msg()
8b9558aab853e98ba6e3fee0dd8545544966958c libceph: use keepalive2 to verify the mon session is alive
6dd74e44dc1df85f125982a8d6591bc4a76c9f5d libceph: set 'exists' flag for newly up osd
5fdb1389e1399d6801a8c5d10952ef4153039fb2 ceph: cleanup use of ceph_msg_get
e36d571d70c7f46b20c28d81025fd5fc044a8e22 ceph: no need to get parent inode in ceph_open
a43137f7b0f1467cf3005b6ff6574d978642d247 ceph: remove the useless judgement
1550d34e5626a20a2e12c73bdc1e6e217a0ba897 ceph: remove redundant test of head->safe and silence static analysis warnings
23078637e05460428f803be7d0f46908df8a970a ceph: fix queuing inode to mdsdir's snaprealm
6893162215d7bf08a4273247ec1fc7dedee5135c libceph: rename con_work() to ceph_con_workfn()
d920ff6fc7c1ec3d7bd80432bff5575c0ebe426c libceph: Avoid holding the zero page on ceph_msgr_slab_init errors
b79b23682a1649f30960fb5bd920ba46c89a1b14 libceph: remove the unused macro AES_KEY_SIZE
a341d4df87487ae68189e0be869c39a2b0cb9aaa ceph: invalidate dirty pages after forced umount
48fec5d0a504dfbb302cb1dd24ebb0b82a46cce9 ceph: EIO all operations after forced umount
fc927cd32feca2acefd90a4ac317fa4f0a2e5955 ceph: always re-send cap flushes when MDS recovers
f6762cb2ca48e9052b5233c338fa254fa58d8981 ceph: fix ceph_encode_locks_to_buffer()
c44bd69c0c8cfadf0239437635b2933efb1f6c4c libceph: treat sockaddr_storage with uninitialized family as blank
757856d2b9568a701df9ea6a4be68effbb9d6f44 libceph: enable ceph in a non-default network namespace
eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2 net: Add a struct net parameter to sock_create_kern
c2cfa19400979dc1a14bba75f83b451b0cd9507a libceph: Fix ceph_tcp_sendpage()'s more boolean usage
6ba8edc0bcbdf337293e60123ddac8fc1c895a3c libceph: Remove spurious kunmap() of the zero page
e1966b49446a43994c3f25a07d0eb4d05660b429 ceph: fix ceph_writepages_start()
fdd4e15838e59c394a1ec4963b57c22c12608685 ceph: rework dcache readdir
b459be739f97e2062b2ba77cfe8ea198dbd58904 crush: sync up with userspace
8f529795bace5d6263b134f4ff3adccfc0a0cce6 crush: fix crash from invalid 'take' argument
687265e5a885d6308f5d73e738efe3c2674fa218 ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
f66fd9f0952187d274c13c136b74548f792c1925 ceph: pre-allocate data structure that tracks caps flushing
e548e9b93d3e565e42b938a99804114565be1f81 ceph: re-send flushing caps (which are revoked) in reconnect stage
a2971c8ccb9bd7677a6c43cdbed9aacfef5e9f26 ceph: send TID of the oldest pending caps flush to MDS
8310b08913eca8aee98744c9aff1ec0d1f603b19 ceph: track pending caps flushing globally
553adfd941f8ca622965ef809553d918ea039929 ceph: track pending caps flushing accurately
6c13a6bb55df6666275b992ba76620324429d7cf libceph: fix wrong name "Ceph filesystem for Linux"
da819c8150c5b6e6a6a21ee41135b88f6cd18c3e ceph: fix directory fsync
89b52fe14de4d703ba837a7418bb4cd286dcc87f ceph: fix flushing caps
41445999aeec1f0fdf196ab55b2c770473b2ea01 ceph: don't include used caps in cap_wanted
3e0708b990f7e46d87d47b3b06de322490f2f2ee ceph: ratelimit warn messages for MDS closes session
5be73034771c8f18b241f1974803865a4de2cad1 ceph: simplify two mount_timeout sites
216639dd5091de4f4d7ad19b0b8dde11fad18286 libceph: a couple tweaks for wait loops
a319bf56a617354e62cf5f774d2ca4e1a8a3bff3 libceph: store timeouts in jiffies, verify user input
d50c97b566c5bbf990eff472e9feaa58fdebdd33 libceph: nuke time_sub()
e8a7b8b12b13831467c6158c1e82801e25b5dd98 ceph: exclude setfilelock requests when calculating oldest tid
745a8e3bccbc6adae69a98ddc525e529aa44636e ceph: don't pre-allocate space for cap release messages
affbc19a68f9966ad65a773db405f78e2bafc07b ceph: make sure syncfs flushes all cap snaps
622f3e250f498976ad4cbae6f2be5cb359ded4f5 ceph: don't trim auth cap when there are cap snaps
604d1b0245b97738cde4341944ad93edff4b2827 ceph: take snap_rwsem when accessing snap realm's cached_context
860560904962d08fd38666207c910065fe53e074 ceph: avoid sending unnessesary FLUSHSNAP message
5dda377cf0a6bd43f64a3c1efb670d7c668e7b29 ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference
7b06a826e7c52d77ce801e5960ecf0338eafe886 ceph: use empty snap context for uninline_data and get_pool_perm
10183a69551f76702ac68bc74a437b25419c6de0 ceph: check OSD caps before read/write
144cba1493fdd6e3e1980e439a31df877831ebcd libceph: allow setting osd_req_op's flags
66ba609f7b96096116ca7bbc21ec6922ea41a992 libceph: properly release STAT request's raw_data_in
Updated by Markus Blank-Burian about 8 years ago
Took a while, but found a very similar one with 4.4.1 using only a few patches:
Feb 25 20:57:51 kaa-101 kernel: general protection fault: 0000 [#1] SMP
Feb 25 20:57:51 kaa-101 kernel: Modules linked in: cbc ceph libceph arc4 ecb md4 hmac nls_utf8 cifs 8021q garp mrp stp llc autofs4 binfmt_misc xfs ipmi_watchdog ipmi_si ipmi_devintf ipmi_msghandler mgag200 ttm drm_kms_helper syscopyarea sysfillrect sysimgblt input_leds fb_sys_fops led_class evdev acpi_cpufreq amd64_edac_mod drm edac_mce_amd psmouse pcspkr sp5100_tco edac_core fam15h_power k10temp i2c_piix4 rtc_cmos button processor rpcsec_gss_krb5 fuse nfsv4 nfs btrfs xor raid6_pq af_packet hid_generic usbhid zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) sd_mod ohci_pci bonding crc32c_intel ohci_hcd ehci_pci ehci_hcd ahci libahci usbcore libata usb_common dm_mirror dm_region_hash dm_log dm_mod unix
Feb 25 20:57:51 kaa-101 kernel: CPU: 22 PID: 2203 Comm: slurm_script Tainted: P W O 4.4.1+ #21
Feb 25 20:57:51 kaa-101 kernel: Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012
Feb 25 20:57:51 kaa-101 kernel: task: ffff88136c663a00 ti: ffff882038300000 task.ti: ffff882038300000
Feb 25 20:57:51 kaa-101 kernel: RIP: 0010:[<ffffffff81292d76>] [<ffffffff81292d76>] lockref_get_not_dead+0x5/0x81
Feb 25 20:57:51 kaa-101 kernel: RSP: 0018:ffff882038303da0 EFLAGS: 00010206
Feb 25 20:57:51 kaa-101 kernel: RAX: 303030303035368d RBX: ffff882038303ef0 RCX: 0000000100000000
Feb 25 20:57:51 kaa-101 kernel: RDX: ffff88057283b000 RSI: ffff880126dd8b89 RDI: 303030303035368d
Feb 25 20:57:51 kaa-101 kernel: RBP: ffff882038303da8 R08: 00000000008f1e8a R09: 0000000000000008
Feb 25 20:57:51 kaa-101 kernel: R10: ffff882038303d40 R11: 0000000000000202 R12: 0000000000000002
Feb 25 20:57:51 kaa-101 kernel: R13: 3030303030353635 R14: ffff88010f4518a0 R15: 0000000000002c80
Feb 25 20:57:51 kaa-101 kernel: FS: 00007f577d00d700(0000) GS:ffff881807d80000(0000) knlGS:0000000000000000
Feb 25 20:57:51 kaa-101 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 20:57:51 kaa-101 kernel: CR2: 00000000008f2000 CR3: 0000001031e9c000 CR4: 00000000000406e0
Feb 25 20:57:51 kaa-101 kernel: Stack:
Feb 25 20:57:51 kaa-101 kernel: ffff882038303ef0 ffff882038303e90 ffffffffa0333a86 ffff882038303df8
Feb 25 20:57:51 kaa-101 kernel: ffffffff81183ae6 ffff88200119e170 0000000000000591 0000000000000591
Feb 25 20:57:51 kaa-101 kernel: ffff882a07faf1f8 ffff88010f451a00 ffff882038303e50 0000003f38303ec0
Feb 25 20:57:51 kaa-101 kernel: Call Trace:
Feb 25 20:57:51 kaa-101 kernel: [<ffffffffa0333a86>] ceph_readdir+0xad0/0xe4e [ceph]
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff81183ae6>] ? mem_cgroup_try_charge+0x66/0x1a0
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff810edf95>] ? acct_account_cputime+0x17/0x19
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff8119c4ec>] iterate_dir+0x7c/0x103
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff8119c94b>] SyS_getdents+0x89/0x10e
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff8100185a>] ? syscall_trace_enter_phase1+0xd3/0x13f
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff8119c573>] ? iterate_dir+0x103/0x103
Feb 25 20:57:51 kaa-101 kernel: [<ffffffff815816ee>] entry_SYSCALL_64_fastpath+0x12/0x71
Feb 25 20:57:51 kaa-101 kernel: Code: 04 48 89 df c6 07 00 0f 1f 40 00 eb b8 31 c0 eb b9 55 48 89 e5 8b 07 85 c0 74 09 c7 47 04 80 ff ff ff 5d c3 0f 0b 55 48 89 e5 53 <48> 8b 17 85 d2 75 27 48 89 d0 48 c1 f8 20 8d 48 01 48 c1 e1 20
Feb 25 20:57:51 kaa-101 kernel: RIP [<ffffffff81292d76>] lockref_get_not_dead+0x5/0x81
Feb 25 20:57:51 kaa-101 kernel: RSP <ffff882038303da0>
Feb 25 20:57:51 kaa-101 kernel: ---[ end trace c037e0a2aa8fe44c ]---
The only lockref_get_not_dead call is here:
	rcu_read_lock();
	spin_lock(&parent->d_lock);
	/* check i_size again here, because empty directory can be
	 * marked as complete while not holding the i_mutex. */
	if (ceph_dir_is_complete_ordered(dir) &&
	    ptr_pos < i_size_read(dir))
		dentry = cache_ctl.dentries[cache_ctl.index % nsize];
	else
		dentry = NULL;
	spin_unlock(&parent->d_lock);
	if (dentry && !lockref_get_not_dead(&dentry->d_lockref))
		dentry = NULL;
	rcu_read_unlock();
I will again fall back to noasyncreaddir.
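A side observation on the register dump in this trace, offered as a hypothesis rather than a diagnosis: R13 (3030303030353635) consists only of printable ASCII bytes, and RDI, the argument passed to lockref_get_not_dead, equals R13 + 0x58. An address like that is non-canonical on x86-64, which would explain why this is a general protection fault and not a NULL pointer oops. One possible reading is that the pointer taken from cache_ctl.dentries was stale and the slot had been overwritten with string data; the report does not confirm this. The helper names below are made up for illustration, and the canonical check assumes 48-bit virtual addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 64-bit register value as 8 bytes in little-endian (x86)
 * memory order. Non-printable bytes become '.'. `out` must have room
 * for 9 chars (8 bytes plus the terminating NUL). */
void decode_le_ascii(uint64_t value, char out[9])
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++) {
		unsigned char byte = (unsigned char)(value >> (8 * i));
		out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char)byte : '.';
	}
	out[8] = '\0';
}

/* An x86-64 address is canonical when bits 63..47 are all equal
 * (assumption: 48-bit virtual addresses, no 5-level paging). */
int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;   /* the 17 uppermost bits */
	return top == 0 || top == 0x1ffff;
}
```

Passing the R13 value to decode_le_ascii() yields the digit string 56500000, and is_canonical_48() returns 0 for the RDI value, while a normal kernel pointer from the same dump such as RBX (ffff882038303ef0) passes the check.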
Updated by Zheng Yan about 8 years ago
- File readdir-cache.patch added
- Status changed from New to 12
I found a bug in the fill cache code. Could you please try the attached patch?
Updated by Markus Blank-Burian about 8 years ago
Thanks for the patch! I will let you know if the bug hits again.