Bug #45649
use-after-free during AIO
Status: Closed
% Done: 0%
Description
[ 2768.702419] ==================================================================
[ 2768.704404] BUG: KASAN: use-after-free in ceph_read_iter+0x1986/0x1eb0 [ceph]
[ 2768.706013] Read of size 4 at addr ffff888104e05d20 by task fsstress/29043
[ 2768.707968] CPU: 3 PID: 29043 Comm: fsstress Tainted: G E --------- - - 4.18.0+ #1
[ 2768.709959] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 2768.712016] Call Trace:
[ 2768.712597] dump_stack+0x9a/0xf0
[ 2768.713412] print_address_description.cold.3+0x9/0x23b
[ 2768.714613] kasan_report.cold.4+0x64/0x95
[ 2768.715525] ? ceph_read_iter+0x1986/0x1eb0 [ceph]
[ 2768.716704] ceph_read_iter+0x1986/0x1eb0 [ceph]
[ 2768.717812] ? kvm_sched_clock_read+0x14/0x30
[ 2768.718825] ? sched_clock+0x5/0x10
[ 2768.719701] ? ceph_write_iter+0x1c80/0x1c80 [ceph]
[ 2768.720845] ? sched_clock+0x5/0x10
[ 2768.721618] ? find_held_lock+0x3a/0x1c0
[ 2768.722596] ? __save_stack_trace.constprop.3+0x80/0x100
[ 2768.723883] ? lock_downgrade+0x6f0/0x6f0
[ 2768.724906] ? rcu_read_lock_held+0xaf/0xc0
[ 2768.726008] ? fsnotify_nameremove+0x240/0x240
[ 2768.727046] ? fsnotify_first_mark+0x150/0x150
[ 2768.728104] ? aio_read+0x20f/0x340
[ 2768.728899] aio_read+0x20f/0x340
[ 2768.729574] ? aio_write+0x530/0x530
[ 2768.730338] ? sched_clock_cpu+0x18/0x1e0
[ 2768.731959] ? kvm_sched_clock_read+0x14/0x30
[ 2768.733692] ? lock_downgrade+0x6f0/0x6f0
[ 2768.735306] ? lock_acquire+0x14f/0x3b0
[ 2768.736877] ? __might_fault+0xc4/0x1a0
[ 2768.738609] io_submit_one+0x856/0xb50
[ 2768.740593] ? aio_read+0x340/0x340
[ 2768.742607] __x64_sys_io_submit+0x17f/0x420
[ 2768.745181] ? aio_fsync_work+0xd0/0xd0
[ 2768.747549] ? retint_user+0x18/0x18
[ 2768.749790] ? do_syscall_64+0xa5/0x4d0
[ 2768.752295] do_syscall_64+0xa5/0x4d0
[ 2768.754251] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 2768.756457] RIP: 0033:0x7ffff7ebbe0d
[ 2768.758342] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b 90 0c 00 f7 d8 64 89 01 48
[ 2768.765492] RSP: 002b:00007fffffffa228 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[ 2768.768801] RAX: ffffffffffffffda RBX: 00007ffff7dc3b00 RCX: 00007ffff7ebbe0d
[ 2768.771933] RDX: 00007fffffffa278 RSI: 0000000000000001 RDI: 00007ffff7fca000
[ 2768.774708] RBP: 00007ffff7fca000 R08: 0000000000000000 R09: 0000000000000000
[ 2768.777558] R10: 00007ffff7800000 R11: 0000000000000246 R12: 0000000000000001
[ 2768.780099] R13: 0000000000000019 R14: 00007fffffffa278 R15: 00007ffff7800000
[ 2768.783440] Allocated by task 29043:
[ 2768.785226] kasan_kmalloc+0xbf/0xe0
[ 2768.786808] kmem_cache_alloc+0x10c/0x340
[ 2768.788338] io_submit_one+0xd3/0xb50
[ 2768.790086] __x64_sys_io_submit+0x17f/0x420
[ 2768.791912] do_syscall_64+0xa5/0x4d0
[ 2768.793585] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 2768.795251] 0xffffffffffffffff
[ 2768.797803] Freed by task 29028:
[ 2768.799093] __kasan_slab_free+0x125/0x170
[ 2768.800592] slab_free_freelist_hook+0x5e/0x140
[ 2768.802321] kmem_cache_free+0x9d/0x300
[ 2768.803906] aio_complete+0x56a/0xb90
[ 2768.805441] ceph_aio_complete_req+0xa8a/0x10e0 [ceph]
[ 2768.807846] __complete_request+0x49/0x140 [libceph]
[ 2768.810262] handle_reply+0x147d/0x2400 [libceph]
[ 2768.812559] dispatch+0x468/0x24d0 [libceph]
[ 2768.814747] try_read+0xcaf/0x1fd0 [libceph]
[ 2768.816936] ceph_con_workfn+0x1db/0x11e0 [libceph]
[ 2768.819394] process_one_work+0x8f0/0x17a0
[ 2768.821502] worker_thread+0x87/0xb50
[ 2768.823684] kthread+0x30c/0x3d0
[ 2768.825757] ret_from_fork+0x27/0x50
[ 2768.827800] 0xffffffffffffffff
[ 2768.831200] The buggy address belongs to the object at ffff888104e05d00
which belongs to the cache aio_kiocb of size 184
[ 2768.836722] The buggy address is located 32 bytes inside of
184-byte region [ffff888104e05d00, ffff888104e05db8)
[ 2768.841608] The buggy address belongs to the page:
[ 2768.843896] page:ffffea0004138140 refcount:1 mapcount:0 mapping:ffff88810b9a1b00 index:0xffff888104e05900
[ 2768.847031] flags: 0x17ffffc0000100(slab)
[ 2768.849134] raw: 0017ffffc0000100 dead000000000100 dead000000000200 ffff88810b9a1b00
[ 2768.851978] raw: ffff888104e05900 0000000080100009 00000001ffffffff 0000000000000000
[ 2768.854933] page dumped because: kasan: bad access detected
[ 2768.858658] Memory state around the buggy address:
[ 2768.860826] ffff888104e05c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2768.863427] ffff888104e05c80: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 2768.865934] >ffff888104e05d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2768.868430] ^
[ 2768.870286] ffff888104e05d80: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 2768.872581] ffff888104e05e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2768.875054] ==================================================================
[ 2768.877544] Disabling lock debugging due to kernel taint
I suspect the use-after-free is at
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 7243008bed72..9e0a0fc17a53 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1364,6 +1364,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	if (ret != -EOLDSNAPC && pos > iocb->ki_pos) {
 		ret = pos - iocb->ki_pos;
 		iocb->ki_pos = pos;
+		^^^
 	}
 	return ret;
 }
Updated by Xiubo Li about 4 years ago
- Assignee set to Xiubo Li
Talked with Zheng Yan, I will take it.
Thanks.
Updated by Xiubo Li about 4 years ago
Checked it: you are using the 4.18 kernel on f30, and the 4.18 fs/aio.c is buggy here.
Just after the ceph kclient thread submits the aio read/write requests, it may be scheduled out or stall. Meanwhile the ceph-mds can handle and reply to the aio requests quickly enough that, in the path
handle_reply() --> complete_request() --> req->r_callback = ceph_aio_complete_req() --> ceph_aio_complete() --> aio_req->iocb->ki_complete() --> aio_complete_rw() --> aio_complete() --> kmem_cache_free(kiocb_cachep, iocb)
the 'iocb' gets freed.
When the ceph kclient thread is scheduled back in, it continues to use the freed 'iocb', and we hit the use-after-free.
This has been fixed by:
commit 9018ccc453af063d16b3b6b5dfa2ad0635390371
Author: Christoph Hellwig <hch@lst.de>
Date: Tue Jul 24 11:36:37 2018 +0200
aio: add a iocb refcount
This is needed to prevent races caused by the way the ->poll API works.
To avoid introducing overhead for other users of the iocbs we initialize
it to zero and only do refcount operations if it is non-zero in the
completion path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Avi Kivity <avi@scylladb.com>
Updated by Xiubo Li about 4 years ago
Checked the downstream rhel7/8 kernels; I think we should backport it there too.
Updated by Xiubo Li about 4 years ago
- Status changed from In Progress to Resolved
Updated by Xiubo Li almost 4 years ago
With KASAN enabled, and with a small change to the fs/ceph/file.c code:
Since for aio_read the ki_flags is of no further use past this point, I just use it for the test, and at the same time always set ki_pos to 0:
@@ -1579,6 +1585,11 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
ceph_end_io_direct(inode);
else
ceph_end_io_read(inode);
+ while (i++ < 3000) {
+ iocb->ki_flags = 0;
+ iocb->ki_pos = 0;
+ }
+ printk("%s after sleep 3s ------kiocb:%p, %lld\n", __func__, iocb, iocb->ki_pos);
if (retry_op > HAVE_RETRIED && ret >= 0) {
int statret;
I can hit the same call traces:
<3>[ 218.834353] ==================================================================
<3>[ 218.834938] BUG: KASAN: use-after-free in ceph_read_iter+0x68e/0x1480 [ceph]
<3>[ 218.835177] Write of size 4 at addr ffff888c33d98220 by task fio/11510
<3>[ 218.835397]
<4>[ 218.835470] CPU: 4 PID: 11510 Comm: fio Not tainted 4.18.0+ #3
<4>[ 218.835475] Hardware name: Red Hat RHEV Hypervisor, BIOS 1.11.0-2.el7 04/01/2014
<4>[ 218.835492] Call Trace:
<4>[ 218.835606] dump_stack+0x71/0xa0
<4>[ 218.835634] print_address_description+0x65/0x22e
<4>[ 218.835696] ? ceph_read_iter+0x68e/0x1480 [ceph]
<4>[ 218.835707] kasan_report.cold.6+0x241/0x306
<4>[ 218.835767] ceph_read_iter+0x68e/0x1480 [ceph]
<4>[ 218.835800] ? __read_once_size_nocheck.constprop.8+0x10/0x10
<4>[ 218.835808] ? __orc_find+0x6f/0xc0
<4>[ 218.835814] ? stack_access_ok+0x35/0x80
<4>[ 218.835934] ? ceph_direct_read_write+0x710/0x710 [ceph]
<4>[ 218.835967] ? is_bpf_text_address+0xa/0x20
<4>[ 218.835993] ? kernel_text_address+0x100/0x110
<4>[ 218.836000] ? __kernel_text_address+0xe/0x30
<4>[ 218.836007] ? unwind_get_return_address+0x2f/0x50
<4>[ 218.836033] ? __save_stack_trace.constprop.8+0x80/0x100
<4>[ 218.836061] ? __fsnotify_parent+0x8d/0x1f0
<4>[ 218.836068] ? fsnotify+0x7ee/0x810
<4>[ 218.836081] ? __fsnotify_inode_delete+0x20/0x20
<4>[ 218.836099] ? aio_read+0x173/0x250
<4>[ 218.836158] ? ceph_direct_read_write+0x710/0x710 [ceph]
<4>[ 218.836164] aio_read+0x173/0x250
<4>[ 218.836173] ? aio_write+0x310/0x310
<4>[ 218.836198] ? down_read_trylock+0xe4/0x140
<4>[ 218.836214] ? __count_memcg_events.part.58+0x2a/0x100
<4>[ 218.836223] ? up_read+0x14/0x80
<4>[ 218.836242] ? __do_page_fault+0x3a4/0x620
<4>[ 218.836261] ? kasan_unpoison_shadow+0x30/0x40
<4>[ 218.836268] ? kasan_kmalloc+0xbf/0xe0
<4>[ 218.836275] ? io_submit_one+0xc6/0x580
<4>[ 218.836283] io_submit_one+0x302/0x580
<4>[ 218.836290] ? lookup_ioctx+0x72/0x1b0
<4>[ 218.836296] ? aio_read+0x250/0x250
<4>[ 218.836303] ? aio_fsync+0x160/0x160
<4>[ 218.836321] ? ktime_get_coarse_real_ts64+0x4a/0x70
<4>[ 218.836330] __x64_sys_io_submit+0x114/0x240
<4>[ 218.836338] ? __ia32_compat_sys_io_submit+0x240/0x240
<4>[ 218.836357] do_syscall_64+0x73/0x200
<4>[ 218.836374] entry_SYSCALL_64_after_hwframe+0x65/0xca
<4>[ 218.836391] RIP: 0033:0x7fcb4664a697
<4>[ 218.836401] Code: 49 83 38 00 75 eb 49 83 78 08 00 75 e4 8b 47 0c 39 47 08 75 dc 31 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 90 b8 d1 00 00 00 0f 05 <c3> 0f 1f 84 00 00 00 00 00 b8 d2 00 00 00 0f 05 c3 0f 1f 84 00 00
<4>[ 218.836405] RSP: 002b:00007fff0c15f7c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000d1
<4>[ 218.836413] RAX: ffffffffffffffda RBX: 000000000156b130 RCX: 00007fcb4664a697
<4>[ 218.836417] RDX: 000000000156ace0 RSI: 0000000000000001 RDI: 00007fcb5a759000
<4>[ 218.836421] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000156d960
<4>[ 218.836425] R10: 00007fff0c15eba0 R11: 0000000000000202 R12: 0000000000000001
<4>[ 218.836429] R13: 0000000000000000 R14: 00007fcb1a904d40 R15: 000000000156a7a0
<3>[ 218.836435]
<3>[ 218.836507] Allocated by task 11510:
<4>[ 218.836638] kasan_kmalloc+0xbf/0xe0
<4>[ 218.836644] kmem_cache_alloc+0xb5/0x1d0
<4>[ 218.836650] io_submit_one+0xc6/0x580
<4>[ 218.836656] __x64_sys_io_submit+0x114/0x240
<4>[ 218.836661] do_syscall_64+0x73/0x200
<4>[ 218.836667] entry_SYSCALL_64_after_hwframe+0x65/0xca
<4>[ 218.836671] 0xffffffffffffffff
<3>[ 218.836672]
<3>[ 218.836731] Freed by task 301:
<4>[ 218.836856] __kasan_slab_free+0x125/0x170
<4>[ 218.836862] kmem_cache_free+0x78/0x1f0
<4>[ 218.836868] aio_complete+0x2d9/0x3e0
<4>[ 218.836975] ceph_aio_complete_req+0x3fb/0x710 [ceph]
<4>[ 218.837132] __complete_request+0x36/0xb0 [libceph]
<4>[ 218.837201] handle_reply+0xe07/0x1050 [libceph]
<4>[ 218.837265] dispatch+0x276/0x1140 [libceph]
<4>[ 218.837323] try_read+0x825/0x1050 [libceph]
<4>[ 218.837381] ceph_con_workfn+0x189/0x860 [libceph]
<4>[ 218.837390] process_one_work+0x382/0x670
<4>[ 218.837395] worker_thread+0x57/0x5b0
<4>[ 218.837401] kthread+0x1a0/0x1c0
<4>[ 218.837406] ret_from_fork+0x35/0x40
<4>[ 218.837410] 0xffffffffffffffff
<3>[ 218.837412]
<3>[ 218.837474] The buggy address belongs to the object at ffff888c33d98200
<3>[ 218.837474]  which belongs to the cache aio_kiocb of size 136
<3>[ 218.837885] The buggy address is located 32 bytes inside of
<3>[ 218.837885]  136-byte region [ffff888c33d98200, ffff888c33d98288)
<3>[ 218.838439] The buggy address belongs to the page:
<4>[ 218.838629] page:ffffea0030cf6600 refcount:1 mapcount:0 mapping:ffff888d567c0000 index:0x0 compound_mapcount: 0
<4>[ 218.838638] flags: 0x17ffffc0008100(slab|head)
<4>[ 218.838648] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff888d567c0000
<4>[ 218.838655] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
<4>[ 218.838658] page dumped because: kasan: bad access detected
<3>[ 218.838660]
<3>[ 218.838717] Memory state around the buggy address:
<3>[ 218.838897]  ffff888c33d98100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
<3>[ 218.839229]  ffff888c33d98180: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
<3>[ 218.839467] >ffff888c33d98200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
<3>[ 218.839701]                    ^
<3>[ 218.839865]  ffff888c33d98280: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
<3>[ 218.840170]  ffff888c33d98300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
<3>[ 218.840472] ==================================================================
And the corresponding objdump listing of ceph_read_iter:
3641 0000000000002ce0 <ceph_read_iter>:
3642 {
3643     2ce0: e8 00 00 00 00          callq  2ce5 <ceph_read_iter+0x5>
3644     2ce5: 4c 8d 54 24 08          lea    0x8(%rsp),%r10
3645     2cea: 48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
3646     2cee: 41 ff 72 f8             pushq  -0x8(%r10)
...
4059     while (i++ < 3000) {
4060     3366: 83 c3 01                add    $0x1,%ebx
4061         iocb->ki_flags = 0;
4062     3369: e8 00 00 00 00          callq  336e <ceph_read_iter+0x68e>
4063     336e: 41 c7 45 20 00 00 00    movl   $0x0,0x20(%r13)
4064     3375: 00
4065         iocb->ki_pos = 0;
4066     3376: 4c 89 ff                mov    %r15,%rdi
4067     3379: e8 00 00 00 00          callq  337e <ceph_read_iter+0x69e>
4068     337e: 49 c7 45 08 00 00 00    movq   $0x0,0x8(%r13)
4069     3385: 00
From the call trace, the crash is at listing line 4063, i.e. offset 0x336e (0x2ce0 + 0x68e).
I can reproduce it 100% of the time with the above change.
Updated by Xiubo Li almost 4 years ago
We also need this commit:
commit b53119f13a04879c3bf502828d99d13726639ead
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed Mar 6 20:22:54 2019 -0500
pin iocb through aio.
aio_poll() is not the only case that needs file pinned; worse, while aio_read()/aio_write() can live without pinning iocb itself, the proof is rather brittle and can easily break on later changes.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>