Bug #45649

closed

use-after-free during AIO

Added by Zheng Yan about 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

[ 2768.702419] ==================================================================
[ 2768.704404] BUG: KASAN: use-after-free in ceph_read_iter+0x1986/0x1eb0 [ceph]
[ 2768.706013] Read of size 4 at addr ffff888104e05d20 by task fsstress/29043

[ 2768.707968] CPU: 3 PID: 29043 Comm: fsstress Tainted: G E --------- - - 4.18.0+ #1
[ 2768.709959] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 2768.712016] Call Trace:
[ 2768.712597] dump_stack+0x9a/0xf0
[ 2768.713412] print_address_description.cold.3+0x9/0x23b
[ 2768.714613] kasan_report.cold.4+0x64/0x95
[ 2768.715525] ? ceph_read_iter+0x1986/0x1eb0 [ceph]
[ 2768.716704] ceph_read_iter+0x1986/0x1eb0 [ceph]
[ 2768.717812] ? kvm_sched_clock_read+0x14/0x30
[ 2768.718825] ? sched_clock+0x5/0x10
[ 2768.719701] ? ceph_write_iter+0x1c80/0x1c80 [ceph]
[ 2768.720845] ? sched_clock+0x5/0x10
[ 2768.721618] ? find_held_lock+0x3a/0x1c0
[ 2768.722596] ? __save_stack_trace.constprop.3+0x80/0x100
[ 2768.723883] ? lock_downgrade+0x6f0/0x6f0
[ 2768.724906] ? rcu_read_lock_held+0xaf/0xc0
[ 2768.726008] ? fsnotify_nameremove+0x240/0x240
[ 2768.727046] ? fsnotify_first_mark+0x150/0x150
[ 2768.728104] ? aio_read+0x20f/0x340
[ 2768.728899] aio_read+0x20f/0x340
[ 2768.729574] ? aio_write+0x530/0x530
[ 2768.730338] ? sched_clock_cpu+0x18/0x1e0
[ 2768.731959] ? kvm_sched_clock_read+0x14/0x30
[ 2768.733692] ? lock_downgrade+0x6f0/0x6f0
[ 2768.735306] ? lock_acquire+0x14f/0x3b0
[ 2768.736877] ? __might_fault+0xc4/0x1a0
[ 2768.738609] io_submit_one+0x856/0xb50
[ 2768.740593] ? aio_read+0x340/0x340
[ 2768.742607] __x64_sys_io_submit+0x17f/0x420
[ 2768.745181] ? aio_fsync_work+0xd0/0xd0
[ 2768.747549] ? retint_user+0x18/0x18
[ 2768.749790] ? do_syscall_64+0xa5/0x4d0
[ 2768.752295] do_syscall_64+0xa5/0x4d0
[ 2768.754251] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 2768.756457] RIP: 0033:0x7ffff7ebbe0d
[ 2768.758342] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b 90 0c 00 f7 d8 64 89 01 48
[ 2768.765492] RSP: 002b:00007fffffffa228 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[ 2768.768801] RAX: ffffffffffffffda RBX: 00007ffff7dc3b00 RCX: 00007ffff7ebbe0d
[ 2768.771933] RDX: 00007fffffffa278 RSI: 0000000000000001 RDI: 00007ffff7fca000
[ 2768.774708] RBP: 00007ffff7fca000 R08: 0000000000000000 R09: 0000000000000000
[ 2768.777558] R10: 00007ffff7800000 R11: 0000000000000246 R12: 0000000000000001
[ 2768.780099] R13: 0000000000000019 R14: 00007fffffffa278 R15: 00007ffff7800000

[ 2768.783440] Allocated by task 29043:
[ 2768.785226] kasan_kmalloc+0xbf/0xe0
[ 2768.786808] kmem_cache_alloc+0x10c/0x340
[ 2768.788338] io_submit_one+0xd3/0xb50
[ 2768.790086] __x64_sys_io_submit+0x17f/0x420
[ 2768.791912] do_syscall_64+0xa5/0x4d0
[ 2768.793585] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 2768.795251] 0xffffffffffffffff

[ 2768.797803] Freed by task 29028:
[ 2768.799093] __kasan_slab_free+0x125/0x170
[ 2768.800592] slab_free_freelist_hook+0x5e/0x140
[ 2768.802321] kmem_cache_free+0x9d/0x300
[ 2768.803906] aio_complete+0x56a/0xb90
[ 2768.805441] ceph_aio_complete_req+0xa8a/0x10e0 [ceph]
[ 2768.807846] __complete_request+0x49/0x140 [libceph]
[ 2768.810262] handle_reply+0x147d/0x2400 [libceph]
[ 2768.812559] dispatch+0x468/0x24d0 [libceph]
[ 2768.814747] try_read+0xcaf/0x1fd0 [libceph]
[ 2768.816936] ceph_con_workfn+0x1db/0x11e0 [libceph]
[ 2768.819394] process_one_work+0x8f0/0x17a0
[ 2768.821502] worker_thread+0x87/0xb50
[ 2768.823684] kthread+0x30c/0x3d0
[ 2768.825757] ret_from_fork+0x27/0x50
[ 2768.827800] 0xffffffffffffffff

[ 2768.831200] The buggy address belongs to the object at ffff888104e05d00
which belongs to the cache aio_kiocb of size 184
[ 2768.836722] The buggy address is located 32 bytes inside of
184-byte region [ffff888104e05d00, ffff888104e05db8)
[ 2768.841608] The buggy address belongs to the page:
[ 2768.843896] page:ffffea0004138140 refcount:1 mapcount:0 mapping:ffff88810b9a1b00 index:0xffff888104e05900
[ 2768.847031] flags: 0x17ffffc0000100(slab)
[ 2768.849134] raw: 0017ffffc0000100 dead000000000100 dead000000000200 ffff88810b9a1b00
[ 2768.851978] raw: ffff888104e05900 0000000080100009 00000001ffffffff 0000000000000000
[ 2768.854933] page dumped because: kasan: bad access detected

[ 2768.858658] Memory state around the buggy address:
[ 2768.860826] ffff888104e05c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2768.863427] ffff888104e05c80: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 2768.865934] >ffff888104e05d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2768.868430] ^
[ 2768.870286] ffff888104e05d80: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 2768.872581] ffff888104e05e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2768.875054] ==================================================================
[ 2768.877544] Disabling lock debugging due to kernel taint

I suspect the use-after-free is at

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 7243008bed72..9e0a0fc17a53 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1364,6 +1364,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
        if (ret != -EOLDSNAPC && pos > iocb->ki_pos) {
                ret = pos - iocb->ki_pos;
                iocb->ki_pos = pos;
+               ^^^
        }
        return ret;
 }

Actions #1

Updated by Xiubo Li about 4 years ago

  • Assignee set to Xiubo Li

I talked with Zheng Yan, and I will take it.
Thanks.

Actions #2

Updated by Xiubo Li about 4 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Xiubo Li about 4 years ago

I checked it: you are using the 4.18 kernel on f30, and the 4.18 kernel's fs/aio.c has a bug:

Just after the ceph kclient thread submits the aio read/write request, it can be scheduled out or get stuck. Meanwhile the OSD can handle and reply to the aio request quickly, and in the path

handle_reply() --> complete_request() --> req->r_callback = ceph_aio_complete_req() --> ceph_aio_complete() --> aio_req->iocb->ki_complete() --> aio_complete_rw() --> aio_complete() --> kmem_cache_free(kiocb_cachep, iocb)

the 'iocb' gets freed.

When the ceph kclient thread is scheduled to run again, it continues to use the 'iocb', and we hit the use-after-free bug.
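
To make the ordering concrete, here is a minimal userspace model of that race (hypothetical names such as fake_iocb and completion_path, nothing from the kernel): one thread plays the submitter that gets delayed after io_submit, the other plays the reply path that frees the iocb. Built with ASan it reports the same kind of use-after-free that KASAN reports above.

/*
 * Minimal userspace model of the race -- hypothetical names, not the
 * kernel code.  Build with: gcc -fsanitize=address -pthread uaf-race.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_iocb {              /* stands in for struct aio_kiocb */
	long ki_pos;
	int  ki_flags;
};

/* models handle_reply() -> ... -> aio_complete() -> kmem_cache_free() */
static void *completion_path(void *arg)
{
	struct fake_iocb *iocb = arg;

	free(iocb);             /* the reply arrives fast and frees the iocb */
	return NULL;
}

int main(void)
{
	struct fake_iocb *iocb = malloc(sizeof(*iocb));  /* io_submit_one() allocates it */
	pthread_t reply;

	iocb->ki_flags = 0;
	iocb->ki_pos = 0;
	pthread_create(&reply, NULL, completion_path, iocb);

	usleep(100000);         /* the submitter is scheduled out / stuck here */

	/* the submitter resumes and still touches the iocb: use-after-free,
	 * reported by ASan just like KASAN reports it in ceph_read_iter() */
	printf("ki_flags = %d\n", iocb->ki_flags);

	pthread_join(reply, NULL);
	return 0;
}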

This has been fixed by:


commit 9018ccc453af063d16b3b6b5dfa2ad0635390371
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Jul 24 11:36:37 2018 +0200

    aio: add a iocb refcount

    This is needed to prevent races caused by the way the ->poll API works.
    To avoid introducing overhead for other users of the iocbs we initialize
    it to zero and only do refcount operations if it is non-zero in the
    completion path.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Avi Kivity <avi@scylladb.com>
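
A rough userspace sketch of the idea behind that commit (simplified on purpose: the real patch initializes the count to zero and only takes references where needed, as the commit message says; names like fake_iocb/iocb_put are made up here): the iocb carries a reference count, the completion path only drops its reference, and the object is freed by whichever side puts the last reference, so the submitter can keep touching the iocb until it does its own put.

/*
 * Simplified userspace sketch of the refcount idea -- not the actual
 * fs/aio.c code.  Submitter and completion path each hold a reference;
 * the free only happens on the final put.
 * Build with: gcc -fsanitize=address -pthread iocb-ref.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_iocb {
	atomic_int refcnt;      /* plays the role of the new iocb refcount */
	long ki_pos;
	int  ki_flags;
};

static void iocb_put(struct fake_iocb *iocb)
{
	if (atomic_fetch_sub(&iocb->refcnt, 1) == 1)
		free(iocb);     /* only the last reference frees the object */
}

static void *completion_path(void *arg)  /* models aio_complete() */
{
	iocb_put(arg);          /* completion drops its reference, no free yet */
	return NULL;
}

int main(void)
{
	struct fake_iocb *iocb = malloc(sizeof(*iocb));
	pthread_t reply;

	atomic_init(&iocb->refcnt, 2);  /* one ref for the submitter, one for completion */
	iocb->ki_flags = 0;
	pthread_create(&reply, NULL, completion_path, iocb);

	usleep(100000);                 /* submitter delayed, same as before */

	printf("ki_flags = %d\n", iocb->ki_flags);  /* safe: we still hold our ref */
	iocb_put(iocb);                 /* submitter's put; frees if it is the last */

	pthread_join(reply, NULL);
	return 0;
}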

Actions #4

Updated by Xiubo Li about 4 years ago

I checked the downstream rhel7/8 kernels; I think we need to backport this fix downstream.

Actions #5

Updated by Xiubo Li about 4 years ago

  • Status changed from In Progress to Resolved
Actions #6

Updated by Xiubo Li almost 4 years ago

With KASAN enabled, and with the small change below in the fs/ceph/file.c code:

For aio_read, ki_flags is no longer needed past this point, so I just reuse it for this test, and at the same time always set ki_pos to 0:

@@ -1579,6 +1585,11 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
                ceph_end_io_direct(inode);
        else
                ceph_end_io_read(inode);
+       while (i++ < 3000) {
+               iocb->ki_flags = 0;
+               iocb->ki_pos = 0;
+       }
+       printk("%s after sleep 3s ------kiocb:%p, %lld\n", __func__, iocb, iocb->ki_pos);

        if (retry_op > HAVE_RETRIED && ret >= 0) {
                int statret;

I can hit the same call traces:

<3>[  218.834353] ==================================================================
<3>[  218.834938] BUG: KASAN: use-after-free in ceph_read_iter+0x68e/0x1480 [ceph]
<3>[  218.835177] Write of size 4 at addr ffff888c33d98220 by task fio/11510
<3>[  218.835397] 
<4>[  218.835470] CPU: 4 PID: 11510 Comm: fio Not tainted 4.18.0+ #3
<4>[  218.835475] Hardware name: Red Hat RHEV Hypervisor, BIOS 1.11.0-2.el7 04/01/2014
<4>[  218.835492] Call Trace:
<4>[  218.835606]  dump_stack+0x71/0xa0
<4>[  218.835634]  print_address_description+0x65/0x22e
<4>[  218.835696]  ? ceph_read_iter+0x68e/0x1480 [ceph]
<4>[  218.835707]  kasan_report.cold.6+0x241/0x306
<4>[  218.835767]  ceph_read_iter+0x68e/0x1480 [ceph]
<4>[  218.835800]  ? __read_once_size_nocheck.constprop.8+0x10/0x10
<4>[  218.835808]  ? __orc_find+0x6f/0xc0
<4>[  218.835814]  ? stack_access_ok+0x35/0x80
<4>[  218.835934]  ? ceph_direct_read_write+0x710/0x710 [ceph]
<4>[  218.835967]  ? is_bpf_text_address+0xa/0x20
<4>[  218.835993]  ? kernel_text_address+0x100/0x110
<4>[  218.836000]  ? __kernel_text_address+0xe/0x30
<4>[  218.836007]  ? unwind_get_return_address+0x2f/0x50
<4>[  218.836033]  ? __save_stack_trace.constprop.8+0x80/0x100
<4>[  218.836061]  ? __fsnotify_parent+0x8d/0x1f0
<4>[  218.836068]  ? fsnotify+0x7ee/0x810
<4>[  218.836081]  ? __fsnotify_inode_delete+0x20/0x20
<4>[  218.836099]  ? aio_read+0x173/0x250
<4>[  218.836158]  ? ceph_direct_read_write+0x710/0x710 [ceph]
<4>[  218.836164]  aio_read+0x173/0x250
<4>[  218.836173]  ? aio_write+0x310/0x310
<4>[  218.836198]  ? down_read_trylock+0xe4/0x140
<4>[  218.836214]  ? __count_memcg_events.part.58+0x2a/0x100
<4>[  218.836223]  ? up_read+0x14/0x80
<4>[  218.836242]  ? __do_page_fault+0x3a4/0x620
<4>[  218.836261]  ? kasan_unpoison_shadow+0x30/0x40
<4>[  218.836268]  ? kasan_kmalloc+0xbf/0xe0
<4>[  218.836275]  ? io_submit_one+0xc6/0x580
<4>[  218.836283]  io_submit_one+0x302/0x580
<4>[  218.836290]  ? lookup_ioctx+0x72/0x1b0
<4>[  218.836296]  ? aio_read+0x250/0x250
<4>[  218.836303]  ? aio_fsync+0x160/0x160
<4>[  218.836321]  ? ktime_get_coarse_real_ts64+0x4a/0x70
<4>[  218.836330]  __x64_sys_io_submit+0x114/0x240
<4>[  218.836338]  ? __ia32_compat_sys_io_submit+0x240/0x240
<4>[  218.836357]  do_syscall_64+0x73/0x200
<4>[  218.836374]  entry_SYSCALL_64_after_hwframe+0x65/0xca
<4>[  218.836391] RIP: 0033:0x7fcb4664a697
<4>[  218.836401] Code: 49 83 38 00 75 eb 49 83 78 08 00 75 e4 8b 47 0c 39 47 08 75 dc 31 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 90 b8 d1 00 00 00 0f 05 <c3> 0f 1f 84 00 00 00 00 00 b8 d2 00 00 00 0f 05 c3 0f 1f 84 00 00
<4>[  218.836405] RSP: 002b:00007fff0c15f7c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000d1
<4>[  218.836413] RAX: ffffffffffffffda RBX: 000000000156b130 RCX: 00007fcb4664a697
<4>[  218.836417] RDX: 000000000156ace0 RSI: 0000000000000001 RDI: 00007fcb5a759000
<4>[  218.836421] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000156d960
<4>[  218.836425] R10: 00007fff0c15eba0 R11: 0000000000000202 R12: 0000000000000001
<4>[  218.836429] R13: 0000000000000000 R14: 00007fcb1a904d40 R15: 000000000156a7a0
<3>[  218.836435] 
<3>[  218.836507] Allocated by task 11510:
<4>[  218.836638]  kasan_kmalloc+0xbf/0xe0
<4>[  218.836644]  kmem_cache_alloc+0xb5/0x1d0
<4>[  218.836650]  io_submit_one+0xc6/0x580
<4>[  218.836656]  __x64_sys_io_submit+0x114/0x240
<4>[  218.836661]  do_syscall_64+0x73/0x200
<4>[  218.836667]  entry_SYSCALL_64_after_hwframe+0x65/0xca
<4>[  218.836671]  0xffffffffffffffff
<3>[  218.836672] 
<3>[  218.836731] Freed by task 301:
<4>[  218.836856]  __kasan_slab_free+0x125/0x170
<4>[  218.836862]  kmem_cache_free+0x78/0x1f0
<4>[  218.836868]  aio_complete+0x2d9/0x3e0
<4>[  218.836975]  ceph_aio_complete_req+0x3fb/0x710 [ceph]
<4>[  218.837132]  __complete_request+0x36/0xb0 [libceph]
<4>[  218.837201]  handle_reply+0xe07/0x1050 [libceph]
<4>[  218.837265]  dispatch+0x276/0x1140 [libceph]
<4>[  218.837323]  try_read+0x825/0x1050 [libceph]
<4>[  218.837381]  ceph_con_workfn+0x189/0x860 [libceph]
<4>[  218.837390]  process_one_work+0x382/0x670
<4>[  218.837395]  worker_thread+0x57/0x5b0
<4>[  218.837401]  kthread+0x1a0/0x1c0
<4>[  218.837406]  ret_from_fork+0x35/0x40
<4>[  218.837410]  0xffffffffffffffff
<3>[  218.837412] 
<3>[  218.837474] The buggy address belongs to the object at ffff888c33d98200
<3>[  218.837474]  which belongs to the cache aio_kiocb of size 136
<3>[  218.837885] The buggy address is located 32 bytes inside of
<3>[  218.837885]  136-byte region [ffff888c33d98200, ffff888c33d98288)
<3>[  218.838439] The buggy address belongs to the page:
<4>[  218.838629] page:ffffea0030cf6600 refcount:1 mapcount:0 mapping:ffff888d567c0000 index:0x0 compound_mapcount: 0
<4>[  218.838638] flags: 0x17ffffc0008100(slab|head)
<4>[  218.838648] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff888d567c0000
<4>[  218.838655] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
<4>[  218.838658] page dumped because: kasan: bad access detected
<3>[  218.838660] 
<3>[  218.838717] Memory state around the buggy address:
<3>[  218.838897]  ffff888c33d98100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
<3>[  218.839229]  ffff888c33d98180: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
<3>[  218.839467] >ffff888c33d98200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
<3>[  218.839701]                                ^
<3>[  218.839865]  ffff888c33d98280: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
<3>[  218.840170]  ffff888c33d98300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
<3>[  218.840472] ==================================================================

 3641 0000000000002ce0 <ceph_read_iter>:                                                                                                  
 3642 {
 3643     2ce0:       e8 00 00 00 00          callq  2ce5 <ceph_read_iter+0x5>
 3644     2ce5:       4c 8d 54 24 08          lea    0x8(%rsp),%r10
 3645     2cea:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
 3646     2cee:       41 ff 72 f8             pushq  -0x8(%r10)
 ...
 4059         while (i++ < 3000) {
 4060     3366:       83 c3 01                add    $0x1,%ebx
 4061                 iocb->ki_flags = 0;
 4062     3369:       e8 00 00 00 00          callq  336e <ceph_read_iter+0x68e>
 4063     336e:       41 c7 45 20 00 00 00    movl   $0x0,0x20(%r13)
 4064     3375:       00 
 4065                 iocb->ki_pos = 0;
 4066     3376:       4c 89 ff                mov    %r15,%rdi
 4067     3379:       e8 00 00 00 00          callq  337e <ceph_read_iter+0x69e>
 4068     337e:       49 c7 45 08 00 00 00    movq   $0x0,0x8(%r13)
 4069     3385:       00 

From the call trace, it crashed at Line#4063, and the offset is 0x336e (0x2ce0 + 0x68e).

I can reproduce it 100% of the time with the above.

Actions #7

Updated by Xiubo Li almost 4 years ago

We also need this commit:


commit b53119f13a04879c3bf502828d99d13726639ead
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Mar 6 20:22:54 2019 -0500

    pin iocb through aio.

    aio_poll() is not the only case that needs file pinned; worse, while
    aio_read()/aio_write() can live without pinning iocb itself, the
    proof is rather brittle and can easily break on later changes.

    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
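
In terms of the userspace model from the earlier notes (again hypothetical names, not the real fs/aio.c), what this commit enforces is the pinning pattern sketched below: aio_read() takes its own reference on the iocb before calling into the filesystem and drops it afterwards, so even if the request completes inside that call, the iocb stays valid until the caller's final put.

/*
 * Userspace sketch of the pinning pattern -- hypothetical names, not the
 * real aio_read().  The "filesystem" completes the request (and drops the
 * completion reference) before returning, yet the caller's later accesses
 * are still safe because it pinned the iocb first.
 * Build with: gcc -fsanitize=address aio-pin.c
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_iocb {
	atomic_int refcnt;
	int ki_flags;
};

static void iocb_put(struct fake_iocb *iocb)
{
	if (atomic_fetch_sub(&iocb->refcnt, 1) == 1)
		free(iocb);
}

/* models a ->read_iter() whose request completes before it returns */
static long fs_read_iter(struct fake_iocb *iocb)
{
	iocb_put(iocb);         /* completion path drops its reference in here */
	return 0;
}

static long model_aio_read(struct fake_iocb *iocb)
{
	long ret;

	atomic_fetch_add(&iocb->refcnt, 1);  /* pin the iocb across the call */
	ret = fs_read_iter(iocb);
	iocb->ki_flags = 0;                  /* without the pin this would be a UAF */
	iocb_put(iocb);                      /* unpin; the last put frees it */
	return ret;
}

int main(void)
{
	struct fake_iocb *iocb = malloc(sizeof(*iocb));

	atomic_init(&iocb->refcnt, 1);       /* the completion path's reference */
	iocb->ki_flags = 0;
	return (int)model_aio_read(iocb);
}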
