Bug #51158
KASAN use after free when destroying inode at umount
Description
KASAN occasionally pops this warning when unmounting:
==================================================================
BUG: KASAN: use-after-free in __destroy_inode+0x59/0x370
Read of size 8 at addr ffff8881036b8060 by task kworker/u32:2/82065

CPU: 2 PID: 82065 Comm: kworker/u32:2 Tainted: G W OE T 5.13.0-rc2+ #131
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2.fc34 04/01/2014
Workqueue: ceph-inode ceph_inode_work [ceph]
Call Trace:
 dump_stack+0xa5/0xdc
 print_address_description.constprop.0+0x18/0x160
 ? __destroy_inode+0x59/0x370
 kasan_report.cold+0x7f/0x111
 ? __destroy_inode+0x59/0x370
 __destroy_inode+0x59/0x370
 destroy_inode+0x55/0xd0
 process_one_work+0x524/0x9b0
 ? pwq_dec_nr_in_flight+0x110/0x110
 ? lock_acquired+0x301/0x560
 worker_thread+0x2f0/0x6f0
 ? process_one_work+0x9b0/0x9b0
 kthread+0x1fb/0x220
 ? __kthread_bind_mask+0x70/0x70
 ret_from_fork+0x22/0x30

Allocated by task 115001:
 kasan_save_stack+0x1b/0x40
 __kasan_kmalloc+0x7c/0x90
 bdi_alloc+0x2f/0x90
 super_setup_bdi_name+0x85/0x140
 ceph_get_tree+0x526/0xc00 [ceph]
 vfs_get_tree+0x4c/0x140
 path_mount+0x58e/0xf80
 __x64_sys_mount+0x170/0x1a0
 do_syscall_64+0x40/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 115023:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0xec/0x120
 slab_free_freelist_hook+0xb8/0x200
 kfree+0xe3/0x610
 generic_shutdown_super+0x19b/0x1c0
 kill_anon_super+0x24/0x40
 ceph_kill_sb+0x62/0xe0 [ceph]
 deactivate_locked_super+0x63/0xe0
 cleanup_mnt+0x1e5/0x240
 task_work_run+0x8b/0xe0
 exit_to_user_mode_prepare+0x23c/0x250
 syscall_exit_to_user_mode+0x27/0x70
 do_syscall_64+0x4d/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
 kasan_save_stack+0x1b/0x40
 kasan_record_aux_stack+0xbc/0xe0
 insert_work+0x32/0x170
 __queue_work+0x341/0x720
 mod_delayed_work_on+0xa4/0x110
 wb_shutdown+0xdf/0x120
 bdi_unregister+0xe5/0x2d0
 release_bdi+0x64/0x70
 generic_shutdown_super+0x19b/0x1c0
 kill_anon_super+0x24/0x40
 ceph_kill_sb+0x62/0xe0 [ceph]
 deactivate_locked_super+0x63/0xe0
 cleanup_mnt+0x1e5/0x240
 task_work_run+0x8b/0xe0
 exit_to_user_mode_prepare+0x23c/0x250
 syscall_exit_to_user_mode+0x27/0x70
 do_syscall_64+0x4d/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
 kasan_save_stack+0x1b/0x40
 kasan_record_aux_stack+0xbc/0xe0
 insert_work+0x32/0x170
 __queue_work+0x341/0x720
 mod_delayed_work_on+0xa4/0x110
 wb_queue_work+0x13a/0x220
 bdi_split_work_to_wbs+0x382/0x780
 __writeback_inodes_sb_nr+0x14f/0x190
 sync_filesystem+0x68/0x110
 generic_shutdown_super+0x50/0x1c0
 kill_anon_super+0x24/0x40
 ceph_kill_sb+0x62/0xe0 [ceph]
 deactivate_locked_super+0x63/0xe0
 cleanup_mnt+0x1e5/0x240
 task_work_run+0x8b/0xe0
 exit_to_user_mode_prepare+0x23c/0x250
 syscall_exit_to_user_mode+0x27/0x70
 do_syscall_64+0x4d/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8881036b8000
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 96 bytes inside of
 4096-byte region [ffff8881036b8000, ffff8881036b9000)
The buggy address belongs to the page:
page:00000000b5d1c66e refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1036b8
head:00000000b5d1c66e order:3 compound_mapcount:0 compound_pincount:0
memcg:ffff8881267a9a41
flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888100043040
raw: 0000000000000000 0000000000040004 00000001ffffffff ffff8881267a9a41
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881036b7f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881036b7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881036b8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8881036b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8881036b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The issue seems to be that we're queueing an asynchronous iput to the workqueues after the point where those workqueues get flushed during umount.
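Roughly, the ordering looks like this. The sketch below is illustrative only, not the actual fs/ceph code; my_fs_info, my_inode_info, my_deferred_iput and my_async_iput are made-up names standing in for the real per-superblock workqueue and the ceph_async_iput()-style helper:

#include <linux/fs.h>
#include <linux/workqueue.h>

struct my_fs_info {
	struct workqueue_struct *inode_wq;	/* per-superblock workqueue, flushed at umount */
};

struct my_inode_info {
	struct work_struct i_work;	/* assumed set up with INIT_WORK(&ci->i_work, my_deferred_iput) at inode allocation */
	struct inode vfs_inode;
};

/* Work function: performs the final iput outside of dispatch/mutex context. */
static void my_deferred_iput(struct work_struct *work)
{
	struct my_inode_info *ci = container_of(work, struct my_inode_info, i_work);

	/*
	 * If this runs after generic_shutdown_super() has already freed
	 * sb->s_bdi, __destroy_inode() reads freed memory -- the
	 * use-after-free KASAN reports above.
	 */
	iput(&ci->vfs_inode);
}

/* Called from MDS/OSD dispatch paths that must not run iput_final() inline. */
static void my_async_iput(struct inode *inode)
{
	struct my_fs_info *fsc = inode->i_sb->s_fs_info;
	struct my_inode_info *ci = container_of(inode, struct my_inode_info, vfs_inode);

	/*
	 * The race: nothing prevents this queue_work() from happening after
	 * umount has already flushed inode_wq, so the queued work can run
	 * against a superblock that is mid-teardown. (The real helper's
	 * refcount and re-queue handling is omitted for brevity.)
	 */
	queue_work(fsc->inode_wq, &ci->i_work);
}

Any queue_work() that slips in after the umount-time flush leaves a work item that runs against a superblock whose bdi has already been freed, which matches the allocated/freed stacks in the report above.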
History
#1 Updated by Jeff Layton over 2 years ago
- Description updated (diff)
#2 Updated by Jeff Layton over 2 years ago
There are some potential workarounds, but the issue was itself introduced by the earlier workaround that added ceph_async_iput:
commit 3e1d0452edceebb903d23db53201013c940bf000
Author: Yan, Zheng <zyan@redhat.com>
Date:   Sat May 18 20:39:55 2019 +0800

    ceph: avoid iput_final() while holding mutex or in dispatch thread

    iput_final() may wait for reahahead pages. The wait can cause deadlock.
    For example:

      Workqueue: ceph-msgr ceph_con_workfn [libceph]
        Call Trace:
         schedule+0x36/0x80
         io_schedule+0x16/0x40
         __lock_page+0x101/0x140
         truncate_inode_pages_range+0x556/0x9f0
         truncate_inode_pages_final+0x4d/0x60
         evict+0x182/0x1a0
         iput+0x1d2/0x220
         iterate_session_caps+0x82/0x230 [ceph]
         dispatch+0x678/0xa80 [ceph]
         ceph_con_workfn+0x95b/0x1560 [libceph]
         process_one_work+0x14d/0x410
         worker_thread+0x4b/0x460
         kthread+0x105/0x140
         ret_from_fork+0x22/0x40

      Workqueue: ceph-msgr ceph_con_workfn [libceph]
        Call Trace:
         __schedule+0x3d6/0x8b0
         schedule+0x36/0x80
         schedule_preempt_disabled+0xe/0x10
         mutex_lock+0x2f/0x40
         ceph_check_caps+0x505/0xa80 [ceph]
         ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph]
         writepages_finish+0x2d3/0x410 [ceph]
         __complete_request+0x26/0x60 [libceph]
         handle_reply+0x6c8/0xa10 [libceph]
         dispatch+0x29a/0xbb0 [libceph]
         ceph_con_workfn+0x95b/0x1560 [libceph]
         process_one_work+0x14d/0x410
         worker_thread+0x4b/0x460
         kthread+0x105/0x140
         ret_from_fork+0x22/0x40

    In above example, truncate_inode_pages_range() waits for readahead pages
    while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks
    OSD dispatch thread. Later OSD replies (for readahead) can't be handled.

    ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock
    can happen if iput_final() is called while holding snap_rwsem.

    In general, it's not good to call iput_final() inside MDS/OSD dispatch
    threads or while holding any mutex.

    The fix is introducing ceph_async_iput(), which calls iput_final() in
    workqueue.

    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
I think the right fix here is to get rid of ceph_async_iput and instead work to reduce the scope of the session->s_mutex and snap_rwsem. Both of these locks are held over large swaths of the kclient code, and it's very unclear what data they are intended to protect. The sketch below shows the general shape of that approach.
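Sketched very roughly, the idea would be something like the following. This is illustrative only, not the merged patches; my_session and my_handle_cap_update are hypothetical names, with s_mutex standing in for session->s_mutex:

#include <linux/fs.h>
#include <linux/mutex.h>

struct my_session {
	struct mutex s_mutex;
	/* ... cap/session state actually protected by s_mutex ... */
};

/* Caller holds a reference on @inode; it is dropped here. */
static void my_handle_cap_update(struct my_session *session, struct inode *inode)
{
	mutex_lock(&session->s_mutex);
	/* only the state updates that genuinely need s_mutex go here */
	mutex_unlock(&session->s_mutex);

	/*
	 * Drop the reference only after s_mutex is released. If this turns out
	 * to be the final reference, iput_final() may block on page locks, but
	 * it can no longer deadlock against a dispatch thread waiting for
	 * s_mutex -- so the ceph_async_iput() workqueue deferral (and its
	 * umount race) becomes unnecessary.
	 */
	iput(inode);
}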
#3 Updated by Jeff Layton over 2 years ago
I have a patch series to fix this in the testing branch now. It removes some locking that had an unclear purpose, so this will take some soak time before it's ready for merge.
#4 Updated by Jeff Layton over 2 years ago
- Status changed from New to Resolved
The patches for this are now merged into mainline kernels.