Bug #51158
KASAN use after free when destroying inode at umount
Description
KASAN occasionally pops this warning when unmounting:
==================================================================
BUG: KASAN: use-after-free in __destroy_inode+0x59/0x370
Read of size 8 at addr ffff8881036b8060 by task kworker/u32:2/82065

CPU: 2 PID: 82065 Comm: kworker/u32:2 Tainted: G W OE T 5.13.0-rc2+ #131
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2.fc34 04/01/2014
Workqueue: ceph-inode ceph_inode_work [ceph]
Call Trace:
 dump_stack+0xa5/0xdc
 print_address_description.constprop.0+0x18/0x160
 ? __destroy_inode+0x59/0x370
 kasan_report.cold+0x7f/0x111
 ? __destroy_inode+0x59/0x370
 __destroy_inode+0x59/0x370
 destroy_inode+0x55/0xd0
 process_one_work+0x524/0x9b0
 ? pwq_dec_nr_in_flight+0x110/0x110
 ? lock_acquired+0x301/0x560
 worker_thread+0x2f0/0x6f0
 ? process_one_work+0x9b0/0x9b0
 kthread+0x1fb/0x220
 ? __kthread_bind_mask+0x70/0x70
 ret_from_fork+0x22/0x30

Allocated by task 115001:
 kasan_save_stack+0x1b/0x40
 __kasan_kmalloc+0x7c/0x90
 bdi_alloc+0x2f/0x90
 super_setup_bdi_name+0x85/0x140
 ceph_get_tree+0x526/0xc00 [ceph]
 vfs_get_tree+0x4c/0x140
 path_mount+0x58e/0xf80
 __x64_sys_mount+0x170/0x1a0
 do_syscall_64+0x40/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 115023:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0xec/0x120
 slab_free_freelist_hook+0xb8/0x200
 kfree+0xe3/0x610
 generic_shutdown_super+0x19b/0x1c0
 kill_anon_super+0x24/0x40
 ceph_kill_sb+0x62/0xe0 [ceph]
 deactivate_locked_super+0x63/0xe0
 cleanup_mnt+0x1e5/0x240
 task_work_run+0x8b/0xe0
 exit_to_user_mode_prepare+0x23c/0x250
 syscall_exit_to_user_mode+0x27/0x70
 do_syscall_64+0x4d/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
 kasan_save_stack+0x1b/0x40
 kasan_record_aux_stack+0xbc/0xe0
 insert_work+0x32/0x170
 __queue_work+0x341/0x720
 mod_delayed_work_on+0xa4/0x110
 wb_shutdown+0xdf/0x120
 bdi_unregister+0xe5/0x2d0
 release_bdi+0x64/0x70
 generic_shutdown_super+0x19b/0x1c0
 kill_anon_super+0x24/0x40
 ceph_kill_sb+0x62/0xe0 [ceph]
 deactivate_locked_super+0x63/0xe0
 cleanup_mnt+0x1e5/0x240
 task_work_run+0x8b/0xe0
 exit_to_user_mode_prepare+0x23c/0x250
 syscall_exit_to_user_mode+0x27/0x70
 do_syscall_64+0x4d/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
 kasan_save_stack+0x1b/0x40
 kasan_record_aux_stack+0xbc/0xe0
 insert_work+0x32/0x170
 __queue_work+0x341/0x720
 mod_delayed_work_on+0xa4/0x110
 wb_queue_work+0x13a/0x220
 bdi_split_work_to_wbs+0x382/0x780
 __writeback_inodes_sb_nr+0x14f/0x190
 sync_filesystem+0x68/0x110
 generic_shutdown_super+0x50/0x1c0
 kill_anon_super+0x24/0x40
 ceph_kill_sb+0x62/0xe0 [ceph]
 deactivate_locked_super+0x63/0xe0
 cleanup_mnt+0x1e5/0x240
 task_work_run+0x8b/0xe0
 exit_to_user_mode_prepare+0x23c/0x250
 syscall_exit_to_user_mode+0x27/0x70
 do_syscall_64+0x4d/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8881036b8000
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 96 bytes inside of
 4096-byte region [ffff8881036b8000, ffff8881036b9000)
The buggy address belongs to the page:
page:00000000b5d1c66e refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1036b8
head:00000000b5d1c66e order:3 compound_mapcount:0 compound_pincount:0
memcg:ffff8881267a9a41
flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888100043040
raw: 0000000000000000 0000000000040004 00000001ffffffff ffff8881267a9a41
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881036b7f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881036b7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881036b8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8881036b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8881036b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The issue seems to be that we're queueing an asynchronous iput to the workqueues after the point where those workqueues get flushed during umount.
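Roughly, the ordering looks like this. The sketch below is illustrative only, not the actual fs/ceph code; my_fs_info, my_inode_info, my_deferred_iput and my_async_iput are made-up names standing in for the real per-superblock workqueue and the ceph_async_iput()-style helper:

#include <linux/fs.h>
#include <linux/workqueue.h>

struct my_fs_info {
	struct workqueue_struct *inode_wq;	/* per-superblock workqueue, flushed at umount */
};

struct my_inode_info {
	struct work_struct i_work;	/* assumed set up with INIT_WORK(&ci->i_work, my_deferred_iput) at inode allocation */
	struct inode vfs_inode;
};

/* Work function: performs the final iput outside of dispatch/mutex context. */
static void my_deferred_iput(struct work_struct *work)
{
	struct my_inode_info *ci = container_of(work, struct my_inode_info, i_work);

	/*
	 * If this runs after generic_shutdown_super() has already freed
	 * sb->s_bdi, __destroy_inode() reads freed memory -- the
	 * use-after-free KASAN reports above.
	 */
	iput(&ci->vfs_inode);
}

/* Called from MDS/OSD dispatch paths that must not run iput_final() inline. */
static void my_async_iput(struct inode *inode)
{
	struct my_fs_info *fsc = inode->i_sb->s_fs_info;
	struct my_inode_info *ci = container_of(inode, struct my_inode_info, vfs_inode);

	/*
	 * The race: nothing prevents this queue_work() from happening after
	 * umount has already flushed inode_wq, so the queued work can run
	 * against a superblock that is mid-teardown. (The real helper's
	 * refcount and re-queue handling is omitted for brevity.)
	 */
	queue_work(fsc->inode_wq, &ci->i_work);
}

Any queue_work() that slips in after the umount-time flush leaves a work item that runs against a superblock whose bdi has already been freed, which matches the allocated/freed stacks in the report above.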
History
#1 Updated by Jeff Layton over 2 years ago
- Description updated (diff)
#2 Updated by Jeff Layton over 2 years ago
There are some potential workarounds, but the issue was itself introduced by the earlier workaround that added ceph_async_iput:
commit 3e1d0452edceebb903d23db53201013c940bf000
Author: Yan, Zheng <zyan@redhat.com>
Date:   Sat May 18 20:39:55 2019 +0800

    ceph: avoid iput_final() while holding mutex or in dispatch thread

    iput_final() may wait for reahahead pages. The wait can cause deadlock.
    For example:

      Workqueue: ceph-msgr ceph_con_workfn [libceph]
        Call Trace:
         schedule+0x36/0x80
         io_schedule+0x16/0x40
         __lock_page+0x101/0x140
         truncate_inode_pages_range+0x556/0x9f0
         truncate_inode_pages_final+0x4d/0x60
         evict+0x182/0x1a0
         iput+0x1d2/0x220
         iterate_session_caps+0x82/0x230 [ceph]
         dispatch+0x678/0xa80 [ceph]
         ceph_con_workfn+0x95b/0x1560 [libceph]
         process_one_work+0x14d/0x410
         worker_thread+0x4b/0x460
         kthread+0x105/0x140
         ret_from_fork+0x22/0x40

      Workqueue: ceph-msgr ceph_con_workfn [libceph]
        Call Trace:
         __schedule+0x3d6/0x8b0
         schedule+0x36/0x80
         schedule_preempt_disabled+0xe/0x10
         mutex_lock+0x2f/0x40
         ceph_check_caps+0x505/0xa80 [ceph]
         ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph]
         writepages_finish+0x2d3/0x410 [ceph]
         __complete_request+0x26/0x60 [libceph]
         handle_reply+0x6c8/0xa10 [libceph]
         dispatch+0x29a/0xbb0 [libceph]
         ceph_con_workfn+0x95b/0x1560 [libceph]
         process_one_work+0x14d/0x410
         worker_thread+0x4b/0x460
         kthread+0x105/0x140
         ret_from_fork+0x22/0x40

    In above example, truncate_inode_pages_range() waits for readahead pages
    while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks
    OSD dispatch thread. Later OSD replies (for readahead) can't be handled.

    ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock
    can happen if iput_final() is called while holding snap_rwsem.

    In general, it's not good to call iput_final() inside MDS/OSD dispatch
    threads or while holding any mutex.

    The fix is introducing ceph_async_iput(), which calls iput_final() in
    workqueue.

    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
I think the right fix here is to get rid of ceph_async_iput and instead work to reduce the scope of the session->s_mutex and snap_rwsem. Both of these locks are held over large swaths of the kclient code, and it's very unclear what data they are intended to protect. The sketch below shows the general shape of that approach.
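Sketched very roughly, the idea would be something like the following. This is illustrative only, not the merged patches; my_session and my_handle_cap_update are hypothetical names, with s_mutex standing in for session->s_mutex:

#include <linux/fs.h>
#include <linux/mutex.h>

struct my_session {
	struct mutex s_mutex;
	/* ... cap/session state actually protected by s_mutex ... */
};

/* Caller holds a reference on @inode; it is dropped here. */
static void my_handle_cap_update(struct my_session *session, struct inode *inode)
{
	mutex_lock(&session->s_mutex);
	/* only the state updates that genuinely need s_mutex go here */
	mutex_unlock(&session->s_mutex);

	/*
	 * Drop the reference only after s_mutex is released. If this turns out
	 * to be the final reference, iput_final() may block on page locks, but
	 * it can no longer deadlock against a dispatch thread waiting for
	 * s_mutex -- so the ceph_async_iput() workqueue deferral (and its
	 * umount race) becomes unnecessary.
	 */
	iput(inode);
}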
#3 Updated by Jeff Layton over 2 years ago
I have a patch series to fix this in the testing branch now. It removes some locking that had an unclear purpose, so this will take some soak time before it's ready for merge.
#4 Updated by Jeff Layton over 2 years ago
- Status changed from New to Resolved
The patches for this are now merged into mainline kernels.