Project

General

Profile

Actions

Bug #48569

closed

KASAN use after free warning when evicting inode

Added by Jeff Layton over 3 years ago. Updated almost 2 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I got this KASAN warning when I was testing something with dynamic_debug turned up:

[13389.775362] ==================================================================
[13389.775364] BUG: KASAN: use-after-free in string+0xe1/0x1c0
[13389.775367] Read of size 1 at addr ffff888117cc8408 by task kworker/u32:0/149349
[13389.775368] 
[13389.775370] CPU: 9 PID: 149349 Comm: kworker/u32:0 Tainted: G            E     5.10.0-rc7+ #223
[13389.775376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[13389.775380] Workqueue: ceph-inode ceph_inode_work [ceph]
[13389.777380] Call Trace:
[13389.777382]  dump_stack+0xae/0xe5
[13389.777383]  ? string+0xe1/0x1c0
[13389.777385]  print_address_description.constprop.0+0x1c/0x250
[13389.777387]  ? string+0xe1/0x1c0
[13389.777389]  ? string+0xe1/0x1c0
[13389.777390]  kasan_report.cold+0x37/0x7c
[13389.777392]  ? put_dec_full8+0x90/0xe0
[13389.777394]  ? string+0xe1/0x1c0
[13389.777395]  string+0xe1/0x1c0
[13389.778396]  ? ip6_addr_string_sa+0x400/0x400
[13389.778397]  vsnprintf+0x3f8/0x820
[13389.778399]  ? pointer+0x650/0x650
[13389.778400]  ? mark_lock+0xea/0xd30
[13389.778402]  va_format.constprop.0+0xe1/0x1d0
[13389.778404]  ? lock_chain_count+0x20/0x20
[13389.778405]  ? sprintf+0xb0/0xb0
[13389.778407]  ? widen_string+0x65/0x130
[13389.778409]  pointer+0xd1/0x650
[13389.778410]  ? format_decode+0x1cf/0x4e0
[13389.778412]  ? resource_string.isra.0+0xc10/0xc10
[13389.778413]  vsnprintf+0x2e0/0x820
[13389.780406]  ? pointer+0x650/0x650
[13389.780408]  ? pvclock_clocksource_read+0xd9/0x1a0
[13389.780410]  vscnprintf+0x9/0x30
[13389.780411]  vprintk_store+0x89/0x2c0
[13389.780413]  ? kvm_sched_clock_read+0x14/0x30
[13389.780415]  ? sched_clock+0x5/0x10
[13389.780416]  ? do_syslog+0x50/0x50
[13389.780418]  ? lock_contended+0x620/0x620
[13389.780420]  vprintk_emit+0x239/0x400
[13389.780421]  printk+0x96/0xb2
[13389.780423]  ? record_print_text.cold+0x11/0x11
[13389.781424]  ? __dynamic_pr_debug+0xeb/0x130
[13389.781426]  ? dynamic_emit_prefix+0x2a/0x200
[13389.781427]  __dynamic_pr_debug+0xeb/0x130
[13389.781429]  ? dynamic_emit_prefix+0x200/0x200
[13389.781431]  ? kvm_sched_clock_read+0x14/0x30
[13389.781433]  ? sched_clock+0x5/0x10
[13389.781434]  ? sched_clock_local+0x10/0xc0
[13389.781436]  ? __ceph_remove_cap+0x532/0x5e0 [ceph]
[13389.781438]  __ceph_destroy_xattrs+0xf7/0x1c0 [ceph]
[13389.781440]  ceph_evict_inode+0x267/0x4f0 [ceph]
[13389.783435]  evict+0x18c/0x320
[13389.783437]  process_one_work+0x501/0x990
[13389.783439]  ? pwq_dec_nr_in_flight+0x110/0x110
[13389.783441]  worker_thread+0x2ea/0x6e0
[13389.783442]  ? __kthread_parkme+0xc0/0xe0
[13389.783444]  ? process_one_work+0x990/0x990
[13389.783445]  kthread+0x1fb/0x220
[13389.783447]  ? kthread_create_worker_on_cpu+0xb0/0xb0
[13389.783449]  ret_from_fork+0x22/0x30
[13389.783450] 
[13389.783452] Allocated by task 107824:
[13389.784452]  kasan_save_stack+0x1b/0x40
[13389.784454]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[13389.784456]  ceph_kvmalloc+0xc7/0x110 [libceph]
[13389.784458]  ceph_buffer_new+0x62/0x110 [libceph]
[13389.784460]  ceph_fill_inode+0x1072/0x1bb0 [ceph]
[13389.784462]  ceph_fill_trace+0x49b/0x1360 [ceph]
[13389.784463]  dispatch+0xe85/0x23f0 [ceph]
[13389.784465]  ceph_con_workfn+0x21c6/0x3570 [libceph]
[13389.784467]  process_one_work+0x501/0x990
[13389.784468]  worker_thread+0x2ea/0x6e0
[13389.784470]  kthread+0x1fb/0x220
[13389.784472]  ret_from_fork+0x22/0x30
[13389.784473] 
[13389.786458] Freed by task 107824:
[13389.786460]  kasan_save_stack+0x1b/0x40
[13389.786462]  kasan_set_track+0x1c/0x30
[13389.786464]  kasan_set_free_info+0x1b/0x30
[13389.786465]  __kasan_slab_free+0x110/0x150
[13389.786467]  slab_free_freelist_hook+0x82/0x1b0
[13389.786469]  kfree+0xdf/0x610
[13389.786470]  ceph_buffer_release+0x20/0x60 [libceph]
[13389.786472]  __send_cap+0xd0/0x180 [ceph]
[13389.786474]  ceph_check_caps+0x863/0xf30 [ceph]
[13389.786476]  ceph_check_delayed_caps+0x143/0x1d0 [ceph]
[13389.787472]  delayed_work+0x22f/0x300 [ceph]
[13389.787474]  process_one_work+0x501/0x990
[13389.787476]  worker_thread+0x2ea/0x6e0
[13389.787478]  kthread+0x1fb/0x220
[13389.787479]  ret_from_fork+0x22/0x30
[13389.787481] 
[13389.787483] The buggy address belongs to the object at ffff888117cc8400
[13389.787485]  which belongs to the cache kmalloc-128 of size 128
[13389.787487] The buggy address is located 8 bytes inside of
[13389.789482]  128-byte region [ffff888117cc8400, ffff888117cc8480)
[13389.789484] The buggy address belongs to the page:
[13389.789486] page:000000000e5b4508 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x117cc8
[13389.789488] head:000000000e5b4508 order:1 compound_mapcount:0
[13389.789490] flags: 0x17ffffc0010200(slab|head)
[13389.789493] raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888100043680
[13389.789495] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[13389.789497] page dumped because: kasan: bad access detected
[13389.790489] 
[13389.790490] Memory state around the buggy address:
[13389.790493]  ffff888117cc8300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13389.790495]  ffff888117cc8380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[13389.790497] >ffff888117cc8400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13389.790499]                       ^
[13389.790501]  ffff888117cc8480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[13389.790504]  ffff888117cc8500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13389.790506] ==================================================================
[13389.792498] Disabling lock debugging due to kernel taint
[13481.660751] ------------[ cut here ]------------
[13481.660753] precision 65536 too large
[13481.660755] WARNING: CPU: 1 PID: 152995 at lib/vsprintf.c:2515 set_precision+0x6d/0x80
[13481.660756] Modules linked in: loop(E) ceph(E) libceph(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) cachefiles(E) fscache(E) netfs(E) sunrpc(E) i2c_i801(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) i2c_smbus(E) joydev(E) pcspkr(E) virtio_balloon(E) lpc_ich(E) zram(E) ip_tables(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) virtio_console(E) ghash_clmulni_intel(E) serio_raw(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) qxl(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) cec(E) drm(E) qemu_fw_cfg(E)
[13481.660810] CPU: 1 PID: 152995 Comm: attr Tainted: G    B       E     5.10.0-rc7+ #223
[13481.660812] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[13481.660813] RIP: 0010:set_precision+0x6d/0x80
[13481.660817] Code: 0f 48 de 81 fb ff 7f 00 00 0f 4f d8 88 5d 06 88 7d 07 5b 5d c3 89 de 48 c7 c7 c0 49 d1 b4 c6 05 52 4a 73 02 01 e8 c7 8f 8b 00 <0f> 0b eb b9 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 b8 00
[13481.660819] RSP: 0018:ffff88810ff77428 EFLAGS: 00010082
[13481.660821] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000000
[13481.660823] RDX: 0000000000000023 RSI: 0000000000000004 RDI: ffffed1021feee7b
[13481.660825] RBP: ffff88810ff77490 R08: ffffffffb3361d0e R09: ffff88841cde8293
[13481.660826] R10: ffffed10839bd052 R11: 0000000000000001 R12: ffffffffc128952c
[13481.660828] R13: ffff88810ff77538 R14: ffffffffb797c640 R15: ffffffffc128952c
[13481.660830] FS:  00007f693e8ab740(0000) GS:ffff88841cc00000(0000) knlGS:0000000000000000
[13481.660832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13481.660833] CR2: 0000565511ac4000 CR3: 00000001296ba000 CR4: 00000000003506e0
[13481.660834] Call Trace:
[13481.660836]  vsnprintf+0x397/0x820
[13481.660837]  ? pointer+0x650/0x650
[13481.660838]  va_format.constprop.0+0xe1/0x1d0
[13481.660840]  ? sprintf+0xb0/0xb0
[13481.660841]  ? desc_read_finalized_seq+0xd0/0xd0
[13481.660842]  ? widen_string+0x65/0x130
[13481.660844]  ? lock_release+0x9c/0x6b0
[13481.660845]  pointer+0xd1/0x650
[13481.660846]  ? format_decode+0x1cf/0x4e0
[13481.660848]  ? resource_string.isra.0+0xc10/0xc10
[13481.660849]  ? lock_release+0x6b0/0x6b0
[13481.660851]  vsnprintf+0x2e0/0x820
[13481.660852]  ? pointer+0x650/0x650
[13481.660853]  vscnprintf+0x9/0x30
[13481.660855]  vprintk_store+0x89/0x2c0
[13481.660856]  ? do_syslog+0x50/0x50
[13481.660857]  ? lock_acquired+0xa3/0x550
[13481.661844]  ? lock_contended+0x620/0x620
[13481.661845]  vprintk_emit+0x239/0x400
[13481.661847]  printk+0x96/0xb2
[13481.661848]  ? record_print_text.cold+0x11/0x11
[13481.661850]  ? dynamic_emit_prefix+0x2a/0x200
[13481.661851]  ? __zone_watermark_ok+0x270/0x270
[13481.661852]  ? __dynamic_pr_debug+0xeb/0x130
[13481.661854]  ? dynamic_emit_prefix+0x2a/0x200
[13481.661855]  __dynamic_pr_debug+0xeb/0x130
[13481.661856]  ? dynamic_emit_prefix+0x200/0x200
[13481.661858]  ? __alloc_pages_slowpath.constprop.0+0x1670/0x1670
[13481.661859]  ? vma_policy_mof+0x120/0x120
[13481.661861]  ? policy_node+0x56/0x60
[13481.661862]  ? __might_sleep+0x31/0xd0
[13481.661863]  ? memcpy+0x39/0x60
[13481.662849]  ? ceph_pagelist_append+0x2c1/0x2e0 [libceph]
[13481.662850]  __ceph_setxattr+0x8d5/0xbe0 [ceph]
[13481.662852]  ? ceph_listxattr+0x300/0x300 [ceph]
[13481.662853]  ? selinux_inode_permission+0x1fc/0x2f0
[13481.662855]  ? ceph_is_valid_xattr+0x71/0x80 [ceph]
[13481.662856]  __vfs_setxattr+0xb8/0xf0
[13481.662857]  ? xattr_resolve_name+0x170/0x170
[13481.662859]  ? security_inode_setxattr+0x77/0xd0
[13481.662860]  __vfs_setxattr_noperm+0x88/0x270
[13481.662861]  vfs_setxattr+0xae/0x190
[13481.662863]  ? __vfs_setxattr_locked+0x150/0x150
[13481.662864]  setxattr+0x17a/0x2c0
[13481.662866]  ? vfs_setxattr+0x190/0x190
[13481.662867]  ? may_linkat+0xe0/0xe0
[13481.662868]  ? lock_acquire+0xb3/0x4d0
[13481.662870]  ? lock_release+0x6b0/0x6b0
[13481.662871]  ? strncpy_from_user+0x226/0x270
[13481.662872]  ? __might_sleep+0x31/0xd0
[13481.662874]  ? __mnt_want_write+0xc2/0xd0
[13481.662875]  path_setxattr+0xf7/0x110
[13481.662876]  ? setxattr+0x2c0/0x2c0
[13481.662878]  __x64_sys_lsetxattr+0x65/0x70
[13481.662879]  do_syscall_64+0x33/0x40
[13481.662881]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[13481.662882] RIP: 0033:0x7f693e9ad6be
[13481.662885] Code: 48 8b 0d bd 47 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 bd 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8a 47 0c 00 f7 d8 64 89 01 48
[13481.662887] RSP: 002b:00007ffce4b2bb48 EFLAGS: 00000202 ORIG_RAX: 00000000000000bd
[13481.662890] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f693e9ad6be
[13481.662892] RDX: 0000565511ab52f0 RSI: 00007ffce4b2bb70 RDI: 000

It looks like ceph_check_caps called from ceph_check_delayed_caps raced with inode eviction, which shouldn't happen. That function goes an igrab on the inode before calling ceph_check_caps. Either the igrab should return NULL, or that should prevent the eviction from occurring.

The inode work queuing functions take a reference with i_hold though. I wonder if we may be racing with I_WILL_FREE being set, and are taking an extra reference after the point where the eviction has begun.

Actions #1

Updated by Jeff Layton almost 2 years ago

  • Status changed from New to Can't reproduce

Haven't seen this since then, and we've made some changes to the inode work functions in the meantime. Closing this out.

Actions

Also available in: Atom PDF