Bug #53180
Attempt to access reserved inode number 0x101
Description
While investigating https://tracker.ceph.com/issues/49922, a new warning was added to the kernel CephFS client. We are now triggering this warning multiple times. The following is an example:
Nov 03 14:49:19 gpu015 kernel: ------------[ cut here ]------------
Nov 03 14:49:19 gpu015 kernel: Attempt to access reserved inode number 0x101
Nov 03 14:49:19 gpu015 kernel: WARNING: CPU: 15 PID: 1256107 at fs/ceph/super.h:548 __lookup_inode+0x162/0x1a0 [ceph]
Nov 03 14:49:19 gpu015 kernel: Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid ib_core erofs rbd ipt_rpfilter iptable_raw ip_set_hash_ip ip_set_hash_net ipip tunnel4 ip_tunnel xt_multiport xt_set ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipportnet ip_set_hash_ipport ip_set dummy ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs binfmt_misc ip6table_nat ip6_tables iptable_mangle xt_comment xt_mark ceph libceph fscache xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc aufs overlay dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_intel kvm_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core kvm snd_hwdep soundwire_bus snd_soc_core snd_compress
Nov 03 14:49:19 gpu015 kernel: ac97_bus snd_pcm_dmaengine snd_pcm rapl snd_timer snd intel_cstate soundcore mei_me mei mxm_wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad mac_hid nvidia_uvm(POE) sch_fq_codel msr sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) ast drm_vram_helper i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect ixgbe sysimgblt aesni_intel fb_sys_fops cec ahci xfrm_algo rc_core crypto_simd i2c_i801 dca cryptd libahci i2c_smbus glue_helper drm i40e mdio lpc_ich xhci_pci xhci_pci_renesas wmi
Nov 03 14:49:19 gpu015 kernel: CPU: 15 PID: 1256107 Comm: node Tainted: P W OE 5.11.0-34-generic #36~20.04.1-Ubuntu
Nov 03 14:49:19 gpu015 kernel: Hardware name: TYAN B7079F77CV10HR-2T-N/S7079GM2NR-2T-N, BIOS V2.05.B10 02/27/2018
Nov 03 14:49:19 gpu015 kernel: RIP: 0010:__lookup_inode+0x162/0x1a0 [ceph]
Nov 03 14:49:19 gpu015 kernel: Code: 7e 2f 48 85 c0 0f 85 21 ff ff ff 48 63 c3 85 db 0f 89 51 ff ff ff e9 11 ff ff ff 4c 89 e6 48 c7 c7 e0 1d e7 c0 e8 fb 78 34 e6 <0f> 0b e9 36 ff ff ff be 03 00 00 00 48 89 45 c0 e8 b9 4e d4 e5 48
Nov 03 14:49:19 gpu015 kernel: RSP: 0018:ffffa95d70aa7c30 EFLAGS: 00010286
Nov 03 14:49:19 gpu015 kernel: RAX: 0000000000000000 RBX: ffff98708a884540 RCX: 0000000000000027
Nov 03 14:49:19 gpu015 kernel: RDX: 0000000000000027 RSI: 000000010001ae5a RDI: ffff98a03f958ac8
Nov 03 14:49:19 gpu015 kernel: RBP: ffffa95d70aa7c70 R08: ffff98a03f958ac0 R09: ffffa95d70aa79f0
Nov 03 14:49:19 gpu015 kernel: R10: 000000000193a510 R11: 000000000193a570 R12: 0000000000000101
Nov 03 14:49:19 gpu015 kernel: R13: ffff98708a884568 R14: ffff98708a884540 R15: ffff9880c6dcd8a8
Nov 03 14:49:19 gpu015 kernel: FS: 00007f9d87540780(0000) GS:ffff98a03f940000(0000) knlGS:0000000000000000
Nov 03 14:49:19 gpu015 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 03 14:49:19 gpu015 kernel: CR2: 00007fa8f0003ba2 CR3: 0000003a5bbc0006 CR4: 00000000003706e0
Nov 03 14:49:19 gpu015 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 03 14:49:19 gpu015 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 03 14:49:19 gpu015 kernel: Call Trace:
Nov 03 14:49:19 gpu015 kernel: ceph_lookup_inode+0xe/0x30 [ceph]
Nov 03 14:49:19 gpu015 kernel: lookup_quotarealm_inode.isra.0+0x168/0x220 [ceph]
Nov 03 14:49:19 gpu015 kernel: check_quota_exceeded+0x1c5/0x230 [ceph]
Nov 03 14:49:19 gpu015 kernel: ceph_quota_is_max_bytes_exceeded+0x59/0x60 [ceph]
Nov 03 14:49:19 gpu015 kernel: ceph_write_iter+0x1a3/0x780 [ceph]
Nov 03 14:49:19 gpu015 kernel: ? aa_file_perm+0x118/0x480
Nov 03 14:49:19 gpu015 kernel: new_sync_write+0x117/0x1b0
Nov 03 14:49:19 gpu015 kernel: vfs_write+0x1ca/0x280
Nov 03 14:49:19 gpu015 kernel: ksys_write+0x67/0xe0
Nov 03 14:49:19 gpu015 kernel: __x64_sys_write+0x1a/0x20
Nov 03 14:49:19 gpu015 kernel: do_syscall_64+0x38/0x90
Nov 03 14:49:19 gpu015 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 03 14:49:19 gpu015 kernel: RIP: 0033:0x7f9d8765621f
Nov 03 14:49:19 gpu015 kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 59 65 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 8c 65 f8 ff 48
Nov 03 14:49:19 gpu015 kernel: RSP: 002b:00007ffec4811220 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Nov 03 14:49:19 gpu015 kernel: RAX: ffffffffffffffda RBX: 0000000000000057 RCX: 00007f9d8765621f
Nov 03 14:49:19 gpu015 kernel: RDX: 0000000000000057 RSI: 00000000064fbd30 RDI: 0000000000000019
Nov 03 14:49:19 gpu015 kernel: RBP: 00000000064fbd30 R08: 0000000000000000 R09: 00007f9d84237f00
Nov 03 14:49:19 gpu015 kernel: R10: 0000000000000064 R11: 0000000000000293 R12: 0000000000000057
Nov 03 14:49:19 gpu015 kernel: R13: 0000000006513b50 R14: 00007f9d877324a0 R15: 00007f9d877318a0
Nov 03 14:49:19 gpu015 kernel: ---[ end trace 216b86ebc3c91378 ]---
Here is another, slightly different stack trace:
ceph_lookup_inode+0xe/0x30 [ceph]
lookup_quotarealm_inode.isra.0+0x168/0x220 [ceph]
check_quota_exceeded+0x1c5/0x230 [ceph]
ceph_quota_is_max_bytes_exceeded+0x59/0x60 [ceph]
ceph_write_iter+0x1a3/0x780 [ceph]
? aa_file_perm+0x118/0x480
? do_wp_page+0x1bd/0x330
new_sync_write+0x117/0x1b0
vfs_write+0x1ca/0x280
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This may be related to OOM: some of these warnings appear right after an OOM message.
History
#1 Updated by Venky Shankar about 2 years ago
- Status changed from New to Triaged
- Assignee set to Jeff Layton
#2 Updated by Jeff Layton about 2 years ago
Interesting. It looks like what happened is that this file got moved into a stray directory while the client application was still writing data to it. It then tried to get quota information from the parent which involved doing a lookup for it, at which point the warning popped.
That may mean that this warning is bogus and that we should just remove it, but I need a better understanding of what it means for a file to be in the strays directory.
#3 Updated by Jeff Layton about 2 years ago
Probably what we should do is assume that stray dirs have no rquota on them. We already have a carveout for the root ino in ceph_has_realms_with_quotas(). We should be able to add ones for MDS dirs as well. I'll see if I can draft up a patch.
#4 Updated by Jeff Layton about 2 years ago
Patch posted to the ceph-devel mailing list:
https://lore.kernel.org/ceph-devel/20211109171011.39571-1-jlayton@kernel.org/T/#u
玮文 胡, if you pass along your email address, I can give you Reported-by credit when we merge a patch for this.
#5 Updated by Jeff Layton about 2 years ago
- Status changed from Triaged to Fix Under Review
#6 Updated by 玮文 胡 about 2 years ago
Thanks. My email address:
Reported-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
#7 Updated by Jeff Layton over 1 year ago
- Status changed from Fix Under Review to Resolved
#8 Updated by 玮文 胡 over 1 year ago
I think the above patch has not yet been pushed to the testing branch of https://github.com/ceph/ceph-client. Why is this issue marked resolved?
#9 Updated by Jeff Layton over 1 year ago
This patch was merged into v5.17:
commit 0078ea3b0566e3da09ae8e1e4fbfd708702f2876
Author: Jeff Layton <jlayton@kernel.org>
Date:   Tue Nov 9 09:54:49 2021 -0500

    ceph: don't check for quotas on MDS stray dirs
#10 Updated by Jeff Layton over 1 year ago
- Project changed from CephFS to Linux kernel client
#11 Updated by 玮文 胡 over 1 year ago
OK, thanks. I missed that [PATCH v2] because it is not listed under the same thread on https://lore.kernel.org/ceph-devel/. Sorry for the disturbance.