Bug #56693

Updated by Jeff Layton almost 2 years ago

Opening this as a kernel bug, but it may be an MDS problem. I hit this warning today while testing with the current testing branch (commit 7515a529973fc):

 <pre> 
 [    511.413581] WARNING: CPU: 3 PID: 96 at fs/ceph/inode.c:745 ceph_fill_file_size+0x284/0x290 [ceph] 
 [    511.419352] Modules linked in: ceph(E) libceph(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) cachefiles(E) fscache(E) netfs(E) sunrpc(E) i2c_i801(E) i2c_smbus(E) joydev(E) virtio_balloon(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) lpc_ich(E) intel_rapl_msr(E) intel_rapl_common(E) fuse(E) zram(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) virtio_gpu(E) virtio_dma_buf(E) drm_shmem_helper(E) drm_kms_helper(E) virtio_net(E) net_failover(E) virtio_blk(E) virtio_console(E) failover(E) drm(E) qemu_fw_cfg(E) 
 [    511.453023] CPU: 3 PID: 96 Comm: kworker/3:1 Tainted: G              E       5.19.0-rc6+ #320 
 [    511.461188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 
 [    511.469479] Workqueue: ceph-msgr ceph_con_workfn [libceph] 
 [    511.475692] RIP: 0010:ceph_fill_file_size+0x284/0x290 [ceph] 
 [    511.480806] Code: 8b 44 24 04 e9 ed fe ff ff 48 8d bb a8 07 00 00 e8 91 58 8d c9 8b 83 a8 07 00 00 85 c0 0f 85 93 fe ff ff 31 c0 e9 a4 fe ff ff <0f> 0b e9 52 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 
 [    511.491849] RSP: 0018:ffff888101787518 EFLAGS: 00010206 
 [    511.496261] RAX: 0000000000000000 RBX: ffff88815e236680 RCX: dffffc0000000000 
 [    511.500979] RDX: 0000000000000003 RSI: ffffffffc0c6dc78 RDI: ffff88815e236ec8 
 [    511.505890] RBP: 0000000088bb3f83 R08: 0000000088bb3f83 R09: ffff88811b3f8cb1 
 [    511.510918] R10: ffffed102367f196 R11: 0000000000000001 R12: 0000000000000002 
 [    511.515367] R13: 0000000088bb3f83 R14: ffff88815e236ec8 R15: ffff88815e2366d0 
 [    511.519732] FS:    0000000000000000(0000) GS:ffff88841d380000(0000) knlGS:0000000000000000 
 [    511.524423] CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
 [    511.528664] CR2: 00000000020d58d8 CR3: 000000010d002000 CR4: 00000000003506e0 
 [    511.533051] Call Trace: 
 [    511.536873]    <TASK> 
 [    511.540734]    ceph_fill_inode+0x6ae/0x1eb0 [ceph] 
 [    511.544925]    ? __lock_acquire+0x841/0x2d00 
 [    511.549004]    ? ceph_queue_inode_work+0x100/0x100 [ceph] 
 [    511.553267]    ? lock_acquire+0x157/0x3d0 
 [    511.557351]    ? lock_acquire+0x167/0x3d0 
 [    511.561373]    ceph_fill_trace+0x474/0x1510 [ceph] 
 [    511.565495]    ? rcu_read_lock_sched_held+0x3b/0x70 
 [    511.569683]    ? trace_contention_end+0xbc/0x110 
 [    511.573717]    ? preempt_count_sub+0x14/0xc0 
 [    511.577614]    ? __mutex_lock+0x1bf/0xc00 
 [    511.581557]    ? ceph_fill_inode+0x1eb0/0x1eb0 [ceph] 
 [    511.585605]    ? mds_dispatch+0xb05/0x24e0 [ceph] 
 [    511.589539]    ? lock_release+0x430/0x430 
 [    511.593380]    ? mutex_lock_io_nested+0xae0/0xae0 
 [    511.597371]    ? ceph_ino_compare+0x60/0x60 [ceph] 
 [    511.601493]    ? iget5_locked+0x2a/0xa0 
 [    511.605914]    ? downgrade_write+0xc6/0x240 
 [    511.612667]    ? rwsem_mark_wake+0x4f0/0x4f0 
 [    511.618237]    ? down_write_killable+0x120/0x120 
 [    511.626448]    mds_dispatch+0xb50/0x24e0 [ceph] 
 [    511.632358]    ? ceph_mdsc_handle_mdsmap+0x3a0/0x3a0 [ceph] 
 [    511.638367]    ? wait_for_completion_io_timeout+0x20/0x20 
 [    511.644583]    ? tcp_recv_timestamp+0x350/0x350 
 [    511.650333]    ceph_con_process_message+0xd3/0x220 [libceph] 
 [    511.656150]    ? inet_recvmsg+0xcf/0x310 
 [    511.661772]    process_message+0x1b/0x1d0 [libceph] 
 [    511.667354]    ceph_con_v2_try_read+0x1a2b/0x2ae0 [libceph] 
 [    511.673029]    ? populate_out_iter+0x1850/0x1850 [libceph] 
 [    511.678532]    ? lock_is_held_type+0xe3/0x140 
 [    511.683466]    ? rcu_read_lock_sched_held+0x3b/0x70 
 [    511.688375]    ? trace_contention_end+0xbc/0x110 
 [    511.693147]    ? preempt_count_sub+0x14/0xc0 
 [    511.697805]    ? __mutex_lock+0x1bf/0xc00 
 [    511.702350]    ? reacquire_held_locks+0x270/0x270 
 [    511.706933]    ? ceph_con_workfn+0x35/0x8c0 [libceph] 
 [    511.710987]    ? lockdep_hardirqs_on_prepare+0x220/0x220 
 [    511.714359]    ? mutex_lock_io_nested+0xae0/0xae0 
 [    511.717437]    ? lock_downgrade+0x380/0x380 
 [    511.720385]    ceph_con_workfn+0x53e/0x8c0 [libceph] 
 [    511.723670]    process_one_work+0x515/0x980 
 [    511.726819]    ? pwq_dec_nr_in_flight+0x100/0x100 
 [    511.729886]    ? rwlock_bug.part.0+0x60/0x60 
 [    511.732879]    worker_thread+0x8c/0x610 
 [    511.735768]    ? process_one_work+0x980/0x980 
 [    511.738687]    kthread+0x167/0x1a0 
 [    511.741470]    ? kthread_complete_and_exit+0x20/0x20 
 [    511.744490]    ret_from_fork+0x22/0x30 
 [    511.747410]    </TASK> 
 [    511.749893] irq event stamp: 972595 
 [    511.752690] hardirqs last    enabled at (972609): [<ffffffff8a2033de>] __up_console_sem+0x5e/0x70 
 [    511.756306] hardirqs last disabled at (972622): [<ffffffff8a2033c3>] __up_console_sem+0x43/0x70 
 [    511.759924] softirqs last    enabled at (972292): [<ffffffff8a13b8f5>] __irq_exit_rcu+0x135/0x1b0 
 [    511.763559] softirqs last disabled at (972287): [<ffffffff8a13b8f5>] __irq_exit_rcu+0x135/0x1b0 
 [    511.767131] ---[ end trace 0000000000000000 ]--- 
 </pre> 

 That is this check in ceph_fill_file_size() (fs/ceph/inode.c:745):

 <pre> 
       /* the MDS should have revoked these caps */ 
       WARN_ON_ONCE(issued & (CEPH_CAP_FILE_EXCL | 
                              CEPH_CAP_FILE_RD | 
                              CEPH_CAP_FILE_WR | 
                              CEPH_CAP_FILE_LAZYIO)); 
 </pre> 

 So it looks like the MDS changed the truncate_seq without revoking caps properly?
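
 For reference, here is a small standalone sketch of the invariant that WARN_ON_ONCE guards: when a message from the MDS carries a new truncate_seq, the client should no longer hold any of the Fx/Frd/Fwr/LAZYIO file caps. The cap bit values, struct, and helper names below are placeholders for illustration, not the kernel's actual definitions:

 <pre>
 /*
  * Standalone model of the sanity check in ceph_fill_file_size().
  * The cap bits and "client_state" struct are stand-ins, not the
  * kernel's CEPH_CAP_* definitions or ceph_inode_info.
  */
 #include <stdio.h>
 #include <stdint.h>

 /* Illustrative cap bits (placeholders, not the real values). */
 #define FILE_EXCL   (1 << 0)
 #define FILE_RD     (1 << 1)
 #define FILE_WR     (1 << 2)
 #define FILE_LAZYIO (1 << 3)

 struct client_state {
         uint32_t truncate_seq;  /* last truncate_seq seen from the MDS */
         int issued;             /* file caps the client currently holds */
 };

 /* Returns 1 (and warns) when the invariant behind the WARN_ON_ONCE is broken. */
 static int check_truncate_seq_update(struct client_state *ci,
                                      uint32_t new_truncate_seq)
 {
         if (new_truncate_seq == ci->truncate_seq)
                 return 0;   /* truncate_seq unchanged, nothing to check */

         ci->truncate_seq = new_truncate_seq;

         /* the MDS should have revoked these caps before changing the seq */
         if (ci->issued & (FILE_EXCL | FILE_RD | FILE_WR | FILE_LAZYIO)) {
                 fprintf(stderr,
                         "WARN: truncate_seq %u arrived while caps 0x%x still issued\n",
                         new_truncate_seq, ci->issued);
                 return 1;
         }
         return 0;
 }

 int main(void)
 {
         struct client_state ci = { .truncate_seq = 1, .issued = FILE_RD | FILE_WR };

         /* Mirrors the reported situation: seq advances, caps not yet revoked. */
         check_truncate_seq_update(&ci, 2);
         return 0;
 }
 </pre>

 Run as-is, the sketch prints the warning once, mirroring the report: the seq advanced while read/write caps were still issued to the client.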
