Bug #56693

MDS changed truncate_seq with client caps still outstanding

Added by Jeff Layton over 1 year ago. Updated 7 months ago.

Status: Fix Under Review
Priority: Normal
Assignee: Xiubo Li
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Opening this as a kernel bug, but it may be an MDS problem. I hit this warning today while testing with the current testing branch (commit 7515a529973fc):

[  511.413581] WARNING: CPU: 3 PID: 96 at fs/ceph/inode.c:745 ceph_fill_file_size+0x284/0x290 [ceph]
[  511.419352] Modules linked in: ceph(E) libceph(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) cachefiles(E) fscache(E) netfs(E) sunrpc(E) i2c_i801(E) i2c_smbus(E) joydev(E) virtio_balloon(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) lpc_ich(E) intel_rapl_msr(E) intel_rapl_common(E) fuse(E) zram(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) virtio_gpu(E) virtio_dma_buf(E) drm_shmem_helper(E) drm_kms_helper(E) virtio_net(E) net_failover(E) virtio_blk(E) virtio_console(E) failover(E) drm(E) qemu_fw_cfg(E)
[  511.453023] CPU: 3 PID: 96 Comm: kworker/3:1 Tainted: G            E     5.19.0-rc6+ #320
[  511.461188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
[  511.469479] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[  511.475692] RIP: 0010:ceph_fill_file_size+0x284/0x290 [ceph]
[  511.480806] Code: 8b 44 24 04 e9 ed fe ff ff 48 8d bb a8 07 00 00 e8 91 58 8d c9 8b 83 a8 07 00 00 85 c0 0f 85 93 fe ff ff 31 c0 e9 a4 fe ff ff <0f> 0b e9 52 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41
[  511.491849] RSP: 0018:ffff888101787518 EFLAGS: 00010206
[  511.496261] RAX: 0000000000000000 RBX: ffff88815e236680 RCX: dffffc0000000000
[  511.500979] RDX: 0000000000000003 RSI: ffffffffc0c6dc78 RDI: ffff88815e236ec8
[  511.505890] RBP: 0000000088bb3f83 R08: 0000000088bb3f83 R09: ffff88811b3f8cb1
[  511.510918] R10: ffffed102367f196 R11: 0000000000000001 R12: 0000000000000002
[  511.515367] R13: 0000000088bb3f83 R14: ffff88815e236ec8 R15: ffff88815e2366d0
[  511.519732] FS:  0000000000000000(0000) GS:ffff88841d380000(0000) knlGS:0000000000000000
[  511.524423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  511.528664] CR2: 00000000020d58d8 CR3: 000000010d002000 CR4: 00000000003506e0
[  511.533051] Call Trace:
[  511.536873]  <TASK>
[  511.540734]  ceph_fill_inode+0x6ae/0x1eb0 [ceph]
[  511.544925]  ? __lock_acquire+0x841/0x2d00
[  511.549004]  ? ceph_queue_inode_work+0x100/0x100 [ceph]
[  511.553267]  ? lock_acquire+0x157/0x3d0
[  511.557351]  ? lock_acquire+0x167/0x3d0
[  511.561373]  ceph_fill_trace+0x474/0x1510 [ceph]
[  511.565495]  ? rcu_read_lock_sched_held+0x3b/0x70
[  511.569683]  ? trace_contention_end+0xbc/0x110
[  511.573717]  ? preempt_count_sub+0x14/0xc0
[  511.577614]  ? __mutex_lock+0x1bf/0xc00
[  511.581557]  ? ceph_fill_inode+0x1eb0/0x1eb0 [ceph]
[  511.585605]  ? mds_dispatch+0xb05/0x24e0 [ceph]
[  511.589539]  ? lock_release+0x430/0x430
[  511.593380]  ? mutex_lock_io_nested+0xae0/0xae0
[  511.597371]  ? ceph_ino_compare+0x60/0x60 [ceph]
[  511.601493]  ? iget5_locked+0x2a/0xa0
[  511.605914]  ? downgrade_write+0xc6/0x240
[  511.612667]  ? rwsem_mark_wake+0x4f0/0x4f0
[  511.618237]  ? down_write_killable+0x120/0x120
[  511.626448]  mds_dispatch+0xb50/0x24e0 [ceph]
[  511.632358]  ? ceph_mdsc_handle_mdsmap+0x3a0/0x3a0 [ceph]
[  511.638367]  ? wait_for_completion_io_timeout+0x20/0x20
[  511.644583]  ? tcp_recv_timestamp+0x350/0x350
[  511.650333]  ceph_con_process_message+0xd3/0x220 [libceph]
[  511.656150]  ? inet_recvmsg+0xcf/0x310
[  511.661772]  process_message+0x1b/0x1d0 [libceph]
[  511.667354]  ceph_con_v2_try_read+0x1a2b/0x2ae0 [libceph]
[  511.673029]  ? populate_out_iter+0x1850/0x1850 [libceph]
[  511.678532]  ? lock_is_held_type+0xe3/0x140
[  511.683466]  ? rcu_read_lock_sched_held+0x3b/0x70
[  511.688375]  ? trace_contention_end+0xbc/0x110
[  511.693147]  ? preempt_count_sub+0x14/0xc0
[  511.697805]  ? __mutex_lock+0x1bf/0xc00
[  511.702350]  ? reacquire_held_locks+0x270/0x270
[  511.706933]  ? ceph_con_workfn+0x35/0x8c0 [libceph]
[  511.710987]  ? lockdep_hardirqs_on_prepare+0x220/0x220
[  511.714359]  ? mutex_lock_io_nested+0xae0/0xae0
[  511.717437]  ? lock_downgrade+0x380/0x380
[  511.720385]  ceph_con_workfn+0x53e/0x8c0 [libceph]
[  511.723670]  process_one_work+0x515/0x980
[  511.726819]  ? pwq_dec_nr_in_flight+0x100/0x100
[  511.729886]  ? rwlock_bug.part.0+0x60/0x60
[  511.732879]  worker_thread+0x8c/0x610
[  511.735768]  ? process_one_work+0x980/0x980
[  511.738687]  kthread+0x167/0x1a0
[  511.741470]  ? kthread_complete_and_exit+0x20/0x20
[  511.744490]  ret_from_fork+0x22/0x30
[  511.747410]  </TASK>
[  511.749893] irq event stamp: 972595
[  511.752690] hardirqs last  enabled at (972609): [<ffffffff8a2033de>] __up_console_sem+0x5e/0x70
[  511.756306] hardirqs last disabled at (972622): [<ffffffff8a2033c3>] __up_console_sem+0x43/0x70
[  511.759924] softirqs last  enabled at (972292): [<ffffffff8a13b8f5>] __irq_exit_rcu+0x135/0x1b0
[  511.763559] softirqs last disabled at (972287): [<ffffffff8a13b8f5>] __irq_exit_rcu+0x135/0x1b0
[  511.767131] ---[ end trace 0000000000000000 ]---

That is:

      /* the MDS should have revoked these caps */
      WARN_ON_ONCE(issued & (CEPH_CAP_FILE_EXCL |
                             CEPH_CAP_FILE_RD |
                             CEPH_CAP_FILE_WR |
                             CEPH_CAP_FILE_LAZYIO));

So it looks like the MDS changed the truncate_seq without revoking caps properly?
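To make the check concrete, here is a small userspace sketch of the invariant that warning encodes: once the MDS changes truncate_seq, no client should still have any of the Fx/Fr/Fw/Fl file caps issued. The bit values and the helper name below are illustrative stand-ins for this sketch only, not the kernel's actual CEPH_CAP_FILE_* definitions.

/*
 * Illustrative userspace model of the warning condition in
 * ceph_fill_file_size(). The bit values are stand-ins; the real
 * CEPH_CAP_FILE_* masks in fs/ceph/super.h are laid out differently.
 */
#include <stdbool.h>
#include <stdio.h>

#define FILE_EXCL   (1 << 0)  /* Fx: exclusive       */
#define FILE_RD     (1 << 1)  /* Fr: sync read       */
#define FILE_WR     (1 << 2)  /* Fw: sync write      */
#define FILE_LAZYIO (1 << 3)  /* Fl: lazy I/O        */
#define FILE_CACHE  (1 << 4)  /* Fc: cached reads    */
#define FILE_BUFFER (1 << 5)  /* Fb: buffered writes */

/* Mirrors the WARN_ON_ONCE test: these caps should have been revoked
 * before the MDS changed truncate_seq. */
static bool warn_would_fire(int issued)
{
	return issued & (FILE_EXCL | FILE_RD | FILE_WR | FILE_LAZYIO);
}

int main(void)
{
	/* Only Fc+Fb issued: truncate_seq may change, no warning. */
	printf("Fcb : %s\n",
	       warn_would_fire(FILE_CACHE | FILE_BUFFER) ? "WARN" : "ok");

	/* Fw still counted as issued: the warning fires. */
	printf("Fcwb: %s\n",
	       warn_would_fire(FILE_CACHE | FILE_WR | FILE_BUFFER) ? "WARN" : "ok");
	return 0;
}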

History

#1 Updated by Jeff Layton over 1 year ago

  • Subject changed from MDS changed truncate_seq changing with client caps still outstanding to MDS changed truncate_seq with client caps still outstanding

#2 Updated by Jeff Layton over 1 year ago

  • Description updated (diff)

#3 Updated by Xiubo Li over 1 year ago

  • Assignee set to Xiubo Li

Hi Jeff,

BTW, is this reproducible in the testing branch?

#4 Updated by Xiubo Li over 1 year ago

  • Status changed from New to In Progress

#5 Updated by Xiubo Li over 1 year ago

I went through the MDS-side code and all the cases worked as expected. My guess is that it could be an async create issue, but I need more time to go through the kernel code.

#6 Updated by Xiubo Li over 1 year ago

In the testing branch I added a temporary commit [1] to collect more debug logs for this:

[1] https://github.com/ceph/ceph-client/commit/e622eb8ab8b962ab91750f25490a092df9687e30

#7 Updated by Xiubo Li over 1 year ago

  • Status changed from In Progress to Need More Info

#8 Updated by Xiubo Li 7 months ago

Locally I have reproduced the warning with the following steps:

1. Enable the dynamic debug logs for the ceph module.
2. Mount two kclients: clientA and clientB.
3. Open one terminal and run the following command in clientA:

   $ while [ 1 ]; do echo ++++++; ll -h file; truncate -s 234 file; ll -h file; echo ------; done

4. Open two more terminals and run the following command in clientA and clientB:

   $ while [ 1 ]; do dd if=/dev/urandom of=file bs=128 count=8; echo =====; done

The warning is very easy to reproduce this way.

Though the call trace is not exactly the same, it should be the same issue:

10383 <4>[  940.233291] ------------[ cut here ]------------
10384 <4>[  940.233297] WARNING: CPU: 3 PID: 9182 at fs/ceph/inode.c:798 ceph_fill_file_size+0x507/0x5f0 [ceph]
10385 <4>[  940.233491] Modules linked in: ceph(E) libceph dns_resolver fscache netfs xsk_diag snd_seq_dummy snd_hrtimer qrtr rfkill intel_rapl_msr intel_rapl_common intel_pmc_core snd_intel8x0 rapl snd_ac97_codec pktcdvd ac97_bus snd_seq snd_seq_device snd_pcm snd_timer pcspkr snd i2c_piix4 soundcore xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg ata_generic vmwgfx crct10dif_pclmul crc32_pclmul crc32c_intel drm_ttm_helper ttm video drm_kms_helper ahci virtio_net net_failover ata_piix libahci failover ghash_clmulni_intel wmi drm libata e1000 serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse
10386 <4>[  940.233717] CPU: 3 PID: 9182 Comm: kworker/3:22 Kdump: loaded Tainted: G        W   E      6.5.0-rc4+ #11
10387 <4>[  940.233971] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
10390 <4>[  940.233990] Workqueue: ceph-msgr ceph_con_workfn [libceph]
10407 <4>[  940.234398] RIP: 0010:ceph_fill_file_size+0x507/0x5f0 [ceph]
10410 <4>[  940.234673] Code: 8d 7f 28 e8 cb b6 d7 dd 49 8b 57 28 49 89 e9 4c 89 ee 4c 8b 44 24 28 48 c7 c1 e0 f7 7a c1 48 c7 c7 20 cf 7a c1 e8 29 5c 99 dd <0f> 0b 48 83 c4 18 e9 ee fb ff ff 4c 8d a3 d0 08 00 00 89 04 24 4c
10411 <4>[  940.234685] RSP: 0018:ffff88812d7c7910 EFLAGS: 00010246
10412 <4>[  940.234694] RAX: 0000000000000058 RBX: ffff88827b3a5640 RCX: 0000000000000027
10413 <4>[  940.234699] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff888462fffb88
10414 <4>[  940.234703] RBP: 00000000000000ea R08: ffffffff9f2418ae R09: ffffed108c5fff71
10415 <4>[  940.234708] R10: ffff888462fffb8b R11: 0000000000000001 R12: 0000000000000016
10416 <4>[  940.234714] R13: ffff8881493b8000 R14: ffff88827b3a5ef8 R15: ffff88818270f400
10417 <4>[  940.234722] FS:  0000000000000000(0000) GS:ffff888462e00000(0000) knlGS:0000000000000000
10418 <4>[  940.234727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
10419 <4>[  940.234733] CR2: 00007f90c5fc74f4 CR3: 000000014184c005 CR4: 00000000000706e0
10420 <4>[  940.234737] Call Trace:
10421 <4>[  940.234751]  <TASK>
10422 <4>[  940.234760]  ? __warn+0xa1/0x200
10423 <4>[  940.234792]  ? ceph_fill_file_size+0x507/0x5f0 [ceph]
10424 <4>[  940.234935]  ? report_bug+0x1c7/0x1d0
10425 <4>[  940.234950]  ? handle_bug+0x3c/0x70
10426 <4>[  940.234958]  ? exc_invalid_op+0x14/0x50
10427 <4>[  940.234964]  ? asm_exc_invalid_op+0x16/0x20
10428 <4>[  940.235026]  ? irq_work_claim+0x1e/0x40
10429 <4>[  940.235036]  ? ceph_fill_file_size+0x507/0x5f0 [ceph]
10430 <4>[  940.235148]  ceph_handle_caps+0xeef/0x11a0 [ceph]
10431 <4>[  940.235291]  ? __pfx_ceph_handle_caps+0x10/0x10 [ceph]
10432 <4>[  940.235436]  ? lock_acquired+0x59/0x190
10433 <4>[  940.235445]  ? rcu_is_watching+0x1f/0x50
10434 <4>[  940.235454]  ? __mutex_lock+0x1116/0x1270
10435 <4>[  940.235464]  ? rcu_is_watching+0x1f/0x50
10436 <4>[  940.235469]  ? lock_release+0x4d/0x1a0
10437 <4>[  940.235484]  ? __mutex_unlock_slowpath+0x118/0x3f0
10438 <4>[  940.235499]  ? __pfx___mutex_unlock_slowpath+0x10/0x10
10439 <4>[  940.235527]  mds_dispatch+0x1fe/0x270 [ceph]
10442 <4>[  940.235638]  ceph_con_process_message+0xd9/0x240 [libceph]
10444 <4>[  940.235881]  ? __pfx___mutex_lock+0x10/0x10
10445 <4>[  940.235891]  ceph_con_v1_try_read+0x6bd/0xc40 [libceph]
10447 <4>[  940.236061]  ? __pfx_ceph_con_v1_try_read+0x10/0x10 [libceph]
10448 <4>[  940.236241]  ? lock_unpin_lock+0x77/0x240
10449 <4>[  940.236252]  ceph_con_workfn+0x240/0x5c0 [libceph]
10450 <4>[  940.236448]  process_one_work+0x542/0x9e0
10451 <4>[  940.236476]  ? __pfx_process_one_work+0x10/0x10
10452 <4>[  940.236489]  ? __list_add_valid+0x35/0xc0
10453 <4>[  940.236526]  worker_thread+0x2dd/0x670
10454 <4>[  940.236541]  ? __kthread_parkme+0xc9/0xe0
10455 <4>[  940.236573]  ? __pfx_worker_thread+0x10/0x10
10456 <4>[  940.236580]  kthread+0x198/0x1e0
10457 <4>[  940.236590]  ? __pfx_kthread+0x10/0x10
10458 <4>[  940.236599]  ret_from_fork+0x30/0x50
10459 <4>[  940.236613]  ? __pfx_kthread+0x10/0x10
10460 <4>[  940.236622]  ret_from_fork_asm+0x1b/0x30
10461 <4>[  940.236639]  </TASK>
10462 <4>[  940.236645] irq event stamp: 0
10463 <4>[  940.236649] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
10464 <4>[  940.236660] hardirqs last disabled at (0): [<ffffffff9ef65ab4>] copy_process+0xd84/0x2de0
10465 <4>[  940.236673] softirqs last  enabled at (0): [<ffffffff9ef65ab4>] copy_process+0xd84/0x2de0
10466 <4>[  940.236685] softirqs last disabled at (0): [<0000000000000000>] 0x0
10467 <4>[  940.236692] ---[ end trace 0000000000000000 ]---

With the debug logs from https://tracker.ceph.com/issues/56693#note-6:

 4609 <7>[  940.233129] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_trunc: 000000000b60d9e5 10000000000.fffffffffffffffe mds0 seq 102 to 234 truncate seq 22
 4610 <7>[  940.233137] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: size 896 -> 234
 4611 <7>[  940.233143] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: truncate_seq 21 -> 22
 4612 <3>[  940.233149] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: 000000000b60d9e5 ino 10000000000.fffffffffffffffe already issued pAsLsXsFcwb, newcaps -
 4613 <3>[  940.233157] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size:  truncate_seq 21 -> 22
 4614 <3>[  940.233162] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size:   size 896 -> 234
 4615 <7>[  940.236704] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: truncate_size 234 -> 234, encrypted 0

We can see that the issued caps were Fcwb, which means the kclient still held the Fw caps when it received the truncate notification; that is what triggers the WARNING call trace from ceph_fill_file_size() shown above.

Why?

15086 <7>[  943.045321] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] __ceph_caps_issued: 0000000053c9ee0d 10000000000.fffffffffffffffe cap 00000000d012361d issued pAsLsXsFrw
15087 <7>[  943.045360] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] __ceph_setattr: 0000000053c9ee0d 10000000000.fffffffffffffffe issued pAsLsXsFrw
15088 <7>[  943.045372] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] __ceph_setattr: 0000000053c9ee0d 10000000000.fffffffffffffffe size 256 -> 234
15089 <7>[  943.045378] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_getxattr: 000000000b60d9e5 10000000000.fffffffffffffffe name 'security.capability' ver=1 index_ver=1
15090 <7>[  943.045381] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] __ceph_setattr: 0000000053c9ee0d 10000000000.fffffffffffffffe mtime 1693971197.539042304 -> 1693971197.643572630
15091 <7>[  943.045388] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] __ceph_setattr: 0000000053c9ee0d 10000000000.fffffffffffffffe ctime 1693971197.539042304 -> 1693971197.643572630 (ignored)                                  

From the above logs we can see that when the other kclient tries to truncate the file to 234 bytes, the MDS tries to revoke the Frw caps from both kclients:

15143 <7>[  943.047707] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_handle_caps: from mds0
15144 <7>[  943.047727] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_handle_caps:  op revoke ino 10000000000.fffffffffffffffe inode 000000000b60d9e5                                                                         
15145 <7>[  943.047742] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_handle_caps:  mds0 seq 212 cap seq 155
15146 <7>[  943.047755] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_caps_issued: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 issued pAsLsXsFrw
15147 <7>[  943.047769] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_grant: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 mds0 seq 155 pAsLsXs
15148 <7>[  943.047785] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_grant:  size 234 max_size 4194304, i_size 640
15149 <7>[  943.047797] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_caps_issued: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 issued pAsLsXsFrw
15150 <7>[  943.047813] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_grant: 000000000b60d9e5 10000000000.fffffffffffffffe mode 0100644 uid.gid 1000.1000
15151 <7>[  943.047829] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_grant:  my wanted = pAsxXsxFxwb, used = Fw, dirty Fw
15152 <7>[  943.047850] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_grant: revocation: pAsLsXsFrw -> pAsLsXs (revoking Frw)
15153 <7>[  943.047862] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_caps_issued: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 issued pAsLsXs
15154 <7>[  943.047879] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_check_caps: 000000000b60d9e5 10000000000.fffffffffffffffe file_want pAsxXsxFxwb used Fw dirty Fw flushing - issued pAsLsXs revoking Frw retain pAsxLsxXsxFsxcrwbl  AUTHONLY NOINVAL
15155 <7>[  943.047897] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_check_caps:  mds0 cap 00000000489973b2 used Fw issued pAsLsXs implemented pAsLsXsFrw revoking Frw
15156 <7>[  943.047914] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __mark_caps_flushing: flushing Fw, flushing_caps - -> Fw
15157 <7>[  943.047923] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __mark_caps_flushing: 000000000b60d9e5 10000000000.fffffffffffffffe now !dirty

But because this kclient was still holding the Fw caps for writing, it did not release them immediately; the revocation only completed later, when the write finished and dropped its Fw cap references:

15191 <7>[  943.141637] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_write_iter: 000000000b60d9e5 10000000000.fffffffffffffffe 640~128  dropping cap refs on Fw
15192 <7>[  943.141647] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_put_cap_refs: 000000000b60d9e5 10000000000.fffffffffffffffe had Fw last
15193 <7>[  943.141659] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_caps_issued: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 issued pAsLsXsFcb
15194 <7>[  943.141670] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_check_caps: 000000000b60d9e5 10000000000.fffffffffffffffe file_want pAsxXsxFxwb used - dirty Fw flushing Fw issued pAsLsXsFcb revoking Fw retain pAsxLsxXsxFsxcrwbl
15195 <7>[  943.141683] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_check_caps:  mds0 cap 00000000489973b2 used - issued pAsLsXsFcb implemented pAsLsXsFcwb revoking Fw
15196 <7>[  943.141690] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_check_caps: completed revocation of Fw
15197 <7>[  943.141697] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __mark_caps_flushing: flushing Fw, flushing_caps Fw -> Fw
15198 <7>[  943.141704] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __mark_caps_flushing: 000000000b60d9e5 10000000000.fffffffffffffffe now !dirty
15199 <7>[  943.141712] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __prep_cap: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 session 00000000a6417cdd pAsLsXsFcwb -> pAsLsXsFcb (revoking Fw)
15200 <7>[  943.141800] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] encode_cap_msg: update 1 10000000000 caps pAsLsXsFcb wanted pAsxXsxFxwb dirty Fw seq 156/142 tid 39/38 mseq 0 follows 1 size 768/0 xattr_ver 18446744072082537735 xattr_len 0

But this kclient still holds the Fw caps reference locally while waiting for the cap flush ack from the MDS. Meanwhile, as soon as the MDS receives the cap message releasing Fw, it assumes the revocation has completely finished, continues with the truncation, and sends the flush ack to the kclient later (or while this is going on):

15482 <7>[  943.259693] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] handle_cap_trunc: 0000000053c9ee0d 10000000000.fffffffffffffffe mds0 seq 219 to 234 truncate seq 32
15483 <7>[  943.259705] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4237] ceph_fill_file_size: truncate_size 234 -> 234, encrypted 0
15484 <7>[  943.259704] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_handle_caps: from mds0
15485 <7>[  943.259723] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_handle_caps:  op trunc ino 10000000000.fffffffffffffffe inode 000000000b60d9e5
15486 <7>[  943.259739] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_handle_caps:  mds0 seq 215 cap seq 156
15487 <7>[  943.259755] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] __ceph_caps_issued: 000000000b60d9e5 10000000000.fffffffffffffffe cap 00000000489973b2 issued pAsLsXsFcb
15488 <7>[  943.259771] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_trunc: 000000000b60d9e5 10000000000.fffffffffffffffe mds0 seq 156 to 234 truncate seq 32
15489 <7>[  943.259795] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: size 768 -> 234
15490 <7>[  943.259806] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: truncate_seq 31 -> 32
15491 <3>[  943.259820] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size: 000000000b60d9e5 ino 10000000000.fffffffffffffffe already issued pAsLsXsFcwb, newcaps -
15492 <3>[  943.259834] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size:  truncate_seq 31 -> 32
15493 <3>[  943.259846] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] ceph_fill_file_size:   size 768 -> 234

But the 4247 kclient had not yet received the cap flush ack when it received the truncate notification, so __ceph_caps_dirty() still returns Fw:

static inline int __ceph_caps_dirty(struct ceph_inode_info *ci)
{
	return ci->i_dirty_caps | ci->i_flushing_caps;
}
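To make the ordering concrete, here is a small userspace sketch of the race described above. The structures, message names, and the FW bit are simplified stand-ins (not the kernel's); they only mirror the idea that Fw stays in the flushing mask, and so keeps showing up in dirty|flushing, until the flush ack is processed.

/*
 * Simplified model of the message ordering seen in the logs: the trunc
 * notification is processed before the cap flush ack, so dirty|flushing
 * still reports Fw at that point. All names here are illustrative.
 */
#include <stdio.h>

#define FW (1 << 0)

struct client_state {
	int dirty;     /* roughly ci->i_dirty_caps    */
	int flushing;  /* roughly ci->i_flushing_caps */
};

enum msg { TRUNC_NOTIFY, FLUSH_ACK };

static void handle(struct client_state *c, enum msg m)
{
	if (m == TRUNC_NOTIFY) {
		int dirty = c->dirty | c->flushing;  /* like __ceph_caps_dirty() */
		printf("trunc notify: dirty|flushing = %s\n",
		       (dirty & FW) ? "Fw -> warning fires" : "- (clean)");
	} else {
		c->flushing &= ~FW;                  /* the ack cleans Fw */
		printf("flush ack  : Fw cleaned\n");
	}
}

int main(void)
{
	/* The cap update was already sent: Fw moved from dirty to flushing. */
	struct client_state c = { .dirty = 0, .flushing = FW };

	/* The MDS finished the revocation, truncated the file, and its trunc
	 * notification happens to arrive before the flush ack. */
	handle(&c, TRUNC_NOTIFY);
	handle(&c, FLUSH_ACK);
	return 0;
}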

And handle_cap_trunc() does issued |= implemented | dirty; before passing issued to ceph_fill_file_size(), which means that inside ceph_fill_file_size() the issued caps include the dirty caps:

static bool handle_cap_trunc(struct inode *inode,
                             struct ceph_mds_caps *trunc,
                             struct ceph_mds_session *session,
                             struct cap_extra_info *extra_info)
{
	struct ceph_inode_info *ci = ceph_inode(inode);
	struct ceph_client *cl = ceph_inode_to_client(inode);
	int mds = session->s_mds;
	int seq = le32_to_cpu(trunc->seq);
	u32 truncate_seq = le32_to_cpu(trunc->truncate_seq);
	u64 truncate_size = le64_to_cpu(trunc->truncate_size);
	u64 size = le64_to_cpu(trunc->size);
	int implemented = 0;
	int dirty = __ceph_caps_dirty(ci);
	int issued = __ceph_caps_issued(ceph_inode(inode), &implemented);
	bool queue_trunc = false;

	lockdep_assert_held(&ci->i_ceph_lock);

	issued |= implemented | dirty;

	/*
	 * If there is at least one crypto block then we'll trust
	 * fscrypt_file_size. If the real length of the file is 0, then
	 * ignore it (it has probably been truncated down to 0 by the MDS).
	 */
	if (IS_ENCRYPTED(inode) && size)
		size = extra_info->fscrypt_file_size;

	doutc(cl, "%p %llx.%llx mds%d seq %d to %lld truncate seq %d\n",
	      inode, ceph_vinop(inode), mds, seq, truncate_size, truncate_seq);
	queue_trunc = ceph_fill_file_size(inode, issued,
					  truncate_seq, truncate_size,
					  size, 0);
	return queue_trunc;
}
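Putting the pieces together, the short sketch below (again with illustrative bit values rather than the real CEPH_CAP_FILE_* masks) shows how the still-dirty/flushing Fw is folded into issued by the issued |= implemented | dirty; line and then trips the "MDS should have revoked these caps" check in ceph_fill_file_size().

/*
 * Illustrative combination of the two previous sketches: the Fw that is
 * still being flushed ends up in 'issued' and fails the revocation check.
 * Bit values are stand-ins, not the kernel's CEPH_CAP_FILE_* masks.
 */
#include <stdio.h>

#define FILE_EXCL   (1 << 0)
#define FILE_RD     (1 << 1)
#define FILE_WR     (1 << 2)  /* Fw */
#define FILE_LAZYIO (1 << 3)
#define FILE_CACHE  (1 << 4)  /* Fc */
#define FILE_BUFFER (1 << 5)  /* Fb */

int main(void)
{
	int issued      = FILE_CACHE | FILE_BUFFER;  /* Fcb: Fw already revoked */
	int implemented = FILE_CACHE | FILE_BUFFER;
	int dirty       = FILE_WR;                   /* flush ack not seen yet  */
	/* (in the captured logs 'implemented' also still had Fw; either path
	 *  is enough to trip the check) */

	issued |= implemented | dirty;               /* as in handle_cap_trunc() */

	if (issued & (FILE_EXCL | FILE_RD | FILE_WR | FILE_LAZYIO))
		printf("warning would fire: issued still contains Fw\n");
	else
		printf("ok: truncate_seq change is safe\n");
	return 0;
}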

And just after the WARNING call trace, the cap flush ack arrived and cleaned the Fw caps from ci->i_flushing_caps, but it was too late:

15633 <7>[  943.264564] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_flush_ack: 000000000b60d9e5 10000000000.fffffffffffffffe mds0 seq 156 on Fw cleaned Fw, flushing Fw -> -
15634 <7>[  943.264579] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_flush_ack:  000000000b60d9e5 10000000000.fffffffffffffffe now !flushing
15635 <7>[  943.264591] ceph: [3c66cbb8-ff2a-45cb-acf4-5175f776d29c 4247] handle_cap_flush_ack:  000000000b60d9e5 10000000000.fffffffffffffffe now clean

The above covers only the Fw caps; the same thing can happen with the Fx caps in the single-client case.

So this race is benign and won't cause any real issue. I will fix the kclient code by removing the WARNING check.

#9 Updated by Xiubo Li 7 months ago

  • Status changed from Need More Info to In Progress

#10 Updated by Xiubo Li 7 months ago

  • Status changed from In Progress to Fix Under Review
