Bug #59343 (closed)

Parent task: Bug #54460: snaptest-multiple-capsnaps.sh test failure

qa: fs/snaps/snaptest-multiple-capsnaps.sh failed

Added by Xiubo Li about 1 year ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee: Xiubo Li
Category: Correctness/Safety
Target version: v19.0.0
% Done: 100%
Source:
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID: 51383
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/yuriw-2023-04-05_14:51:06-fs-wip-yuri5-testing-2023-04-04-0814-distro-default-smithi/7232845/

2023-04-05T20:43:29.556 INFO:tasks.workunit.client.0.smithi073.stderr:+ echo asdf
2023-04-05T20:43:29.557 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/1
2023-04-05T20:43:29.569 INFO:tasks.workunit.client.0.smithi073.stderr:+ chmod 777 a
2023-04-05T20:43:29.578 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/2
2023-04-05T20:43:29.592 INFO:tasks.workunit.client.0.smithi073.stderr:+ echo qwer
2023-04-05T20:43:29.604 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/3
2023-04-05T20:43:29.612 INFO:tasks.workunit.client.0.smithi073.stderr:+ chmod 666 a
2023-04-05T20:43:29.614 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/4
2023-04-05T20:43:29.635 INFO:tasks.workunit.client.0.smithi073.stderr:+ echo zxcv
2023-04-05T20:43:29.938 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/5
2023-04-05T20:43:30.118 INFO:tasks.workunit.client.0.smithi073.stderr:+ ls -al .snap/1/a .snap/2/a .snap/3/a .snap/4/a .snap/5/a
2023-04-05T20:43:30.154 INFO:tasks.workunit.client.0.smithi073.stdout:-rw-rw-r--. 1 ubuntu ubuntu 5 Apr  5 20:43 .snap/1/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rwxrwxrwx. 1 ubuntu ubuntu 5 Apr  5 20:43 .snap/2/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rwxrwxrwx. 1 ubuntu ubuntu 5 Apr  5 20:43 .snap/3/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rwxrwxrwx. 1 ubuntu ubuntu 5 Apr  5 20:43 .snap/4/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rw-rw-rw-. 1 ubuntu ubuntu 5 Apr  5 20:43 .snap/5/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep asdf .snap/1/a
2023-04-05T20:43:30.158 INFO:tasks.workunit.client.0.smithi073.stdout:asdf
2023-04-05T20:43:30.160 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.173 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/1/a
2023-04-05T20:43:30.181 INFO:tasks.workunit.client.0.smithi073.stdout:  Size: 5                 Blocks: 1          IO Block: 4194304 regular file
2023-04-05T20:43:30.182 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep asdf .snap/2/a
2023-04-05T20:43:30.185 INFO:tasks.workunit.client.0.smithi073.stdout:asdf
2023-04-05T20:43:30.185 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/2/a
2023-04-05T20:43:30.185 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.188 INFO:tasks.workunit.client.0.smithi073.stdout:  Size: 5                 Blocks: 1          IO Block: 4194304 regular file
2023-04-05T20:43:30.188 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/2/a
2023-04-05T20:43:30.189 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep -- -rwxrwxrwx
2023-04-05T20:43:30.191 INFO:tasks.workunit.client.0.smithi073.stdout:Access: (0777/-rwxrwxrwx)  Uid: ( 1000/  ubuntu)   Gid: ( 1265/  ubuntu)
2023-04-05T20:43:30.191 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep qwer .snap/3/a
2023-04-05T20:43:30.194 INFO:tasks.workunit.client.0.smithi073.stdout:qwer
2023-04-05T20:43:30.195 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/3/a
2023-04-05T20:43:30.195 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.198 INFO:tasks.workunit.client.0.smithi073.stdout:  Size: 5                 Blocks: 1          IO Block: 4194304 regular file
2023-04-05T20:43:30.198 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/3/a
2023-04-05T20:43:30.199 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep -- -rwxrwxrwx
2023-04-05T20:43:30.201 INFO:tasks.workunit.client.0.smithi073.stdout:Access: (0777/-rwxrwxrwx)  Uid: ( 1000/  ubuntu)   Gid: ( 1265/  ubuntu)
2023-04-05T20:43:30.202 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep qwer .snap/4/a
2023-04-05T20:43:30.207 INFO:tasks.workunit.client.0.smithi073.stdout:qwer
2023-04-05T20:43:30.207 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/4/a
2023-04-05T20:43:30.208 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.211 INFO:tasks.workunit.client.0.smithi073.stdout:  Size: 5                 Blocks: 1          IO Block: 4194304 regular file
2023-04-05T20:43:30.214 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/4/a
2023-04-05T20:43:30.214 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep -- -rw-rw-rw-
2023-04-05T20:43:30.217 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-04-05T20:43:30.218 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...

The mode for .snap/4/a was mismatched: it should have been rw-rw-rw-, but it was rwxrwxrwx.


Subtasks 2 (0 open, 2 closed)

Bug #14557: snaps: failed snaptest-multiple-capsnaps.sh (Duplicate)

Bug #58934: snaptest-git-ceph.sh failure with ceph-fuse (Duplicate) - Xiubo Li

Related issues 4 (1 open, 3 closed)

Related to CephFS - Bug #64209: snaptest-multiple-capsnaps.sh fails with "got remote process result: 1" (New) - Venky Shankar

Copied to CephFS - Backport #62043: pacific: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed (Resolved) - Xiubo Li
Copied to CephFS - Backport #62044: reef: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed (Resolved) - Xiubo Li
Copied to CephFS - Backport #62045: quincy: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed (Resolved) - Xiubo Li
Actions #1

Updated by Xiubo Li about 1 year ago

  • Description updated (diff)
Actions #2

Updated by Xiubo Li about 1 year ago

  • Description updated (diff)
Actions #3

Updated by Xiubo Li about 1 year ago

https://pulpito.ceph.com/vshankar-2023-03-31_06:27:29-fs-wip-vshankar-testing-20230330.125245-testing-default-smithi/7228338/

But this one is just a little different:

2023-03-31T18:41:27.651 INFO:tasks.workunit.client.0.smithi047.stderr:+ set -e
2023-03-31T18:41:27.653 INFO:tasks.workunit.client.0.smithi047.stderr:+ echo asdf
2023-03-31T18:41:27.654 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/1
2023-03-31T18:41:28.109 INFO:tasks.workunit.client.0.smithi047.stderr:+ chmod 777 a
2023-03-31T18:41:28.125 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/2
2023-03-31T18:41:28.148 INFO:tasks.workunit.client.0.smithi047.stderr:+ echo qwer
2023-03-31T18:41:28.459 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/3
2023-03-31T18:41:28.478 INFO:tasks.workunit.client.0.smithi047.stderr:+ chmod 666 a
2023-03-31T18:41:28.481 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/4
2023-03-31T18:41:28.510 INFO:tasks.workunit.client.0.smithi047.stderr:+ echo zxcv
2023-03-31T18:41:28.728 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/5
2023-03-31T18:41:28.953 INFO:tasks.workunit.client.0.smithi047.stderr:+ ls -al .snap/1/a .snap/2/a .snap/3/a .snap/4/a .snap/5/a
2023-03-31T18:41:29.005 INFO:tasks.workunit.client.0.smithi047.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/1/a
2023-03-31T18:41:29.005 INFO:tasks.workunit.client.0.smithi047.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/2/a
2023-03-31T18:41:29.005 INFO:tasks.workunit.client.0.smithi047.stdout:-rwxrwxrwx 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/3/a
2023-03-31T18:41:29.006 INFO:tasks.workunit.client.0.smithi047.stdout:-rwxrwxrwx 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/4/a
2023-03-31T18:41:29.006 INFO:tasks.workunit.client.0.smithi047.stdout:-rw-rw-rw- 1 ubuntu ubuntu 5 Mar 31 18:41 .snap/5/a
2023-03-31T18:41:29.007 INFO:tasks.workunit.client.0.smithi047.stderr:+ grep asdf .snap/1/a
2023-03-31T18:41:29.011 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-03-31T18:41:29.012 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...

This is the same as a very old tracker, https://tracker.ceph.com/issues/14557.

Actions #4

Updated by Venky Shankar about 1 year ago

Xiubo - I think this is https://tracker.ceph.com/issues/54460 which is assigned to Milind.

Actions #5

Updated by Xiubo Li about 1 year ago

Venky Shankar wrote:

Xiubo - I think this is https://tracker.ceph.com/issues/54460 which is assigned to Milind.

Yeah, correct.

Actions #6

Updated by Xiubo Li about 1 year ago

  • Status changed from New to Duplicate
  • Pull request ID set to 54460
Actions #7

Updated by Xiubo Li about 1 year ago

  • Parent task set to #54460
  • Pull request ID deleted (54460)
Actions #8

Updated by Xiubo Li 12 months ago

  • Status changed from Duplicate to Fix Under Review
  • Assignee set to Xiubo Li
  • Pull request ID set to 51383
Actions #9

Updated by Xiubo Li 12 months ago

This is a failure with libcephfs, and we have the client-side logs:

vshankar-2023-04-06_04:14:11-fs-wip-vshankar-testing-20230330.105356-testing-default-smithi/7233705/teuthology.log:

2023-04-06T05:32:54.442 DEBUG:teuthology.orchestra.run.smithi043:workunit test fs/snaps/snaptest-multiple-capsnaps.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=bd7128551a85e374cf0114ccfe8dc6ad4b044903 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-multiple-capsnaps.sh
2023-04-06T05:32:54.526 INFO:tasks.workunit.client.0.smithi043.stderr:+ set -e
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo asdf
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/1
2023-04-06T05:32:54.580 INFO:tasks.workunit.client.0.smithi043.stderr:+ chmod 777 a
2023-04-06T05:32:54.592 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/2
2023-04-06T05:32:54.615 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo qwer
2023-04-06T05:32:54.664 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/3
2023-04-06T05:32:54.685 INFO:tasks.workunit.client.0.smithi043.stderr:+ chmod 666 a
2023-04-06T05:32:54.688 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/4
2023-04-06T05:32:54.966 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo zxcv
2023-04-06T05:32:54.970 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/5
2023-04-06T05:32:55.179 INFO:tasks.workunit.client.0.smithi043.stderr:+ ls -al .snap/1/a .snap/2/a .snap/3/a .snap/4/a .snap/5/a
2023-04-06T05:32:55.211 INFO:tasks.workunit.client.0.smithi043.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  6 05:32 .snap/1/a
2023-04-06T05:32:55.211 INFO:tasks.workunit.client.0.smithi043.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  6 05:32 .snap/2/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stdout:-rwxrwxrwx 1 ubuntu ubuntu 5 Apr  6 05:32 .snap/3/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stdout:-rwxrwxrwx 1 ubuntu ubuntu 5 Apr  6 05:32 .snap/4/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stdout:-rw-rw-rw- 1 ubuntu ubuntu 5 Apr  6 05:32 .snap/5/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stderr:+ grep asdf .snap/1/a
2023-04-06T05:32:55.218 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-04-06T05:32:55.218 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2023-04-06T05:32:55.218 DEBUG:teuthology.orchestra.run.smithi043:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-04-06T05:32:56.435 ERROR:teuthology.run_tasks:Saw exception from tasks.

From the client logs we can see that it successfully wrote contents asdf to the file a:

2023-04-06T05:32:54.523+0000 7fe1c9ff3700 10 client.4902 cap_delay_requeue on 0x10000009f75.head(faked_ino=0 nref=12 ll_ref=1 cap_refs={4=0,1024=1,4096=1,8192=1} open={2=1} mode=100664 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.526619+0000 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.523+0000 7fe1c9ff3700  3 client.4902 ll_write 0x7fe1580d3cd0 0~5 = 5
...

And then it made snapshot 1, but for the file a it just deferred flushing the snapcap to the MDS because the Fb caps were still held by the buffer:

2023-04-06T05:32:54.527+0000 7fe1ca7f4700 10 client.4902 send_request client_request(unknown.0:267666 mksnap #0x10000009f74//1 2023-04-06T05:32:54.532244+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 to mds.0
2023-04-06T05:32:54.527+0000 7fe1ca7f4700  1 -- 192.168.0.1:0/2332599002 --> [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] -- client_request(unknown.0:267666 mksnap #0x10000009f74//1 2023-04-06T05:32:54.532244+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 -- 0x7fe1acf780e0 con 0x55ca2fbedcf0
2023-04-06T05:32:54.527+0000 7fe1ca7f4700 20 client.4902 awaiting reply|forward|kick on 0x7fe1ca7f1930
...
2023-04-06T05:32:54.571+0000 7fe1cd7fa700  1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517611 ==== client_snap(split split=0x10000009f74 tracelen=148) v1 ==== 188+0+0 (crc 0 0 0) 0x7fe1d01c2470 con 0x55ca2fbedcf0
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 10 client.4902 handle_snap client_snap(split split=0x10000009f74 tracelen=148) v1
...
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 10 client.4902 finish_cap_snap 0x10000009f75.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.526619+0000 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) capsnap 0x7fe1c45afb78 used Fcb
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 10 client.4902 finish_cap_snap 0x10000009f75.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.526619+0000 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) cap_snap 0x7fe1c45afb78 used 9216 WRBUFFER, delaying
...
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 20 client.4902 put_snap_realm 0x10000009f74 0x7fe1b94f4ef0 3 -> 2
2023-04-06T05:32:54.571+0000 7fe1cd7fa700  1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517612 ==== client_reply(???:267666 = 0 (0) Success safe) v1 ==== 357+0+0 (crc 0 0 0) 0x7fe1d0014740 con 0x55ca2fbedcf0

The echo qwer will do a truncating open, so it will truncate the file size to 0; please see the file size change during the open:

2023-04-06T05:32:54.603+0000 7fe1917fa700 10 client.4902 send_request client_request(unknown.0:267671 open #0x10000009f75 2023-04-06T05:32:54.605381+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 to mds.0
2023-04-06T05:32:54.603+0000 7fe1917fa700  1 -- 192.168.0.1:0/2332599002 --> [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] -- client_request(unknown.0:267671 open #0x10000009f75 2023-04-06T05:32:54.605381+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 -- 0x7fe15001e610 con 0x55ca2fbedcf0
2023-04-06T05:32:54.603+0000 7fe1917fa700 20 client.4902 awaiting reply|forward|kick on 0x7fe1917f7920
2023-04-06T05:32:54.603+0000 7fe1cd7fa700  1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517619 ==== client_caps(revoke ino 0x10000009f75 1 seq 3 caps=pAsxLsXsxFcb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2023-04-06T05:32:54.523706+0000 ctime 2023-04-06T05:32:54.523706+0000 change_attr 0) v12 ==== 260+0+0 (crc 0 0 0) 0x7fe1d0334c40 con 0x55ca2fbedcf0
2023-04-06T05:32:54.603+0000 7fe1cd7fa700 10 client.4902  mds.0 seq now 169108
2023-04-06T05:32:54.603+0000 7fe1cd7fa700  5 client.4902 handle_cap_grant on in 0x10000009f75 mds.0 seq 3 caps now pAsxLsXsxFcb was pAsxLsXsxFsxcrwb
2023-04-06T05:32:54.603+0000 7fe1cd7fa700 10 client.4902 update_inode_file_time 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.589193+0000 change_attr=2 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=AxFw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) pAsxLsXsxFsxcrwb ctime 2023-04-06T05:32:54.523706+0000 mtime 2023-04-06T05:32:54.523706+0000
...
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 20 client.4902 put_inode on 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.589193+0000 change_attr=2 caps=pAsxLsXsxFcb(0=pAsxLsXsxFcb) objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) n = 1
2023-04-06T05:32:54.607+0000 7fe1cd7fa700  1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517621 ==== client_reply(???:267671 = 0 (0) Success) v1 ==== 513+0+0 (crc 0 0 0) 0x7fe1d01c2470 con 0x55ca2fbedcf0
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 20 client.4902 handle_client_reply got a reply. Safe:0 tid 267671
...
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 update_inode_file_time 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.589193+0000 change_attr=2 caps=pAsxLsXsxFcb(0=pAsxLsXsxFcb) objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) pAsxLsXsxFcb ctime 2023-04-06T05:32:54.605381+0000 mtime 2023-04-06T05:32:54.605381+0000
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 size 5 -> 0
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 truncate_seq 1 -> 2
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 _invalidate_inode_cache 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=2 caps=pAsxLsXsxFcb(0=pAsxLsXsxFcb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) 0~5
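
For reference, the redirection in echo qwer > a is simply an O_TRUNC open followed by a short write. A minimal standalone illustration of what the shell does here (plain POSIX calls, not Ceph code):

/* Minimal standalone sketch, not Ceph code: "echo qwer > a" opens the file
 * with O_TRUNC before writing, which is what triggers the size 5 -> 0
 * truncation seen in the client log above. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("a", O_WRONLY | O_CREAT | O_TRUNC, 0664); /* size becomes 0 here */
    if (fd < 0)
        return 1;
    (void)write(fd, "qwer\n", 5);                            /* then the new 5 bytes land */
    close(fd);
    return 0;
}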

And then finally the buffer was flushed and the dirty snapcap was flushed to the MDS, but it was too late and the file size had already been set to size=0/4194304:

2023-04-06T05:32:54.611+0000 7fe192ffd700 10 client.4902  nothing to flush
2023-04-06T05:32:54.611+0000 7fe192ffd700 20 client.4902 put_inode on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) n = 1
2023-04-06T05:32:54.611+0000 7fe192ffd700 10 client.4902 waiting for WRBUFFER to get dropped
2023-04-06T05:32:54.655+0000 7fe1e551a700  1 -- 192.168.0.1:0/2332599002 <== osd.3 v2:172.21.15.43:6824/33692 6903 ==== osd_op_reply(59773 10000009f75.00000000 [write 0~5] v71'3897 uv3897 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7fe1d03be040 con 0x7fe1a4005230
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 20 client.4902 put_inode on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) n = 1
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 _flushed 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 put_cap_ref finishing pending cap_snap on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 finish_cap_snap 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) capsnap 0x7fe1c45afb78 used -
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 flush_snaps on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 flush_snaps mds.0 follows 1 size 0 mtime 2023-04-06T05:32:54.605381+0000 dirty_data=0 writing=0 on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700  1 -- 192.168.0.1:0/2332599002 --> [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] -- client_caps(flushsnap ino 0x10000009f75 0 seq 0 tid 59047 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 1 size 0/0 mtime 2023-04-06T05:32:54.605381+0000 ctime 2023-04-06T05:32:54.605381+0000 change_attr 3 xattrs(v=1 l=4)) v12 -- 0x7fe1a80060d0 con 0x55ca2fbedcf0
2023-04-06T05:32:54.655+0000 7fe1ce7fc700  5 client.4902 put_cap_ref dropped last FILE_BUFFER ref on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700  5 client.4902 put_cap_ref dropped last FILE_CACHE ref on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)

In the mds side logs, we can see it just flushed the .snap/1 capsnap, which is flushsnap follows 1 -> snap 2b2:

2023-04-06T05:32:54.656+0000 7ff009dfa700  1 -- [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] <== client.4902 192.168.0.1:0/2332599002 348456 ==== client_caps(flushsnap ino 0x10000009f75 0 seq 0 tid 59047 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 1 size 0/0 mtime 2023-04-06T05:32:54.605381+0000 ctime 2023-04-06T05:32:54.605381+0000 change_attr 3 xattrs(v=1 l=4)) v12 ==== 260+4+0 (crc 0 0 0) 0x55f111fa7500 con 0x55f1112db800
2023-04-06T05:32:54.656+0000 7ff009dfa700  7 mds.0.locker handle_client_caps  on 0x10000009f75 tid 59047 follows 1 op flushsnap flags 0x0
2023-04-06T05:32:54.656+0000 7ff009dfa700 20 mds.0.4 get_session have 0x55f1112a4d00 client.4902 192.168.0.1:0/2332599002 state open
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker  head inode [inode 0x10000009f75 [2b4,head] /client.0/tmp/a auth v6 ap=2 DIRTYPARENT truncating(5 to 0) s=0 n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) need_snapflush={2b2=4902,2b3=4902} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4902=0-4194304@2b3} caps={4902=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@6},l=4902 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f11d6c4c00]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10  mds.0.cache.snaprealm(0x10000009f74 seq 691 0x55f1148b66c0) get_snaps 2b2,2b3 (seq 2b3 cached_seq 2b3)
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker   flushsnap follows 1 -> snap 2b2
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache pick_inode_snap follows 2b1 on [inode 0x10000009f75 [2b4,head] /client.0/tmp/a auth v6 ap=2 DIRTYPARENT truncating(5 to 0) s=0 n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) need_snapflush={2b2=4902,2b3=4902} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4902=0-4194304@2b3} caps={4902=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@6},l=4902 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f11d6c4c00]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache pick_inode_snap found [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker  snapped inode [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700  7 mds.0.locker  flushsnap snap 2b2 client.4902 on [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker _do_snap_update dirty Fw follows 1 snap 2b2 on [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 15 mds.0.cache.ino(0x10000009f75) project_inode 0x10000009f75
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.dir(0x10000009f74) pre_dirty 8
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.den(0x10000009f74 a) pre_dirty [dentry #0x1/client.0/tmp/a [2b2,2b3] auth (dversion lock) pv=8 v=5 ino=0x10000009f75 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f1154e9400]
...

The map between the snapshot name and snapid:

2023-04-06T05:32:54.676+0000 7ff003dee700 10 mds.0.cache.ino(0x10000009f74)  realm snaprealm(0x10000009f74 seq 2b4 lc 2b4 cr 2b4 cps 2b2 snaps={2b2=snap(2b2 0x10000009f74 '1' 2023-04-06T05:32:54.532244+0000),2b3=snap(2b3 0x10000009f74 '2' 2023-04-06T05:32:54.594649+0000),2b4=snap(2b4 0x10000009f74 '3' 2023-04-06T05:32:54.670214+0000)} last_modified 2023-04-06T05:32:54.670214+0000 change_attr 3 0x55f1148b66c0) parent snaprealm(0x1 seq 1 lc 0 cr 0 cps 1 snaps={} last_modified 2023-04-06T05:07:56.696766+0000 change_attr 0 0x55f1110cad80)

And when cowing the dentry and inodes the file a size is 0:

2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/client.0/tmp [2b2,head] auth (dversion lock) pv=4292 v=4290 ino=0x10000009f74 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55f1110f0280]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows 2b3 < first on [inode 0x10000009f74 [...2b4,head] /client.0/tmp/ auth v4290 pv4292 ap=1 snaprealm=0x55f1148b66c0 DIRTYPARENT f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0)->f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0) n(v0 rc2023-04-06T05:32:54.605381+0000 rs2 2=1+1)->n(v0 rc2023-04-06T05:32:54.605381+0000 rs2 2=1+1) old_inodes=2 (inest lock w=1) (ifile excl) (iversion lock w=1) caps={4902=pAsLsXsFsx/AsLsXsFsx@7},l=4902 | request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f11c938680]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows 1 on [dentry #0x1/client.0/tmp/a [2b2,2b3] auth (dversion lock) pv=8 v=5 ino=0x10000009f75 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f1154e9400]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows 1 < first on [dentry #0x1/client.0/tmp/a [2b2,2b3] auth (dversion lock) pv=8 v=5 ino=0x10000009f75 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f1154e9400]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.ino(0x10000009f75) remove_need_snapflush client.4902 snapid 2b2 on 0x55f12046b180
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.ino(0x10000009f75) auth_unpin by 0x55f11d6c4c00 on [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 pv8 ap=6 s=0 n(v0 1=1+0)->n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | ptrwaiter=1 request=0 lock=4 dirty=1 authpin=1 0x55f12046b180] now 6
2023-04-06T05:32:54.656+0000 7ff009dfa700 15 mds.0.cache.dir(0x10000009f74) adjust_nested_auth_pins -1 on [dir 0x10000009f74 /client.0/tmp/ [2,head] auth pv=9 v=7 cv=0/0 ap=1+8 state=1610874881|complete f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0)->f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0) n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0)->n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) hs=1+0,ss=1+0 dirty=2 | child=1 dirty=1 authpin=1 0x55f1150cda80] by 0x55f11d6c4c00 count now 1/8

And then on the client side, in finish_cap_snap(), line 4087 sets the capsnap size from the current inode size, which is already 0:

 4084 void Client::finish_cap_snap(Inode *in, CapSnap &capsnap, int used)                                                                                                                                                              
 4085 {
 4086   ldout(cct, 10) << __func__ << " " << *in << " capsnap " << (void *)&capsnap << " used " << ccap_string(used) << dendl;
 4087   capsnap.size = in->size;
 4088   capsnap.mtime = in->mtime;
 4089   capsnap.atime = in->atime;
 4090   capsnap.ctime = in->ctime;
 4091   capsnap.time_warp_seq = in->time_warp_seq;
 4092   capsnap.change_attr = in->change_attr;
 4093   capsnap.dirty |= in->caps_dirty();
 4094 
 4095   /* Only reset it if it wasn't set before */
 4096   if (capsnap.cap_dirtier_uid == -1) {
 4097     capsnap.cap_dirtier_uid = in->cap_dirtier_uid;
 4098     capsnap.cap_dirtier_gid = in->cap_dirtier_gid;
 4099   }
 4100 
 4101   if (capsnap.dirty & CEPH_CAP_FILE_WR) {
 4102     capsnap.inline_data = in->inline_data;
 4103     capsnap.inline_version = in->inline_version;
 4104   }
 4105 
 4106   if (used & CEPH_CAP_FILE_BUFFER) {
 4107     capsnap.writing = 1;
 4108     ldout(cct, 10) << __func__ << " " << *in << " cap_snap " << &capsnap << " used " << used
 4109              << " WRBUFFER, delaying" << dendl;
 4110   } else {
 4111     capsnap.dirty_data = 0;
 4112     flush_snaps(in);
 4113   }
 4114 }

And also line 4107 sets writing to 1, so the Fb and Fw caps conflict here. That means if only the Fb caps are in use the first time, a new capsnap will be created, while the second time, no matter whether Fb or Fw is in use, queue_cap_snap() will skip creating a new capsnap:

 4042 void Client::queue_cap_snap(Inode *in, SnapContext& old_snapc)
 4043 {
 4044   int used = get_caps_used(in);
 4045   int dirty = in->caps_dirty();
 4046   ldout(cct, 10) << __func__ << " " << *in << " snapc " << old_snapc << " used " << ccap_string(used) << dendl;
 4047  
 4048   if (in->cap_snaps.size() &&
 4049       in->cap_snaps.rbegin()->second.writing) {
 4050     ldout(cct, 10) << __func__ << " already have pending cap_snap on " << *in << dendl;
 4051     return;
 4052   } else if (in->caps_dirty() ||
 4053             (used & CEPH_CAP_FILE_WR) ||        
 4054              (dirty & CEPH_CAP_ANY_WR)) {        
...

Actually we should add a new capsnap the second time, because once the data is written to the buffer the corresponding capsnap should be frozen, and for any new change we should create a new one.
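
Purely as an illustration of that idea (a hypothetical sketch, not the actual change in the fix PR), queue_cap_snap() would only take the "already have pending cap_snap" early return when nothing new has been dirtied since the pending capsnap was queued:

// Hypothetical sketch only, not the real patch: reuse the pending
// "writing" capsnap only if nothing has been dirtied since it was
// queued; otherwise treat it as frozen and queue a fresh capsnap
// for the new change.
void Client::queue_cap_snap(Inode *in, SnapContext& old_snapc)
{
  int used = get_caps_used(in);
  int dirty = in->caps_dirty();

  if (in->cap_snaps.size() &&
      in->cap_snaps.rbegin()->second.writing &&
      !dirty && !(used & CEPH_CAP_FILE_WR)) {
    ldout(cct, 10) << __func__ << " already have pending cap_snap on " << *in << dendl;
    return;
  }

  // ... fall through and queue a new capsnap as in the existing code ...
}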

That means the capsnap for .snap/2 will overwrite the capsnap for .snap/1:

2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo asdf
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/1
2023-04-06T05:32:54.580 INFO:tasks.workunit.client.0.smithi043.stderr:+ chmod 777 a
2023-04-06T05:32:54.592 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/2
2023-04-06T05:32:54.615 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo qwer
Actions #10

Updated by Xiubo Li 12 months ago

This also needs to be fixed in kclient.

Actions #11

Updated by Venky Shankar 12 months ago

Xiubo Li wrote:

This is a failure with libcephfs, and we have the client-side logs:

vshankar-2023-04-06_04:14:11-fs-wip-vshankar-testing-20230330.105356-testing-default-smithi/7233705/teuthology.log:

[...]

From the client logs we can see that it successfully wrote contents asdf to the file a:

[...]

And then it made snapshot 1, but for the file a it just deferred flushing the snapcap to the MDS because the Fb caps were still held by the buffer:

Things look fine till here.

The echo qwer will do a truncating open, so it will truncate the file size to 0; please see the file size change during the open:

[...]

And then finally the buffer was flushed and the dirty snapcap was flushed to the MDS, but it was too late and the file size had already been set to size=0/4194304:

This, I presume, is where the bug is. The capsnap should have `capsnap.size = 5`, but it got updated to `capsnap.size = 0` (or could it have been set to 5 since `qwer` is written?).

[...]

In the mds side logs, we can see it just flushed the .snap/1 capsnap, which is flushsnap follows 1 -> snap 2b2:

[...]

The map between the snapshot name and snapid:

[...]

And when cowing the dentry and inodes the file a size is 0:

[...]

And then on the client side, in finish_cap_snap(), line 4087 sets the capsnap size from the current inode size, which is already 0:

[...]

And also line 4107 sets writing to 1, so the Fb and Fw caps conflict here. That means if only the Fb caps are in use the first time, a new capsnap will be created, while the second time, no matter whether Fb or Fw is in use, queue_cap_snap() will skip creating a new capsnap:

[...]

Actually we should add a new capsnap the second time, because once the data is written to the buffer the corresponding capsnap should be frozen, and for any new change we should create a new one.

Kind of, yes. Once a capsnap is created, its attributes should not be updated. It's a bit confusing that queue_cap_snap updates a bunch of capsnap attrs (uid, gid, mode, btime, ...) and then finish_cap_snap fills in the rest (size, mtime, ctime, etc.), but finish_cap_snap is called from other code paths that may have updated inode attrs.

That means the capsnap for .snap/2 will overwrite the capsnap for .snap/1:

[...]

Actions #12

Updated by Xiubo Li 12 months ago

Venky Shankar wrote:

Xiubo Li wrote:

This is a failure with libcephfs, and we have the client-side logs:

vshankar-2023-04-06_04:14:11-fs-wip-vshankar-testing-20230330.105356-testing-default-smithi/7233705/teuthology.log:

[...]

From the client logs we can see that it successfully wrote contents asdf to the file a:

[...]

And then it made snapshot 1, but for the file a it just deferred flushing the snapcap to the MDS because the Fb caps were still held by the buffer:

Things look fine till here.

The echo qwer will do a truncating open, so it will truncate the file size to 0; please see the file size change during the open:

[...]

And then finally the buffer was flushed and the dirty snapcap was flushed to the MDS, but it was too late and the file size had already been set to size=0/4194304:

This, I presume, is where the bug is. The capsnap should have `capsnap.size = 5`, but it got updated to `capsnap.size = 0` (or could it have been set to 5 since `qwer` is written?).

The first time, the capsnap had stored capsnap.size = 5. But then, just after the file a was opened by echo qwer, the MDS truncated the size to 0, and after the buffer was flushed the Fb caps were released too, which called finish_cap_snap() and updated the capsnap again with a size of 0.

[...]

In the mds side logs, we can see it just flushed the .snap/1 capsnap, which is flushsnap follows 1 -> snap 2b2:

[...]

The map between the snapshot name and snapid:

[...]

And when cowing the dentry and inodes the file a size is 0:

[...]

And then on the client side, in finish_cap_snap(), line 4087 sets the capsnap size from the current inode size, which is already 0:

[...]

And also line 4107 sets writing to 1, so the Fb and Fw caps conflict here. That means if only the Fb caps are in use the first time, a new capsnap will be created, while the second time, no matter whether Fb or Fw is in use, queue_cap_snap() will skip creating a new capsnap:

[...]

Actually we should add a new capsnap the second time, because once the data is written to the buffer the corresponding capsnap should be frozen, and for any new change we should create a new one.

Kind of, yes. Once a capsnap is created, its attributes should not be updated. It's a bit confusing that queue_cap_snap updates a bunch of capsnap attrs (uid, gid, mode, btime, ...) and then finish_cap_snap fills in the rest (size, mtime, ctime, etc.), but finish_cap_snap is called from other code paths that may have updated inode attrs.

The finish_cap_snap() here is doing the correct thing IMO.

It's just that the Fb case incorrectly reused the writing logic. When writing is set, it means the client kept writing to the file while the snapshot was being made. The corresponding capsnap should wait for that write to finish; once it does, the Fw caps are released and the capsnap is frozen immediately. And when freezing the capsnap we should update the corresponding attrs, such as the size, mtime, etc.
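
The log above ("put_cap_ref finishing pending cap_snap ...") shows that path. Condensed, and leaving out the unrelated branches, it looks roughly like this (a simplified sketch, not the verbatim Client.cc source):

// Simplified sketch (not the verbatim source): dropping the last write-side
// cap reference is where the pending capsnap stops "writing", gets its final
// attrs captured via finish_cap_snap(), and is then flushed to the MDS.
void Client::put_cap_ref(Inode *in, int cap)
{
  int last = in->put_cap_ref(cap);
  if ((last & CEPH_CAP_FILE_WR) &&
      !in->cap_snaps.empty() &&
      in->cap_snaps.rbegin()->second.writing) {
    CapSnap &capsnap = in->cap_snaps.rbegin()->second;
    capsnap.writing = 0;
    finish_cap_snap(in, capsnap, get_caps_used(in)); // captures size/mtime/ctime, then flush_snaps()
  }
  // ... other cap reference bookkeeping elided ...
}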

Thanks
- Xiubo

Actions #13

Updated by Venky Shankar 10 months ago

  • Category set to Correctness/Safety
  • Status changed from Fix Under Review to Pending Backport
  • Target version set to v19.0.0
  • Backport set to reef,quincy,pacific
  • Component(FS) Client added
Actions #14

Updated by Backport Bot 10 months ago

  • Copied to Backport #62043: pacific: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed added
Actions #15

Updated by Backport Bot 10 months ago

  • Copied to Backport #62044: reef: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed added
Actions #16

Updated by Backport Bot 10 months ago

  • Copied to Backport #62045: quincy: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed added
Actions #17

Updated by Backport Bot 10 months ago

  • Tags set to backport_processed
Actions #18

Updated by Xiubo Li 9 months ago

  • Status changed from Pending Backport to Resolved
Actions #20

Updated by Venky Shankar 6 months ago

Xiubo - this is seen in the reef branch - https://pulpito.ceph.com/vshankar-2023-10-19_01:21:59-fs-wip-vshankar-testing-reef-20231018.163145-testing-default-smithi/7431825/

It's a stock kernel - I assume the kernel fix isn't available in it yet?

Actions #21

Updated by Xiubo Li 6 months ago

Venky Shankar wrote:

Xiubo - this is seen in the reef branch - https://pulpito.ceph.com/vshankar-2023-10-19_01:21:59-fs-wip-vshankar-testing-reef-20231018.163145-testing-default-smithi/7431825/

It's a stock kernel - I assume the kernel fix isn't available in it yet?

Yeah, the MR is not merged yet.

Actions #22

Updated by Xiubo Li 5 months ago

Xiubo Li wrote:

Venky Shankar wrote:

Xiubo - this is seen in the reef branch - https://pulpito.ceph.com/vshankar-2023-10-19_01:21:59-fs-wip-vshankar-testing-reef-20231018.163145-testing-default-smithi/7431825/

It's a stock kernel - I assume the kernel fix isn't available in it yet?

Yeah, the MR is not merged yet.

The Jira https://issues.redhat.com/browse/RHEL-16412 is still not merged.

Actions #23

Updated by Venky Shankar 3 months ago

  • Related to Bug #64209: snaptest-multiple-capsnaps.sh fails with "got remote process result: 1" added
Actions #24

Updated by Venky Shankar about 2 months ago

/a/yuriw-2024-03-16_15:03:17-fs-wip-yuri10-testing-2024-03-15-1653-reef-distro-default-smithi/7606359

Xiubo, still awaiting merge as per https://issues.redhat.com/browse/RHEL-16412, yes?

Actions #25

Updated by Xiubo Li about 2 months ago

Venky Shankar wrote:

/a/yuriw-2024-03-16_15:03:17-fs-wip-yuri10-testing-2024-03-15-1653-reef-distro-default-smithi/7606359

Xiubo, still awaiting merge as per https://issues.redhat.com/browse/RHEL-16412, yes?

The above test was using the stock kernel:

Mar 18 19:57:14 smithi045 kernel: Linux version 4.18.0-513.9.1.el8_9.x86_64 (mockbuild@x86-vm-09.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Nov 16 10:29:04 EST 2023

The JIRA https://issues.redhat.com/browse/RHEL-16412 has been merged, but the fixed version is kernel-4.18.0-513.22.1.el8_9.

So it is just because the above test was still using the old kernel.
