Bug #59343
Status: Closed
Parent task: Bug #54460: snaptest-multiple-capsnaps.sh test failure
qa: fs/snaps/snaptest-multiple-capsnaps.sh failed
% Done: 100%
Description
2023-04-05T20:43:29.556 INFO:tasks.workunit.client.0.smithi073.stderr:+ echo asdf
2023-04-05T20:43:29.557 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/1
2023-04-05T20:43:29.569 INFO:tasks.workunit.client.0.smithi073.stderr:+ chmod 777 a
2023-04-05T20:43:29.578 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/2
2023-04-05T20:43:29.592 INFO:tasks.workunit.client.0.smithi073.stderr:+ echo qwer
2023-04-05T20:43:29.604 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/3
2023-04-05T20:43:29.612 INFO:tasks.workunit.client.0.smithi073.stderr:+ chmod 666 a
2023-04-05T20:43:29.614 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/4
2023-04-05T20:43:29.635 INFO:tasks.workunit.client.0.smithi073.stderr:+ echo zxcv
2023-04-05T20:43:29.938 INFO:tasks.workunit.client.0.smithi073.stderr:+ mkdir .snap/5
2023-04-05T20:43:30.118 INFO:tasks.workunit.client.0.smithi073.stderr:+ ls -al .snap/1/a .snap/2/a .snap/3/a .snap/4/a .snap/5/a
2023-04-05T20:43:30.154 INFO:tasks.workunit.client.0.smithi073.stdout:-rw-rw-r--. 1 ubuntu ubuntu 5 Apr 5 20:43 .snap/1/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rwxrwxrwx. 1 ubuntu ubuntu 5 Apr 5 20:43 .snap/2/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rwxrwxrwx. 1 ubuntu ubuntu 5 Apr 5 20:43 .snap/3/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rwxrwxrwx. 1 ubuntu ubuntu 5 Apr 5 20:43 .snap/4/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stdout:-rw-rw-rw-. 1 ubuntu ubuntu 5 Apr 5 20:43 .snap/5/a
2023-04-05T20:43:30.155 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep asdf .snap/1/a
2023-04-05T20:43:30.158 INFO:tasks.workunit.client.0.smithi073.stdout:asdf
2023-04-05T20:43:30.160 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.173 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/1/a
2023-04-05T20:43:30.181 INFO:tasks.workunit.client.0.smithi073.stdout: Size: 5 Blocks: 1 IO Block: 4194304 regular file
2023-04-05T20:43:30.182 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep asdf .snap/2/a
2023-04-05T20:43:30.185 INFO:tasks.workunit.client.0.smithi073.stdout:asdf
2023-04-05T20:43:30.185 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/2/a
2023-04-05T20:43:30.185 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.188 INFO:tasks.workunit.client.0.smithi073.stdout: Size: 5 Blocks: 1 IO Block: 4194304 regular file
2023-04-05T20:43:30.188 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/2/a
2023-04-05T20:43:30.189 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep -- -rwxrwxrwx
2023-04-05T20:43:30.191 INFO:tasks.workunit.client.0.smithi073.stdout:Access: (0777/-rwxrwxrwx) Uid: ( 1000/ ubuntu) Gid: ( 1265/ ubuntu)
2023-04-05T20:43:30.191 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep qwer .snap/3/a
2023-04-05T20:43:30.194 INFO:tasks.workunit.client.0.smithi073.stdout:qwer
2023-04-05T20:43:30.195 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/3/a
2023-04-05T20:43:30.195 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.198 INFO:tasks.workunit.client.0.smithi073.stdout: Size: 5 Blocks: 1 IO Block: 4194304 regular file
2023-04-05T20:43:30.198 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/3/a
2023-04-05T20:43:30.199 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep -- -rwxrwxrwx
2023-04-05T20:43:30.201 INFO:tasks.workunit.client.0.smithi073.stdout:Access: (0777/-rwxrwxrwx) Uid: ( 1000/ ubuntu) Gid: ( 1265/ ubuntu)
2023-04-05T20:43:30.202 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep qwer .snap/4/a
2023-04-05T20:43:30.207 INFO:tasks.workunit.client.0.smithi073.stdout:qwer
2023-04-05T20:43:30.207 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/4/a
2023-04-05T20:43:30.208 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep 'Size: 5'
2023-04-05T20:43:30.211 INFO:tasks.workunit.client.0.smithi073.stdout: Size: 5 Blocks: 1 IO Block: 4194304 regular file
2023-04-05T20:43:30.214 INFO:tasks.workunit.client.0.smithi073.stderr:+ stat .snap/4/a
2023-04-05T20:43:30.214 INFO:tasks.workunit.client.0.smithi073.stderr:+ grep -- -rw-rw-rw-
2023-04-05T20:43:30.217 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-04-05T20:43:30.218 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
The mode for .snap/4/a was a mismatch: it should have been rw-rw-rw-, but it was rwxrwxrwx.
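For reference, here is a minimal, hypothetical sketch of the failing sequence via libcephfs (the real workunit drives a mounted filesystem with plain shell commands; the /testdir path and creating snapshots with mkdir under .snap through libcephfs are assumptions here):

#include <cephfs/libcephfs.h>
#include <fcntl.h>
#include <cassert>

// Rewrite the file between snapshots; each mkdir under .snap creates a
// snapshot, so .snap/1 .. .snap/5 should each capture a distinct
// (contents, mode) pair for the file "a".
static void write_file(struct ceph_mount_info *cmount, const char *path,
                       const char *data, int len)
{
  int fd = ceph_open(cmount, path, O_WRONLY | O_CREAT | O_TRUNC, 0664);
  assert(fd >= 0);
  assert(ceph_write(cmount, fd, data, len, 0) == len);
  ceph_close(cmount, fd);
}

int main()
{
  struct ceph_mount_info *cmount;
  assert(ceph_create(&cmount, "admin") == 0);
  assert(ceph_conf_read_file(cmount, NULL) == 0);
  assert(ceph_mount(cmount, "/") == 0);
  ceph_mkdir(cmount, "/testdir", 0755);

  write_file(cmount, "/testdir/a", "asdf\n", 5);  // echo asdf > a
  ceph_mkdir(cmount, "/testdir/.snap/1", 0755);   // mkdir .snap/1
  ceph_chmod(cmount, "/testdir/a", 0777);         // chmod 777 a
  ceph_mkdir(cmount, "/testdir/.snap/2", 0755);   // mkdir .snap/2
  write_file(cmount, "/testdir/a", "qwer\n", 5);  // echo qwer > a
  ceph_mkdir(cmount, "/testdir/.snap/3", 0755);   // mkdir .snap/3
  ceph_chmod(cmount, "/testdir/a", 0666);         // chmod 666 a
  ceph_mkdir(cmount, "/testdir/.snap/4", 0755);   // mkdir .snap/4
  write_file(cmount, "/testdir/a", "zxcv\n", 5);  // echo zxcv > a
  ceph_mkdir(cmount, "/testdir/.snap/5", 0755);   // mkdir .snap/5

  // With the bug, stat() of the earlier .snap/N/a entries can report the
  // wrong size/mode, because the pending capsnap is overwritten, as
  // analyzed in the comments below.
  ceph_shutdown(cmount);
  return 0;
}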
Updated by Xiubo Li about 1 year ago
But this one is just a little different:
2023-03-31T18:41:27.651 INFO:tasks.workunit.client.0.smithi047.stderr:+ set -e
2023-03-31T18:41:27.653 INFO:tasks.workunit.client.0.smithi047.stderr:+ echo asdf
2023-03-31T18:41:27.654 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/1
2023-03-31T18:41:28.109 INFO:tasks.workunit.client.0.smithi047.stderr:+ chmod 777 a
2023-03-31T18:41:28.125 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/2
2023-03-31T18:41:28.148 INFO:tasks.workunit.client.0.smithi047.stderr:+ echo qwer
2023-03-31T18:41:28.459 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/3
2023-03-31T18:41:28.478 INFO:tasks.workunit.client.0.smithi047.stderr:+ chmod 666 a
2023-03-31T18:41:28.481 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/4
2023-03-31T18:41:28.510 INFO:tasks.workunit.client.0.smithi047.stderr:+ echo zxcv
2023-03-31T18:41:28.728 INFO:tasks.workunit.client.0.smithi047.stderr:+ mkdir .snap/5
2023-03-31T18:41:28.953 INFO:tasks.workunit.client.0.smithi047.stderr:+ ls -al .snap/1/a .snap/2/a .snap/3/a .snap/4/a .snap/5/a
2023-03-31T18:41:29.005 INFO:tasks.workunit.client.0.smithi047.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/1/a
2023-03-31T18:41:29.005 INFO:tasks.workunit.client.0.smithi047.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/2/a
2023-03-31T18:41:29.005 INFO:tasks.workunit.client.0.smithi047.stdout:-rwxrwxrwx 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/3/a
2023-03-31T18:41:29.006 INFO:tasks.workunit.client.0.smithi047.stdout:-rwxrwxrwx 1 ubuntu ubuntu 0 Mar 31 18:41 .snap/4/a
2023-03-31T18:41:29.006 INFO:tasks.workunit.client.0.smithi047.stdout:-rw-rw-rw- 1 ubuntu ubuntu 5 Mar 31 18:41 .snap/5/a
2023-03-31T18:41:29.007 INFO:tasks.workunit.client.0.smithi047.stderr:+ grep asdf .snap/1/a
2023-03-31T18:41:29.011 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-03-31T18:41:29.012 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
This is the same as a very old tracker: https://tracker.ceph.com/issues/14557.
Updated by Venky Shankar about 1 year ago
Xiubo - I think this is https://tracker.ceph.com/issues/54460 which is assigned to Milind.
Updated by Xiubo Li about 1 year ago
Venky Shankar wrote:
Xiubo - I think this is https://tracker.ceph.com/issues/54460 which is assigned to Milind.
Yeah, correct.
Updated by Xiubo Li about 1 year ago
- Status changed from New to Duplicate
- Pull request ID set to 54460
Updated by Xiubo Li about 1 year ago
- Parent task set to #54460
- Pull request ID deleted (54460)
Updated by Xiubo Li 12 months ago
This is a failure with libcephfs, and we have the client-side logs:
vshankar-2023-04-06_04:14:11-fs-wip-vshankar-testing-20230330.105356-testing-default-smithi/7233705/teuthology.log:
2023-04-06T05:32:54.442 DEBUG:teuthology.orchestra.run.smithi043:workunit test fs/snaps/snaptest-multiple-capsnaps.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=bd7128551a85e374cf0114ccfe8dc6ad4b044903 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-multiple-capsnaps.sh
2023-04-06T05:32:54.526 INFO:tasks.workunit.client.0.smithi043.stderr:+ set -e
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo asdf
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/1
2023-04-06T05:32:54.580 INFO:tasks.workunit.client.0.smithi043.stderr:+ chmod 777 a
2023-04-06T05:32:54.592 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/2
2023-04-06T05:32:54.615 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo qwer
2023-04-06T05:32:54.664 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/3
2023-04-06T05:32:54.685 INFO:tasks.workunit.client.0.smithi043.stderr:+ chmod 666 a
2023-04-06T05:32:54.688 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/4
2023-04-06T05:32:54.966 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo zxcv
2023-04-06T05:32:54.970 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/5
2023-04-06T05:32:55.179 INFO:tasks.workunit.client.0.smithi043.stderr:+ ls -al .snap/1/a .snap/2/a .snap/3/a .snap/4/a .snap/5/a
2023-04-06T05:32:55.211 INFO:tasks.workunit.client.0.smithi043.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Apr 6 05:32 .snap/1/a
2023-04-06T05:32:55.211 INFO:tasks.workunit.client.0.smithi043.stdout:-rw-rw-r-- 1 ubuntu ubuntu 0 Apr 6 05:32 .snap/2/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stdout:-rwxrwxrwx 1 ubuntu ubuntu 5 Apr 6 05:32 .snap/3/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stdout:-rwxrwxrwx 1 ubuntu ubuntu 5 Apr 6 05:32 .snap/4/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stdout:-rw-rw-rw- 1 ubuntu ubuntu 5 Apr 6 05:32 .snap/5/a
2023-04-06T05:32:55.212 INFO:tasks.workunit.client.0.smithi043.stderr:+ grep asdf .snap/1/a
2023-04-06T05:32:55.218 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-04-06T05:32:55.218 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2023-04-06T05:32:55.218 DEBUG:teuthology.orchestra.run.smithi043:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-04-06T05:32:56.435 ERROR:teuthology.run_tasks:Saw exception from tasks.
From the client logs we can see that it successfully wrote the contents asdf to the file a:
2023-04-06T05:32:54.523+0000 7fe1c9ff3700 10 client.4902 cap_delay_requeue on 0x10000009f75.head(faked_ino=0 nref=12 ll_ref=1 cap_refs={4=0,1024=1,4096=1,8192=1} open={2=1} mode=100664 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.526619+0000 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.523+0000 7fe1c9ff3700 3 client.4902 ll_write 0x7fe1580d3cd0 0~5 = 5
...
And then it made snapshot 1, but for the file a it just deferred flushing the snapcap to the MDS because the Fb caps were still held for the dirty buffer:
2023-04-06T05:32:54.527+0000 7fe1ca7f4700 10 client.4902 send_request client_request(unknown.0:267666 mksnap #0x10000009f74//1 2023-04-06T05:32:54.532244+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 to mds.0
2023-04-06T05:32:54.527+0000 7fe1ca7f4700 1 -- 192.168.0.1:0/2332599002 --> [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] -- client_request(unknown.0:267666 mksnap #0x10000009f74//1 2023-04-06T05:32:54.532244+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 -- 0x7fe1acf780e0 con 0x55ca2fbedcf0
2023-04-06T05:32:54.527+0000 7fe1ca7f4700 20 client.4902 awaiting reply|forward|kick on 0x7fe1ca7f1930
...
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517611 ==== client_snap(split split=0x10000009f74 tracelen=148) v1 ==== 188+0+0 (crc 0 0 0) 0x7fe1d01c2470 con 0x55ca2fbedcf0
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 10 client.4902 handle_snap client_snap(split split=0x10000009f74 tracelen=148) v1
...
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 10 client.4902 finish_cap_snap 0x10000009f75.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.526619+0000 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) capsnap 0x7fe1c45afb78 used Fcb
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 10 client.4902 finish_cap_snap 0x10000009f75.head(faked_ino=0 nref=14 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.526619+0000 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) cap_snap 0x7fe1c45afb78 used 9216 WRBUFFER, delaying
...
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 20 client.4902 put_snap_realm 0x10000009f74 0x7fe1b94f4ef0 3 -> 2
2023-04-06T05:32:54.571+0000 7fe1cd7fa700 1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517612 ==== client_reply(???:267666 = 0 (0) Success safe) v1 ==== 357+0+0 (crc 0 0 0) 0x7fe1d0014740 con 0x55ca2fbedcf0
The echo qwer will do a truncating open, so it will truncate the file size to 0; please see the file size change during the open:
2023-04-06T05:32:54.603+0000 7fe1917fa700 10 client.4902 send_request client_request(unknown.0:267671 open #0x10000009f75 2023-04-06T05:32:54.605381+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 to mds.0
2023-04-06T05:32:54.603+0000 7fe1917fa700 1 -- 192.168.0.1:0/2332599002 --> [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] -- client_request(unknown.0:267671 open #0x10000009f75 2023-04-06T05:32:54.605381+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 -- 0x7fe15001e610 con 0x55ca2fbedcf0
2023-04-06T05:32:54.603+0000 7fe1917fa700 20 client.4902 awaiting reply|forward|kick on 0x7fe1917f7920
2023-04-06T05:32:54.603+0000 7fe1cd7fa700 1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517619 ==== client_caps(revoke ino 0x10000009f75 1 seq 3 caps=pAsxLsXsxFcb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2023-04-06T05:32:54.523706+0000 ctime 2023-04-06T05:32:54.523706+0000 change_attr 0) v12 ==== 260+0+0 (crc 0 0 0) 0x7fe1d0334c40 con 0x55ca2fbedcf0
2023-04-06T05:32:54.603+0000 7fe1cd7fa700 10 client.4902 mds.0 seq now 169108
2023-04-06T05:32:54.603+0000 7fe1cd7fa700 5 client.4902 handle_cap_grant on in 0x10000009f75 mds.0 seq 3 caps now pAsxLsXsxFcb was pAsxLsXsxFsxcrwb
2023-04-06T05:32:54.603+0000 7fe1cd7fa700 10 client.4902 update_inode_file_time 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.589193+0000 change_attr=2 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=AxFw objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) pAsxLsXsxFsxcrwb ctime 2023-04-06T05:32:54.523706+0000 mtime 2023-04-06T05:32:54.523706+0000
...
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 20 client.4902 put_inode on 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.589193+0000 change_attr=2 caps=pAsxLsXsxFcb(0=pAsxLsXsxFcb) objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) n = 1
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 1 -- 192.168.0.1:0/2332599002 <== mds.0 v2:172.21.15.98:6836/1818983428 517621 ==== client_reply(???:267671 = 0 (0) Success) v1 ==== 513+0+0 (crc 0 0 0) 0x7fe1d01c2470 con 0x55ca2fbedcf0
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 20 client.4902 handle_client_reply got a reply. Safe:0 tid 267671
...
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 update_inode_file_time 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=5/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.526619+0000 ctime=2023-04-06T05:32:54.589193+0000 change_attr=2 caps=pAsxLsXsxFcb(0=pAsxLsXsxFcb) objectset[0x10000009f75 ts 0/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) pAsxLsXsxFcb ctime 2023-04-06T05:32:54.605381+0000 mtime 2023-04-06T05:32:54.605381+0000
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 size 5 -> 0
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 truncate_seq 1 -> 2
2023-04-06T05:32:54.607+0000 7fe1cd7fa700 10 client.4902 _invalidate_inode_cache 0x10000009f75.head(faked_ino=0 nref=19 ll_ref=5 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=2 caps=pAsxLsXsxFcb(0=pAsxLsXsxFcb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 5] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) 0~5
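(As an aside, the truncating open comes from the shell redirection itself; echo qwer > a is roughly equivalent to the following hypothetical snippet, so the size 5 -> 0 change above is expected at this point:)

#include <fcntl.h>
#include <unistd.h>

int main()
{
  // The shell opens the redirection target with O_TRUNC before writing,
  // which is what triggers the MDS-side truncate from size 5 to 0.
  int fd = open("a", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (fd >= 0) {
    write(fd, "qwer\n", 5);
    close(fd);
  }
  return 0;
}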
And then finally the buffer was flushed and the dirty snapcap was flushed to the MDS, but it was too late: the file size had already been set to size=0/4194304:
2023-04-06T05:32:54.611+0000 7fe192ffd700 10 client.4902 nothing to flush
2023-04-06T05:32:54.611+0000 7fe192ffd700 20 client.4902 put_inode on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) n = 1
2023-04-06T05:32:54.611+0000 7fe192ffd700 10 client.4902 waiting for WRBUFFER to get dropped
2023-04-06T05:32:54.655+0000 7fe1e551a700 1 -- 192.168.0.1:0/2332599002 <== osd.3 v2:172.21.15.43:6824/33692 6903 ==== osd_op_reply(59773 10000009f75.00000000 [write 0~5] v71'3897 uv3897 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7fe1d03be040 con 0x7fe1a4005230
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 20 client.4902 put_inode on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) n = 1
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 _flushed 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 put_cap_ref finishing pending cap_snap on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 finish_cap_snap 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50) capsnap 0x7fe1c45afb78 used -
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 flush_snaps on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 10 client.4902 flush_snaps mds.0 follows 1 size 0 mtime 2023-04-06T05:32:54.605381+0000 dirty_data=0 writing=0 on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 1 -- 192.168.0.1:0/2332599002 --> [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] -- client_caps(flushsnap ino 0x10000009f75 0 seq 0 tid 59047 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 1 size 0/0 mtime 2023-04-06T05:32:54.605381+0000 ctime 2023-04-06T05:32:54.605381+0000 change_attr 3 xattrs(v=1 l=4)) v12 -- 0x7fe1a80060d0 con 0x55ca2fbedcf0
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 5 client.4902 put_cap_ref dropped last FILE_BUFFER ref on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
2023-04-06T05:32:54.655+0000 7fe1ce7fc700 5 client.4902 put_cap_ref dropped last FILE_CACHE ref on 0x10000009f75.head(faked_ino=0 nref=24 ll_ref=4 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=1} mode=100777 size=0/4194304 nlink=1 btime=2023-04-06T05:32:54.523706+0000 mtime=2023-04-06T05:32:54.605381+0000 ctime=2023-04-06T05:32:54.605381+0000 change_attr=3 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000009f75 ts 2/0 objects 1 dirty_or_tx 0] parents=0x10000009f74.head["a"] 0x7fe1c400bd50)
In the MDS-side logs, we can see it flushed just the .snap/1 capsnap, which is flushsnap follows 1 -> snap 2b2:
2023-04-06T05:32:54.656+0000 7ff009dfa700 1 -- [v2:172.21.15.98:6836/1818983428,v1:172.21.15.98:6837/1818983428] <== client.4902 192.168.0.1:0/2332599002 348456 ==== client_caps(flushsnap ino 0x10000009f75 0 seq 0 tid 59047 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 1 size 0/0 mtime 2023-04-06T05:32:54.605381+0000 ctime 2023-04-06T05:32:54.605381+0000 change_attr 3 xattrs(v=1 l=4)) v12 ==== 260+4+0 (crc 0 0 0) 0x55f111fa7500 con 0x55f1112db800
2023-04-06T05:32:54.656+0000 7ff009dfa700 7 mds.0.locker handle_client_caps on 0x10000009f75 tid 59047 follows 1 op flushsnap flags 0x0
2023-04-06T05:32:54.656+0000 7ff009dfa700 20 mds.0.4 get_session have 0x55f1112a4d00 client.4902 192.168.0.1:0/2332599002 state open
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker head inode [inode 0x10000009f75 [2b4,head] /client.0/tmp/a auth v6 ap=2 DIRTYPARENT truncating(5 to 0) s=0 n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) need_snapflush={2b2=4902,2b3=4902} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4902=0-4194304@2b3} caps={4902=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@6},l=4902 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f11d6c4c00]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.snaprealm(0x10000009f74 seq 691 0x55f1148b66c0) get_snaps 2b2,2b3 (seq 2b3 cached_seq 2b3)
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker flushsnap follows 1 -> snap 2b2
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache pick_inode_snap follows 2b1 on [inode 0x10000009f75 [2b4,head] /client.0/tmp/a auth v6 ap=2 DIRTYPARENT truncating(5 to 0) s=0 n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) need_snapflush={2b2=4902,2b3=4902} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4902=0-4194304@2b3} caps={4902=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@6},l=4902 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f11d6c4c00]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache pick_inode_snap found [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker snapped inode [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 7 mds.0.locker flushsnap snap 2b2 client.4902 on [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.locker _do_snap_update dirty Fw follows 1 snap 2b2 on [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 ap=6 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | request=0 lock=4 dirty=1 authpin=1 0x55f12046b180]
2023-04-06T05:32:54.656+0000 7ff009dfa700 15 mds.0.cache.ino(0x10000009f75) project_inode 0x10000009f75
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.dir(0x10000009f74) pre_dirty 8
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.den(0x10000009f74 a) pre_dirty [dentry #0x1/client.0/tmp/a [2b2,2b3] auth (dversion lock) pv=8 v=5 ino=0x10000009f75 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f1154e9400]
...
The mapping between the snapshot names and snapids:
2023-04-06T05:32:54.676+0000 7ff003dee700 10 mds.0.cache.ino(0x10000009f74) realm snaprealm(0x10000009f74 seq 2b4 lc 2b4 cr 2b4 cps 2b2 snaps={2b2=snap(2b2 0x10000009f74 '1' 2023-04-06T05:32:54.532244+0000),2b3=snap(2b3 0x10000009f74 '2' 2023-04-06T05:32:54.594649+0000),2b4=snap(2b4 0x10000009f74 '3' 2023-04-06T05:32:54.670214+0000)} last_modified 2023-04-06T05:32:54.670214+0000 change_attr 3 0x55f1148b66c0) parent snaprealm(0x1 seq 1 lc 0 cr 0 cps 1 snaps={} last_modified 2023-04-06T05:07:56.696766+0000 change_attr 0 0x55f1110cad80)
And when CoW-ing the dentry and inodes, the size of file a is 0:
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/client.0/tmp [2b2,head] auth (dversion lock) pv=4292 v=4290 ino=0x10000009f74 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55f1110f0280]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows 2b3 < first on [inode 0x10000009f74 [...2b4,head] /client.0/tmp/ auth v4290 pv4292 ap=1 snaprealm=0x55f1148b66c0 DIRTYPARENT f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0)->f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0) n(v0 rc2023-04-06T05:32:54.605381+0000 rs2 2=1+1)->n(v0 rc2023-04-06T05:32:54.605381+0000 rs2 2=1+1) old_inodes=2 (inest lock w=1) (ifile excl) (iversion lock w=1) caps={4902=pAsLsXsFsx/AsLsXsFsx@7},l=4902 | request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f11c938680]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows 1 on [dentry #0x1/client.0/tmp/a [2b2,2b3] auth (dversion lock) pv=8 v=5 ino=0x10000009f75 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f1154e9400]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache journal_cow_dentry follows 1 < first on [dentry #0x1/client.0/tmp/a [2b2,2b3] auth (dversion lock) pv=8 v=5 ino=0x10000009f75 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f1154e9400]
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.ino(0x10000009f75) remove_need_snapflush client.4902 snapid 2b2 on 0x55f12046b180
2023-04-06T05:32:54.656+0000 7ff009dfa700 10 mds.0.cache.ino(0x10000009f75) auth_unpin by 0x55f11d6c4c00 on [inode 0x10000009f75 [2b2,2b3] /client.0/tmp/a auth v2 pv8 ap=6 s=0 n(v0 1=1+0)->n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4902=0-4194304@2b1} | ptrwaiter=1 request=0 lock=4 dirty=1 authpin=1 0x55f12046b180] now 6
2023-04-06T05:32:54.656+0000 7ff009dfa700 15 mds.0.cache.dir(0x10000009f74) adjust_nested_auth_pins -1 on [dir 0x10000009f74 /client.0/tmp/ [2,head] auth pv=9 v=7 cv=0/0 ap=1+8 state=1610874881|complete f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0)->f(v0 m2023-04-06T05:32:54.523706+0000 1=1+0) n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0)->n(v0 rc2023-04-06T05:32:54.605381+0000 1=1+0) hs=1+0,ss=1+0 dirty=2 | child=1 dirty=1 authpin=1 0x55f1150cda80] by 0x55f11d6c4c00 count now 1/8
And then on the client side, in finish_cap_snap(), line 4087 sets the capsnap size to the current inode size, which is already 0:
4084 void Client::finish_cap_snap(Inode *in, CapSnap &capsnap, int used)
4085 {
4086   ldout(cct, 10) << __func__ << " " << *in << " capsnap " << (void *)&capsnap << " used " << ccap_string(used) << dendl;
4087   capsnap.size = in->size;
4088   capsnap.mtime = in->mtime;
4089   capsnap.atime = in->atime;
4090   capsnap.ctime = in->ctime;
4091   capsnap.time_warp_seq = in->time_warp_seq;
4092   capsnap.change_attr = in->change_attr;
4093   capsnap.dirty |= in->caps_dirty();
4094
4095   /* Only reset it if it wasn't set before */
4096   if (capsnap.cap_dirtier_uid == -1) {
4097     capsnap.cap_dirtier_uid = in->cap_dirtier_uid;
4098     capsnap.cap_dirtier_gid = in->cap_dirtier_gid;
4099   }
4100
4101   if (capsnap.dirty & CEPH_CAP_FILE_WR) {
4102     capsnap.inline_data = in->inline_data;
4103     capsnap.inline_version = in->inline_version;
4104   }
4105
4106   if (used & CEPH_CAP_FILE_BUFFER) {
4107     capsnap.writing = 1;
4108     ldout(cct, 10) << __func__ << " " << *in << " cap_snap " << &capsnap << " used " << used
4109                    << " WRBUFFER, delaying" << dendl;
4110   } else {
4111     capsnap.dirty_data = 0;
4112     flush_snaps(in);
4113   }
4114 }
And also line 4107 sets writing to 1, so the Fb and Fw caps conflict here. That means if only the Fb caps are in use the first time, it will create a new capsnap, while the second time, no matter whether it is Fb or Fw, it will skip creating a new capsnap in queue_cap_snap():
4042 void Client::queue_cap_snap(Inode *in, SnapContext& old_snapc)
4043 {
4044   int used = get_caps_used(in);
4045   int dirty = in->caps_dirty();
4046   ldout(cct, 10) << __func__ << " " << *in << " snapc " << old_snapc << " used " << ccap_string(used) << dendl;
4047
4048   if (in->cap_snaps.size() &&
4049       in->cap_snaps.rbegin()->second.writing) {
4050     ldout(cct, 10) << __func__ << " already have pending cap_snap on " << *in << dendl;
4051     return;
4052   } else if (in->caps_dirty() ||
4053              (used & CEPH_CAP_FILE_WR) ||
4054              (dirty & CEPH_CAP_ANY_WR)) {
...
Actually we should add a new capsnap the second time, because once the data is written to the buffer the corresponding capsnap should be frozen, and for any new change we should create a new one.
That means the changes for .snap/2 will overwrite the capsnap for .snap/1:
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo asdf
2023-04-06T05:32:54.529 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/1
2023-04-06T05:32:54.580 INFO:tasks.workunit.client.0.smithi043.stderr:+ chmod 777 a
2023-04-06T05:32:54.592 INFO:tasks.workunit.client.0.smithi043.stderr:+ mkdir .snap/2
2023-04-06T05:32:54.615 INFO:tasks.workunit.client.0.smithi043.stderr:+ echo qwer
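To make the proposal concrete, here is a tiny self-contained toy model of queue_cap_snap()'s decision (hypothetical code, not Ceph's actual implementation; have_new_dirty_state stands in for the caps_dirty()/Fw/Fb checks in the excerpt quoted above):

#include <cstdint>
#include <map>

struct CapSnap {
  uint64_t size = 0;
  bool writing = false;  // still pinned by in-flight writes/buffers
};

struct InodeModel {
  uint64_t size = 0;
  std::map<uint64_t, CapSnap> cap_snaps;  // keyed by snap seq

  void queue_cap_snap(uint64_t snap_seq, bool have_new_dirty_state)
  {
    if (!cap_snaps.empty() && cap_snaps.rbegin()->second.writing) {
      // Current behavior: return unconditionally here, so snapshot
      // `snap_seq` never gets its own capsnap and the pending one is
      // later finished from the live inode, clobbering the frozen state.
      // Proposed behavior: only skip when there is nothing new to capture.
      if (!have_new_dirty_state)
        return;
    }
    CapSnap cs;
    cs.size = size;                  // capture attrs at snapshot time
    cs.writing = have_new_dirty_state;
    cap_snaps[snap_seq] = cs;        // the previous capsnap stays frozen
  }
};

int main()
{
  InodeModel in;
  in.size = 5;
  in.queue_cap_snap(1, true);  // snap 1: data still buffered -> writing
  in.size = 0;                 // truncate-open done by "echo qwer > a"
  in.queue_cap_snap(2, true);  // snap 2: should get its OWN capsnap
  return 0;
}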
Updated by Venky Shankar 12 months ago
Xiubo Li wrote:
This is a failure with libcephfs, and we have the client-side logs:
vshankar-2023-04-06_04:14:11-fs-wip-vshankar-testing-20230330.105356-testing-default-smithi/7233705/teuthology.log:
[...]
From the client logs we can see that it successfully wrote the contents asdf to the file a:
[...]
And then it made snapshot 1, but for the file a it just deferred flushing the snapcap to the MDS because the Fb caps were still held for the dirty buffer:
Things look fine till here.
The echo qwer will do a truncating open, so it will truncate the file size to 0; please see the file size change during the open:
[...]
And then finally the buffer was flushed and the dirty snapcap was flushed to the MDS, but it was too late: the file size had already been set to size=0/4194304:
This is, I presume, where the bug is. The capsnap should have had `capsnap.size = 5`, which got updated to `capsnap.size = 0` (or could it have been set to 5 since `qwer` is written?).
[...]
In the MDS-side logs, we can see it flushed just the .snap/1 capsnap, which is flushsnap follows 1 -> snap 2b2:
[...]
The mapping between the snapshot names and snapids:
[...]
And when CoW-ing the dentry and inodes, the size of file a is 0:
[...]
And then on the client side, in finish_cap_snap(), line 4087 sets the capsnap size to the current inode size, which is already 0:
[...]
And also line 4107 sets writing to 1, so the Fb and Fw caps conflict here. That means if only the Fb caps are in use the first time, it will create a new capsnap, while the second time, no matter whether it is Fb or Fw, it will skip creating a new capsnap in queue_cap_snap():
[...]
Actually we should add a new capsnap the second time, because once the data is written to the buffer the corresponding capsnap should be frozen, and for any new change we should create a new one.
Kind of, yes. Once a capsnap is created, its attributes should not be updated. It's a bit confusing that queue_cap_snap() updates a bunch of capsnap attrs (uid, gid, mode, btime, ...) and then finish_cap_snap() fills in the rest (size, mtime, ctime, etc.), but finish_cap_snap() is called from other code paths that possibly have updated inode attrs.
That means the changes for .snap/2 will overwrite the capsnap for .snap/1:
[...]
Updated by Xiubo Li 12 months ago
Venky Shankar wrote:
Xiubo Li wrote:
This is a failure with libcephfs, and we have the client-side logs:
vshankar-2023-04-06_04:14:11-fs-wip-vshankar-testing-20230330.105356-testing-default-smithi/7233705/teuthology.log:
[...]
From the client logs we can see that it successfully wrote the contents asdf to the file a:
[...]
And then it made snapshot 1, but for the file a it just deferred flushing the snapcap to the MDS because the Fb caps were still held for the dirty buffer:
Things look fine till here.
The echo qwer will do a truncating open, so it will truncate the file size to 0; please see the file size change during the open:
[...]
And then finally the buffer was flushed and the dirty snapcap was flushed to the MDS, but it was too late: the file size had already been set to size=0/4194304:
This is, I presume, where the bug is. The capsnap should have had `capsnap.size = 5`, which got updated to `capsnap.size = 0` (or could it have been set to 5 since `qwer` is written?).
The first time, the capsnap had stored capsnap.size = 5. But later, just after the file a was opened by echo qwer, the MDS truncated the size to 0, and then after the buffer was flushed the Fb caps were released too, which called finish_cap_snap() to update the capsnap again, now with a size of 0.
[...]
In the MDS-side logs, we can see it flushed just the .snap/1 capsnap, which is flushsnap follows 1 -> snap 2b2:
[...]
The mapping between the snapshot names and snapids:
[...]
And when CoW-ing the dentry and inodes, the size of file a is 0:
[...]
And then on the client side, in finish_cap_snap(), line 4087 sets the capsnap size to the current inode size, which is already 0:
[...]
And also line 4107 sets writing to 1, so the Fb and Fw caps conflict here. That means if only the Fb caps are in use the first time, it will create a new capsnap, while the second time, no matter whether it is Fb or Fw, it will skip creating a new capsnap in queue_cap_snap():
[...]
Actually we should add a new capsnap the second time, because once the data is written to the buffer the corresponding capsnap should be frozen, and for any new change we should create a new one.
Kind of, yes. Once a capsnap is created, its attributes should not be updated. It's a bit confusing that queue_cap_snap() updates a bunch of capsnap attrs (uid, gid, mode, btime, ...) and then finish_cap_snap() fills in the rest (size, mtime, ctime, etc.), but finish_cap_snap() is called from other code paths that possibly have updated inode attrs.
finish_cap_snap() itself is doing the right thing here, IMO.
It's just that the Fb case incorrectly reused the writing logic. When writing is set, it means the client kept writing to the file while the snapshot was being made, so the corresponding capsnap should wait for the writing to finish, after which the Fw caps will be released and the capsnap will be frozen immediately. And when freezing the capsnap we should update the corresponding attrs, such as the size, mtime, etc.
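A rough sketch of those semantics (hypothetical, reusing the names from the Client.cc excerpt quoted above; this is not the merged fix):

// Only an in-flight write (Fw) should keep the capsnap open; a capsnap
// pinned purely by dirty buffers (Fb) should be frozen as soon as the
// buffer is flushed, keeping the attrs captured at snapshot time instead
// of refreshing them from the live inode.
void finish_cap_snap_sketch(Inode *in, CapSnap &capsnap, int used)
{
  if (used & CEPH_CAP_FILE_WR) {
    // The client is still writing through this snap context; the final
    // size/mtime are unknown, so defer until the last Fw ref is put.
    capsnap.writing = 1;
  } else {
    // No in-flight writes: freeze and flush now, without re-copying
    // size/mtime/ctime from the inode (it may already reflect
    // post-snapshot changes such as the O_TRUNC open).
    capsnap.dirty_data = 0;
    flush_snaps(in);
  }
}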
Thanks
- Xiubo
Updated by Venky Shankar 10 months ago
- Category set to Correctness/Safety
- Status changed from Fix Under Review to Pending Backport
- Target version set to v19.0.0
- Backport set to reef,quincy,pacific
- Component(FS) Client added
Updated by Backport Bot 10 months ago
- Copied to Backport #62043: pacific: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed added
Updated by Backport Bot 10 months ago
- Copied to Backport #62044: reef: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed added
Updated by Backport Bot 10 months ago
- Copied to Backport #62045: quincy: qa: fs/snaps/snaptest-multiple-capsnaps.sh failed added
Updated by Xiubo Li 8 months ago
The patchwork link for the kernel fix: https://patchwork.kernel.org/project/ceph-devel/patch/20230511100911.361132-1-xiubli@redhat.com/
Updated by Venky Shankar 6 months ago
Xiubo - this is seen in reef branch - https://pulpito.ceph.com/vshankar-2023-10-19_01:21:59-fs-wip-vshankar-testing-reef-20231018.163145-testing-default-smithi/7431825/
It's a stock kernel - I assume the kernel fix isn't available in it yet?
Updated by Xiubo Li 6 months ago
Venky Shankar wrote:
Xiubo - this is seen in reef branch - https://pulpito.ceph.com/vshankar-2023-10-19_01:21:59-fs-wip-vshankar-testing-reef-20231018.163145-testing-default-smithi/7431825/
It's a stock kernel - I assume the kernel fix isn't available in it yet?
Yeah, the MR is not merged yet.
Updated by Xiubo Li 5 months ago
Xiubo Li wrote:
Venky Shankar wrote:
Xiubo - this is seen in reef branch - https://pulpito.ceph.com/vshankar-2023-10-19_01:21:59-fs-wip-vshankar-testing-reef-20231018.163145-testing-default-smithi/7431825/
It's a stock kernel - I assume the kernel fix isn't available in it yet?
Yeah, the MR is not merged yet.
The Jira is https://issues.redhat.com/browse/RHEL-16412; it is still not merged.
Updated by Venky Shankar 3 months ago
- Related to Bug #64209: snaptest-multiple-capsnaps.sh fails with "got remote process result: 1" added
Updated by Venky Shankar about 2 months ago
/a/yuriw-2024-03-16_15:03:17-fs-wip-yuri10-testing-2024-03-15-1653-reef-distro-default-smithi/7606359
Xiubo, still awaiting merge as per https://issues.redhat.com/browse/RHEL-16412, yes?
Updated by Xiubo Li about 2 months ago
Venky Shankar wrote:
/a/yuriw-2024-03-16_15:03:17-fs-wip-yuri10-testing-2024-03-15-1653-reef-distro-default-smithi/7606359
Xiubo, still awaiting merge as per https://issues.redhat.com/browse/RHEL-16412, yes?
The above test was using the stock kernel:
Mar 18 19:57:14 smithi045 kernel: Linux version 4.18.0-513.9.1.el8_9.x86_64 (mockbuild@x86-vm-09.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Nov 16 10:29:04 EST 2023
The JIRA https://issues.redhat.com/browse/RHEL-16412 has been merged, but the fixed version is kernel-4.18.0-513.22.1.el8_9.
So this failure is just because the above test was still using the old kernel.