Tasks #64413
closedTasks #63293: Implement fscrypt in libcephfs and cephfs-fuse
Tasks #64160: RMW race detection
File size is not correct after rmw
0%
Description
When a smaller file is copied to overwrite a larger file, the dest inode size is not changed to source file size. When inspecting config.log, it has trailing 0s.
[fscrypt_test_unlocked_ffsb]$ cp ffsb_stats.c config.log [fscrypt_test_unlocked_ffsb]$ ll total 91 -rw-------. 1 choffman choffman 7547 Feb 12 22:49 config.log -rw-------. 1 choffman choffman 7130 Feb 12 22:49 ffsb_stats.c
config.log file size should be 7130.
Updated by Christopher Hoffman 3 months ago
- Status changed from New to In Progress
Updated by Christopher Hoffman 2 months ago
In the case of O_TRUNC, I've added to update_inode_file_size() to set effective_size to 0, when size is 0.
diff --git a/src/client/Client.cc b/src/client/Client.cc index 9ed12685ba3..53cdbafe3e2 100644 --- a/src/client/Client.cc +++ b/src/client/Client.cc @@ -899,6 +899,9 @@ void Client::update_inode_file_size(Inode *in, int issued, uint64_t size, if (truncate_seq != in->truncate_seq) { ldout(cct, 10) << "truncate_seq " << in->truncate_seq << " -> " << truncate_seq << dendl; + if (in->is_fscrypt_enabled() && size == 0) { + in->set_effective_size(0); + } in->truncate_seq = truncate_seq; in->oset.truncate_seq = truncate_seq;
Then retesting the steps above with the patch there needs to be a listdir added for it to work as expected.
[fscrypt_test_unlocked_ffsb]$ ls config.log ffsb_stats.c [fscrypt_test_unlocked_ffsb]$ cp ffsb_stats.c config.log [fscrypt_test_unlocked_ffsb]$ ls -l total 14 -rw-------. 1 choffman choffman 7130 Feb 21 18:02 config.log -rw-------. 1 choffman choffman 7130 Feb 21 18:01 ffsb_stats.c
If you rerun the above without the ls, fscrypt_file does not get updated with new value (both files should have same value)
[fscrypt_test_unlocked_ffsb]$ getfattr -n ceph.fscrypt.file /mnt/mycephfs/fscrypt_test_unlocked_ffsb/* getfattr: Removing leading '/' from absolute path names # file: mnt/mycephfs/fscrypt_test_unlocked_ffsb/config.log ceph.fscrypt.file=0sex0AAAAAAAA= # file: mnt/mycephfs/fscrypt_test_unlocked_ffsb/ffsb_stats.c ceph.fscrypt.file=0s2hsAAAAAAAA=
The Inode fscrypt_file is being updated with the new effective size through this process (trunc -> 0 and then through _write_success, which expands file size as being written to). This change isn't being committed to mds.
Updated by Venky Shankar 2 months ago
Spoke to Chris regarding this.
Chris, if you can attach the debug client/mds logs from the two runs you mention (one without and the other with fscrypt), I could spend some time to see what's causing the failure.
Updated by Christopher Hoffman about 2 months ago
- Related to Tasks #64723: ffsb configure issues (gcc fails) added
Updated by Venky Shankar about 2 months ago
Spent some time looking through the MDS logs shared in slack channel:
For config.log, the MDS receives a truncate request (open w/ O_TRUNC) as expected:
2024-03-01T15:40:08.451+0000 7fc498e7d6c0 10 mds.0.server do_open_truncate [inode 0x100000001f7 [2,head] /config.log auth v29 ap=3 DIRTYPARENT s=7547 n(v0 rc2024-03-01T15:40:03.418004+0000 b7547 1=1+0) (iauth excl) (ifile xlock x=1 by 0x55df505f8d00) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFcb/pAsxXsxFxwb@3},l=4180 | ptrwaiter=1 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
And that is recorded in the projected inode:
2024-03-01T15:40:08.452+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x100000001f7) project_inode 0x100000001f7 2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache.dir(0x1) pre_dirty 31 2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache.den(0x1 config.log) pre_dirty [dentry #0x1/config.log [2,head] auth (dn sync l) (dversion lock) pv=31 v=29 ino=0x100000001f7 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=1 0x55df505fd680] 2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) pre_dirty 31 (current v 29) 2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x100000001f7 [2,head] /config.log auth v29 pv31 ap=3 DIRTYPARENT truncating(7547 to 0) s=7547 n(v0 rc2024-03-01T15:40:03.418004+0000 b7547 1=1+0)->n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0)/n(v0 rc2024-03-01T15:40:03.418004+0000 b7547 1=1+0) (iauth excl) (ifile xlock x=1 by 0x55df505f8d00) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFcb/pAsxXsxFxwb@3},l=4180 | ptrwaiter=1 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
After the mdlog submit thread flushes the event, the projected inode is popped and marked dirty:
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 MDSIOContextBase::complete: 25C_MDS_inode_update_finish 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 MDSContext::complete: 25C_MDS_inode_update_finish 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x1) pop_and_dirty_projected_inode v11 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x1) mark_dirty [inode 0x1 [...2,head] / auth v11 ap=2 snaprealm=0x55df4b63d200 DIRTYPARENT f(v0 m2024-03-01T15:40:03.426855+0000 3=2+1) n(v1 rc2024-03-01T15:40:08.439793+0000 b22869 6=4+2)/n(v0 rc2024-03-01T15:39:43.277048+0000 1=0+1) (iauth sync r=1) (isnap sync r=2) (inest lock w=2 dirty) (ipolicy sync r=1) (ifile mix w=1) (iversion lock w=1 last_client=4180) caps={4180=pAsLsXs/-@6} | dirtyscattered=1 request=0 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55df4b55e100] 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x100000001f7) pop_and_dirty_projected_inode v31 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) mark_dirty [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.den(0x1 config.log) mark_dirty [dentry #0x1/config.log [2,head] auth (dn sync l) (dversion lock) pv=31 v=29 ino=0x100000001f7 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=1 0x55df505fd680]
The MDS sends a message to the client (client_caps(truncate ino ...)
) and sends an osd_op for truncate:
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 7 mds.0.locker issue_truncate on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 20 mds.0.cache.ino(0x100000001f7) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2024-03-01T15:40:08.439793+0000 ctime 2024-03-01T15:40:08.439793+0000 change_attr 2 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) 4fscrypt_file aah3r3 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.3 send_message_client_counted client.4180 seq 22 client_caps(trunc ino 0x100000001f7 1 seq 4 caps=pAsxLsXsxFscb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 2/0 mtime 2024-03-01T15:40:08.439793+0000 ctime 2024-03-01T15:40:08.439793+0000 change_attr 2) v12 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 1 -- [v2:172.21.10.5:6826/143953937,v1:172.21.10.5:6827/143953937] --> 172.21.10.5:0/578263388 -- client_caps(trunc ino 0x100000001f7 1 seq 4 caps=pAsxLsXsxFscb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 2/0 mtime 2024-03-01T15:40:08.439793+0000 ctime 2024-03-01T15:40:08.439793+0000 change_attr 2) v12 -- 0x55df50635880 con 0x55df5053ff80 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 20 mds.0.locker check_inode_max_size no-op on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache truncate_inode 7547 -> 0 on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] ... ... ... 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache _truncate_inode snapc 0x55df4b63d3b0 on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=3 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] fscrypt_last_block length is 0 2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache _truncate_inode truncate on inode [inode 0x100000001f7 [2,head] /config.log auth v31 ap=3 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] 2024-03-01T15:40:08.460+0000 7fc498e7d6c0 1 -- [v2:172.21.10.5:6826/143953937,v1:172.21.10.5:6827/143953937] --> [v2:172.21.10.5:6810/1028653730,v1:172.21.10.5:6811/1028653730] -- osd_op(unknown.0.3:55 3.0 3:6fe38dc2:::100000001f7.00000000:head [trimtrunc 2@0] snapc 1=[] ondisk+write+known_if_redirected+full_force+supports_pool_eio e28) v8 -- 0x55df50635180 con 0x55df505f7b00
Truncate operation finishes:
2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache truncate_inode_finish [inode 0x100000001f7 [2,head] /config.log auth v31 ap=1 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@6},l=4180 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] 2024-03-01T15:40:08.467+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x100000001f7) project_inode 0x100000001f7 2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache.dir(0x1) pre_dirty 33 2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache.den(0x1 config.log) pre_dirty [dentry #0x1/config.log [2,head] auth (dn sync l) (dversion lock) pv=33 v=31 ino=0x100000001f7 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=1 0x55df505fd680] 2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) pre_dirty 33 (current v 31) 2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x100000001f7 [2,head] /config.log auth v31 pv33 ap=1 DIRTYPARENT s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0)->n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@6},l=4180 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
Chris, so what happens when the (fscrypt) enabled client recevies client_caps(trunc ino ..)
message? The caps sent back by the MDS is pAsxLsXsxFscb
, esp Xsx
- is that somehow tricking the client to believe that the file size (which is stored as an xattr in fscrypt) can be used from its cache?
Updated by Christopher Hoffman about 1 month ago
See patch below for fix:
diff --git a/src/client/Client.cc b/src/client/Client.cc index 105dcdf2607..fda3f193f67 100644 --- a/src/client/Client.cc +++ b/src/client/Client.cc @@ -891,9 +891,16 @@ void Client::update_inode_file_size(Inode *in, int issued, uint64_t size, { uint64_t prior_size = in->size; + // In the case of a pending trunc size that is smaller than orig size + // (i.e. truncating from 8M to 4M) passed truncate_seq will be larger + // than inode truncate_seq. This shows passed size is latest. if (truncate_seq > in->truncate_seq || (truncate_seq == in->truncate_seq && size > in->size)) { ldout(cct, 10) << "size " << in->size << " -> " << size << dendl; + if (in->is_fscrypt_enabled()) { + in->set_effective_size(size); + size = fscrypt_block_from_ofs(size); + } in->size = size; in->reported_size = size; if (truncate_seq != in->truncate_seq) { @@ -1076,9 +1083,6 @@ Inode * Client::add_update_inode(InodeStat *st, utime_t from, if (new_version || (new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) { in->layout = st->layout; - if (st->fscrypt_file.size() >= sizeof(uint64_t)) { - in->fscrypt_file = st->fscrypt_file; - } update_inode_file_size(in, issued, st->size, st->truncate_seq, st->truncate_size); }
fscrypt_file/effective_size was being updated in case where local cached size was newer. Above patch only updates size if mds provides a larger truncate_seq. This follow the same logic as real size.
Updated by Christopher Hoffman about 1 month ago
- Status changed from In Progress to Resolved
Updated by Christopher Hoffman about 1 month ago
is that somehow tricking the client to believe that the file size (which is stored as an xattr in fscrypt) can be used from its cache?
The value that we want to use is what is in the client's cache. Correct me if I'm wrong, but Xsx caps are present and behavior is as it should be, wrt caps.
The problem laid with the previous code where the mds value was being used regardless of newer version on client.
The following code was the culprit:
if (st->fscrypt_file.size() >= sizeof(uint64_t)) { in->fscrypt_file = st->fscrypt_file; }