Project

General

Profile

Actions

Tasks #64413

closed

Tasks #63293: Implement fscrypt in libcephfs and cephfs-fuse

Tasks #64160: RMW race detection

File size is not correct after rmw

Added by Christopher Hoffman 3 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
Pull request ID:

Description

When a smaller file is copied to overwrite a larger file, the dest inode size is not changed to source file size. When inspecting config.log, it has trailing 0s.

[fscrypt_test_unlocked_ffsb]$ cp ffsb_stats.c config.log 
[fscrypt_test_unlocked_ffsb]$ ll
total 91
-rw-------. 1 choffman choffman  7547 Feb 12 22:49 config.log
-rw-------. 1 choffman choffman  7130 Feb 12 22:49 ffsb_stats.c

config.log file size should be 7130.


Related issues 1 (0 open1 closed)

Related to CephFS - Tasks #64723: ffsb configure issues (gcc fails)ClosedChristopher Hoffman

Actions
Actions #1

Updated by Christopher Hoffman 3 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Christopher Hoffman 3 months ago

  • Description updated (diff)
Actions #3

Updated by Patrick Donnelly 3 months ago

  • Assignee set to Christopher Hoffman
Actions #4

Updated by Christopher Hoffman 2 months ago

In the case of O_TRUNC, I've added to update_inode_file_size() to set effective_size to 0, when size is 0.

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 9ed12685ba3..53cdbafe3e2 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -899,6 +899,9 @@ void Client::update_inode_file_size(Inode *in, int issued, uint64_t size,
     if (truncate_seq != in->truncate_seq) {
       ldout(cct, 10) << "truncate_seq " << in->truncate_seq << " -> " 
               << truncate_seq << dendl;
+    if (in->is_fscrypt_enabled() && size == 0) {
+        in->set_effective_size(0);
+    }
       in->truncate_seq = truncate_seq;
       in->oset.truncate_seq = truncate_seq;

Then retesting the steps above with the patch there needs to be a listdir added for it to work as expected.

[fscrypt_test_unlocked_ffsb]$ ls 
config.log  ffsb_stats.c
[fscrypt_test_unlocked_ffsb]$ cp ffsb_stats.c config.log 
[fscrypt_test_unlocked_ffsb]$ ls -l
total 14
-rw-------. 1 choffman choffman 7130 Feb 21 18:02 config.log
-rw-------. 1 choffman choffman 7130 Feb 21 18:01 ffsb_stats.c

If you rerun the above without the ls, fscrypt_file does not get updated with new value (both files should have same value)

[fscrypt_test_unlocked_ffsb]$ getfattr -n ceph.fscrypt.file /mnt/mycephfs/fscrypt_test_unlocked_ffsb/*
getfattr: Removing leading '/' from absolute path names
# file: mnt/mycephfs/fscrypt_test_unlocked_ffsb/config.log
ceph.fscrypt.file=0sex0AAAAAAAA=

# file: mnt/mycephfs/fscrypt_test_unlocked_ffsb/ffsb_stats.c
ceph.fscrypt.file=0s2hsAAAAAAAA=

The Inode fscrypt_file is being updated with the new effective size through this process (trunc -> 0 and then through _write_success, which expands file size as being written to). This change isn't being committed to mds.

Actions #5

Updated by Venky Shankar 2 months ago

Spoke to Chris regarding this.

Chris, if you can attach the debug client/mds logs from the two runs you mention (one without and the other with fscrypt), I could spend some time to see what's causing the failure.

Actions #6

Updated by Christopher Hoffman about 2 months ago

  • Related to Tasks #64723: ffsb configure issues (gcc fails) added
Actions #7

Updated by Venky Shankar about 2 months ago

Spent some time looking through the MDS logs shared in slack channel:

For config.log, the MDS receives a truncate request (open w/ O_TRUNC) as expected:

2024-03-01T15:40:08.451+0000 7fc498e7d6c0 10 mds.0.server do_open_truncate [inode 0x100000001f7 [2,head] /config.log auth v29 ap=3 DIRTYPARENT s=7547 n(v0 rc2024-03-01T15:40:03.418004+0000 b7547 1=1+0) (iauth excl) (ifile xlock x=1 by 0x55df505f8d00) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFcb/pAsxXsxFxwb@3},l=4180 | ptrwaiter=1 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]

And that is recorded in the projected inode:

2024-03-01T15:40:08.452+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x100000001f7) project_inode 0x100000001f7
2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache.dir(0x1) pre_dirty 31
2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache.den(0x1 config.log) pre_dirty [dentry #0x1/config.log [2,head] auth (dn sync l) (dversion lock) pv=31 v=29 ino=0x100000001f7 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=1 0x55df505fd680]
2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) pre_dirty 31 (current v 29)
2024-03-01T15:40:08.452+0000 7fc498e7d6c0 10 mds.0.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x100000001f7 [2,head] /config.log auth v29 pv31 ap=3 DIRTYPARENT truncating(7547 to 0) s=7547 n(v0 rc2024-03-01T15:40:03.418004+0000 b7547 1=1+0)->n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0)/n(v0 rc2024-03-01T15:40:03.418004+0000 b7547 1=1+0) (iauth excl) (ifile xlock x=1 by 0x55df505f8d00) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFcb/pAsxXsxFxwb@3},l=4180 | ptrwaiter=1 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]

After the mdlog submit thread flushes the event, the projected inode is popped and marked dirty:

2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 MDSIOContextBase::complete: 25C_MDS_inode_update_finish
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 MDSContext::complete: 25C_MDS_inode_update_finish
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x1) pop_and_dirty_projected_inode v11
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x1) mark_dirty [inode 0x1 [...2,head] / auth v11 ap=2 snaprealm=0x55df4b63d200 DIRTYPARENT f(v0 m2024-03-01T15:40:03.426855+0000 3=2+1) n(v1 rc2024-03-01T15:40:08.439793+0000 b22869 6=4+2)/n(v0 rc2024-03-01T15:39:43.277048+0000 1=0+1) (iauth sync r=1) (isnap sync r=2) (inest lock w=2 dirty) (ipolicy sync r=1) (ifile mix w=1) (iversion lock w=1 last_client=4180) caps={4180=pAsLsXs/-@6} | dirtyscattered=1 request=0 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55df4b55e100]
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x100000001f7) pop_and_dirty_projected_inode v31
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) mark_dirty [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.den(0x1 config.log) mark_dirty [dentry #0x1/config.log [2,head] auth (dn sync l) (dversion lock) pv=31 v=29 ino=0x100000001f7 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=1 0x55df505fd680]

The MDS sends a message to the client (client_caps(truncate ino ...)) and sends an osd_op for truncate:

2024-03-01T15:40:08.459+0000 7fc498e7d6c0  7 mds.0.locker issue_truncate on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 20 mds.0.cache.ino(0x100000001f7) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2024-03-01T15:40:08.439793+0000 ctime 2024-03-01T15:40:08.439793+0000 change_attr 2
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) 4fscrypt_file aah3r3
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.3 send_message_client_counted client.4180 seq 22 client_caps(trunc ino 0x100000001f7 1 seq 4 caps=pAsxLsXsxFscb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 2/0 mtime 2024-03-01T15:40:08.439793+0000 ctime 2024-03-01T15:40:08.439793+0000 change_attr 2) v12
2024-03-01T15:40:08.459+0000 7fc498e7d6c0  1 -- [v2:172.21.10.5:6826/143953937,v1:172.21.10.5:6827/143953937] --> 172.21.10.5:0/578263388 -- client_caps(trunc ino 0x100000001f7 1 seq 4 caps=pAsxLsXsxFscb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/4194304 ts 2/0 mtime 2024-03-01T15:40:08.439793+0000 ctime 2024-03-01T15:40:08.439793+0000 change_attr 2) v12 -- 0x55df50635880 con 0x55df5053ff80
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 20 mds.0.locker check_inode_max_size no-op on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache truncate_inode 7547 -> 0 on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=2 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
...
...
...
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache _truncate_inode  snapc 0x55df4b63d3b0 on [inode 0x100000001f7 [2,head] /config.log auth v31 ap=3 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80] fscrypt_last_block length is 0
2024-03-01T15:40:08.459+0000 7fc498e7d6c0 10 mds.0.cache _truncate_inode truncate on inode [inode 0x100000001f7 [2,head] /config.log auth v31 ap=3 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=4180) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFscb/pAsxXsxFxwb@4},l=4180 | ptrwaiter=0 request=1 lock=3 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
2024-03-01T15:40:08.460+0000 7fc498e7d6c0  1 -- [v2:172.21.10.5:6826/143953937,v1:172.21.10.5:6827/143953937] --> [v2:172.21.10.5:6810/1028653730,v1:172.21.10.5:6811/1028653730] -- osd_op(unknown.0.3:55 3.0 3:6fe38dc2:::100000001f7.00000000:head [trimtrunc 2@0] snapc 1=[] ondisk+write+known_if_redirected+full_force+supports_pool_eio e28) v8 -- 0x55df50635180 con 0x55df505f7b00

Truncate operation finishes:

2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache truncate_inode_finish [inode 0x100000001f7 [2,head] /config.log auth v31 ap=1 DIRTYPARENT truncating(7547 to 0) s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@6},l=4180 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]
2024-03-01T15:40:08.467+0000 7fc498e7d6c0 15 mds.0.cache.ino(0x100000001f7) project_inode 0x100000001f7
2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache.dir(0x1) pre_dirty 33
2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache.den(0x1 config.log) pre_dirty [dentry #0x1/config.log [2,head] auth (dn sync l) (dversion lock) pv=33 v=31 ino=0x100000001f7 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=1 0x55df505fd680]
2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache.ino(0x100000001f7) pre_dirty 33 (current v 31)
2024-03-01T15:40:08.467+0000 7fc498e7d6c0 10 mds.0.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x100000001f7 [2,head] /config.log auth v31 pv33 ap=1 DIRTYPARENT s=0 n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0)->n(v0 rc2024-03-01T15:40:08.439793+0000 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4180=0-4194304@1} caps={4180=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@6},l=4180 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55df505cbb80]

Chris, so what happens when the (fscrypt) enabled client recevies client_caps(trunc ino ..) message? The caps sent back by the MDS is pAsxLsXsxFscb, esp Xsx - is that somehow tricking the client to believe that the file size (which is stored as an xattr in fscrypt) can be used from its cache?

Actions #8

Updated by Christopher Hoffman about 1 month ago

See patch below for fix:

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 105dcdf2607..fda3f193f67 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -891,9 +891,16 @@ void Client::update_inode_file_size(Inode *in, int issued, uint64_t size,
 {
   uint64_t prior_size = in->size;

+  // In the case of a pending trunc size that is smaller than orig size
+  // (i.e. truncating from 8M to 4M) passed truncate_seq will be larger
+  // than inode truncate_seq. This shows passed size is latest.
   if (truncate_seq > in->truncate_seq ||
       (truncate_seq == in->truncate_seq && size > in->size)) {
     ldout(cct, 10) << "size " << in->size << " -> " << size << dendl;
+    if (in->is_fscrypt_enabled()) {
+      in->set_effective_size(size);
+      size = fscrypt_block_from_ofs(size);
+    }
     in->size = size;
     in->reported_size = size;
     if (truncate_seq != in->truncate_seq) {
@@ -1076,9 +1083,6 @@ Inode * Client::add_update_inode(InodeStat *st, utime_t from,
   if (new_version ||
       (new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) {
     in->layout = st->layout;
-    if (st->fscrypt_file.size() >= sizeof(uint64_t)) {
-      in->fscrypt_file = st->fscrypt_file;
-    }
     update_inode_file_size(in, issued, st->size, st->truncate_seq, st->truncate_size);
   }

fscrypt_file/effective_size was being updated in case where local cached size was newer. Above patch only updates size if mds provides a larger truncate_seq. This follow the same logic as real size.

Actions #9

Updated by Christopher Hoffman about 1 month ago

  • Status changed from In Progress to Resolved
Actions #10

Updated by Christopher Hoffman about 1 month ago

is that somehow tricking the client to believe that the file size (which is stored as an xattr in fscrypt) can be used from its cache?

The value that we want to use is what is in the client's cache. Correct me if I'm wrong, but Xsx caps are present and behavior is as it should be, wrt caps.

The problem laid with the previous code where the mds value was being used regardless of newer version on client.

The following code was the culprit:

    if (st->fscrypt_file.size() >= sizeof(uint64_t)) {
      in->fscrypt_file = st->fscrypt_file;
    }

Actions

Also available in: Atom PDF