Tasks #64819
closedTasks #63293: Implement fscrypt in libcephfs and cephfs-fuse
Tasks #64160: RMW race detection
data corruption during rmw after lseek
0%
Description
There's data corruption during a rmw after a seek.
reproducer
import os def write_fill(fd, fill, size, offset): s = '' for i in range(0,size): s += fill os.lseek(fd, offset - int(size / 2), 0) os.write(fd, str.encode(s)) file = 'file.log' fd = os.open(file, os.O_RDWR|os.O_CREAT) s = write_fill(fd, 's', 32, 16) s = write_fill(fd, 't', 8, 16) os.close(fd)
Run the above on fscrypt enabled dir and non-fscrypt dir.
fscrypt enabled dir
hexdump /mnt/mycephfs/fscrypt_test_unlocked_ffsb/file.log 0000000 7373 7373 7373 7373 7373 7373 7474 7474 0000010 7474 7474 0000 0000 0000 0000 0000 0000 0000020
non-fscrypt dir
hexdump file.log 0000000 7373 7373 7373 7373 7373 7373 7474 7474 0000010 7474 7474 7373 7373 7373 7373 7373 7373 0000020
fscrypt case, file.log does not have trailing ssss... as it should
Updated by Christopher Hoffman 24 days ago · Edited
The RC for this issue is fixed by:
diff --git a/src/client/Client.cc b/src/client/Client.cc index 9ccb39501a4..80c84a920d8 100644 --- a/src/client/Client.cc +++ b/src/client/Client.cc @@ -11824,12 +11824,11 @@ int Client::WriteEncMgr::read_modify_write(Context *_iofinish) ldout(cct, 10) << "start_block=" << start_block << " start_block_ofs=" << start_block_ofs << " ofs_in_start_block=" << ofs_in_start_block << " end_block=" << end_block << " end_block_ofs=" << end_block_ofs << " ofs_in_end_block=" << ofs_in_end_block - << dendl; + << " offset=" << offset + << " effective_size=" << in->effective_size() << " endoff=" << endoff << dendl; need_read_start = ofs_in_start_block > 0; need_read_end = (endoff < in->effective_size() && ofs_in_end_block < FSCRYPT_BLOCK_SIZE - 1 && start_block != end_block); - read_start_size = (need_read_start && need_read_end && start_block == end_block ? - FSCRYPT_BLOCK_SIZE : ofs_in_start_block); - + read_start_size = FSCRYPT_BLOCK_SIZE; bool need_read = need_read_start | need_read_end; ldout(cct, 10) << "need_read_start=" << need_read_start << "need_read_end=" << need_read_end << " read_start_size=" << read_start_size << " need_read=" << need_read << dendl;
read_start_size was always ofs_in_start_block as it depends on need_read_end and have conflicting bool statements about start/end block. This caused data after the write portion to not be read. read_async would read up to the offset of where the write was occuring, write the pending data and leave the rest of fscrypt block empty. In fscrypt, the whole block needs to be read, overwrite starting at offset to len then recommit the whole fscrypt block.
Updated by Christopher Hoffman 24 days ago · Edited
There's also another case that is failing rmw:
reproducer:
import os file = 'datafile31' fd = os.open(file, os.O_RDONLY) os.close(fd) fd = os.open(file, os.O_RDWR|os.O_CREAT) #reproducing sys calls after ffsb bench has started fill = '\0' size = 3192 offset = 60653568 s = '' for i in range(0,size): s += fill os.lseek(fd, offset, 0) os.write(fd, str.encode(s)) os.close(fd)
The error we're getting
Inode.cc: In function 'int Inode::put_cap_ref(int)' thread 7f8aabfff6c0 time 2024-04-10T19:30:45.442916+0000 Inode.cc: 212: FAILED ceph_assert(cap_refs[c] > 0)
There's probably a missing buffered read that isn't happening that should.
Updated by Christopher Hoffman 22 days ago
As seen in note 2, the case of modification happening at start of block and contained within a single block is now addressed by:
diff --git a/src/client/Client.cc b/src/client/Client.cc index 9ccb39501a4..3b45ed8453f 100644 --- a/src/client/Client.cc +++ b/src/client/Client.cc @@ -11825,11 +11825,10 @@ int Client::WriteEncMgr::read_modify_write(Context *_iofinish) << " ofs_in_start_block=" << ofs_in_start_block << " end_block=" << end_block << " end_block_ofs=" << end_block_ofs << " ofs_in_end_block=" << ofs_in_end_block << dendl; - need_read_start = ofs_in_start_block > 0; + need_read_start = (ofs_in_start_block >= 0 && (endoff - offset) <= FSCRYPT_BLOCK_SIZE);
Updated by Christopher Hoffman 22 days ago
- Status changed from In Progress to Resolved
The reproducers above were simplifications of failures/errors from running the ffsb test suite on a fscrypt enabled directory. Now with the above changes, ffsb test suite is now completing and passing.