Bug #51948
closed
Cephfs EL8.4 kernel client append mode corruption
Description
A file corruption issue occurs when using the CephFS kernel client on EL8.4 kernels (4.18.0-305.3.1 through 4.18.0-305.10.2), although it may have existed prior to EL8.4. The corruption affects files opened for appending: portions of a file's contents are sometimes overwritten with zero bytes. The overwritten portions always seem to start at a 4 KiB boundary. To reproduce the bug I run:
$ for y in {1..1000} ; do echo "$(date -Isec) [testing] Running test number $y" >> test.log ; sleep 1 ; done
This will often yield a corrupted file, although not always. For example, a test I ran yielded the following results:
$ file test.log
test.log: data
$ hexdump -C test.log | head -8
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000fd0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 32 30 |..............20|
00000fe0 32 31 2d 30 37 2d 31 39 54 31 33 3a 33 37 3a 34 |21-07-19T13:37:4|
00000ff0 34 2d 30 36 3a 30 30 20 5b 74 65 73 74 69 6e 67 |4-06:00 [testing|
00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001d50 00 00 00 00 00 00 00 00 32 30 32 31 2d 30 37 2d |........2021-07-|
As you can see, in this instance corruption occurred at offset 0 and at offset 0x1000. The issue does not occur with the ceph-fuse client or with kernel 5.4.135, but does affect EL8.4 kernels at a minimum. It may be coincidence, but the corruption seems to occur more often when another client is also reading the file. I have had this issue with my cluster running Octopus or Pacific, with 1 or 2 active MDS daemons. I have attached a sample corrupted file.
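Because the test log should contain only text, any NUL byte in it indicates corruption. The following is a minimal sketch of a check that could be run on the file instead of reading the hexdump by eye. The function names `count_nul_bytes` and `nul_block_starts` are hypothetical helpers written for this illustration, and the 4096-byte block size is an assumption taken from the 4 KiB boundaries observed above. It only reports blocks whose first byte is NUL, so it will not catch a zeroed region that begins mid-block.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of any Ceph tooling.

# Print the number of NUL bytes in a file.
# A plain-text log written with echo should always report 0.
count_nul_bytes() {
    local file=$1 total kept
    total=$(wc -c < "$file")
    # Count the bytes that remain once every NUL has been deleted.
    kept=$(tr -d '\000' < "$file" | wc -c)
    echo $(( total - kept ))
}

# Print the hex offset of every block whose first byte is NUL.
# The block size defaults to 4096 (assumed from the report above).
nul_block_starts() {
    local file=$1 block_size=${2:-4096}
    local size offset first
    size=$(wc -c < "$file")
    for (( offset = 0; offset < size; offset += block_size )); do
        # Read the single byte at the block boundary; if deleting NULs
        # leaves nothing, that byte was NUL.
        first=$(dd if="$file" bs=1 skip="$offset" count=1 2>/dev/null \
                | tr -d '\000' | wc -c)
        if (( first == 0 )); then
            printf '0x%x\n' "$offset"
        fi
    done
}
```

After sourcing the functions, `count_nul_bytes test.log` gives a quick yes/no signal, and `nul_block_starts test.log` lists the boundaries worth inspecting with `hexdump -C`.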
Updated by Ilya Dryomov over 2 years ago
- Category set to fs/ceph
- Assignee set to Jeff Layton
- Priority changed from Normal to High
Updated by Jeff Layton over 2 years ago
- Status changed from New to Resolved
This is a known bug now fixed in mainline. 8.4.z should get this patch too, but I'm not yet clear on the ETA. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1971101
Since this doesn't concern upstream kernels, I'm going to close this as Resolved. Hopefully RHEL 8.4.z should get the fix soon.