Actions
Bug #58834
openkclient: xfstests-dev generic/075 failed
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
generic/075 [failed, exit status 1]- output mismatch (see /tmp/tmp.PzLBGAOIcXxfstests-dev/results//generic/075.out.bad) --- tests/generic/075.out 2023-02-21 02:51:57.079244980 +0000 +++ /tmp/tmp.PzLBGAOIcXxfstests-dev/results//generic/075.out.bad 2023-02-21 03:05:46.103612075 +0000 @@ -4,15 +4,5 @@ ----------------------------------------------- fsx.0 : -d -N numops -S 0 ----------------------------------------------- - ------------------------------------------------ -fsx.1 : -d -N numops -S 0 -x ------------------------------------------------ ... (Run 'diff -u /tmp/tmp.PzLBGAOIcXxfstests-dev/tests/generic/075.out /tmp/tmp.PzLBGAOIcXxfstests-dev/results//generic/075.out.bad' to see the entire diff)
Only saw this once.
Updated by Xiubo Li about 1 year ago
generic/112 [failed, exit status 1]- output mismatch (see /tmp/tmp.QUxmoUObcrxfstests-dev/results//generic/112.out.bad) --- tests/generic/112.out 2023-02-23 03:49:18.580164190 +0000 +++ /tmp/tmp.QUxmoUObcrxfstests-dev/results//generic/112.out.bad 2023-02-23 04:13:59.529829597 +0000 @@ -4,15 +4,4 @@ ----------------------------------------------- fsx.0 : -A -d -N numops -S 0 ----------------------------------------------- - ------------------------------------------------ -fsx.1 : -A -d -N numops -S 0 -x ------------------------------------------------ ... (Run 'diff -u /tmp/tmp.QUxmoUObcrxfstests-dev/tests/generic/112.out /tmp/tmp.QUxmoUObcrxfstests-dev/results//generic/112.out.bad' to see the entire diff) generic/113 131s
Updated by Xiubo Li about 1 year ago
A more detailed debug logs:
2023-03-01T01:37:45.435 INFO:tasks.cephfs.test_fscrypt:Command stdout - FSTYP -- ceph PLATFORM -- Linux/x86_64 smithi047 6.2.0-ceph-g4d47d81cd6a6 #1 SMP PREEMPT_DYNAMIC Tue Feb 28 13:46:13 UTC 2023 MKFS_OPTIONS -- 172.21.15.12:6789:/scratch MOUNT_OPTIONS -- -o name=admin,secret=AQCLqv5ju9isIxAAW9waXCZCxCle15GJmhMWdA==,test_dummy_encryption -o context=system_u:object_r:root_t:s0 172.21.15.12:6789:/scratch /tmp/tmp.bg0HVqUGdoscratch generic/075 [failed, exit status 1]- output mismatch (see /tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.out.bad) --- tests/generic/075.out 2023-03-01 01:31:44.916121724 +0000 +++ /tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.out.bad 2023-03-01 01:37:22.632170100 +0000 @@ -4,15 +4,5 @@ ----------------------------------------------- fsx.0 : -d -N numops -S 0 ----------------------------------------------- - ------------------------------------------------ -fsx.1 : -d -N numops -S 0 -x ------------------------------------------------ - ------------------------------------------------ -fsx.2 : -d -N numops -l filelen -S 0 ------------------------------------------------ - ------------------------------------------------ -fsx.3 : -d -N numops -l filelen -S 0 -x ------------------------------------------------ + fsx (-d -N 1000 -S 0) failed, 0 - compare /tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.0.{good,bad,fsxlog} +mv: '/tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.0.fsxlog' and '/tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.0.fsxlog' are the same file generic/112 23s Ran: generic/075 generic/112 Failures: generic/075 Failed 1 of 2 tests
Updated by Xiubo Li about 1 year ago
Reproduced it locally:
[root@lxbceph1 xfstests-dev]# ./check generic/075 FSTYP -- ceph PLATFORM -- Linux/x86_64 lxbceph1 6.2.0-00462-g015be187259d-dirty #215 SMP PREEMPT_DYNAMIC Tue Feb 28 21:50:43 CST 2023 MKFS_OPTIONS -- 10.72.47.117:40262:/testB MOUNT_OPTIONS -- -o test_dummy_encryption,name=admin,nowsync,copyfrom,rasize=4096 -o context=system_u:object_r:root_t:s0 10.72.47.117:40262:/testB /mnt/kcephfs.B generic/075 46s ... [failed, exit status 1]- output mismatch (see /data/xfstests-dev/results//generic/075.out.bad) --- tests/generic/075.out 2022-04-07 21:42:09.618007240 +0800 +++ /data/xfstests-dev/results//generic/075.out.bad 2023-03-01 14:49:25.641633437 +0800 @@ -4,15 +4,127 @@ ----------------------------------------------- fsx.0 : -d -N numops -S 0 ----------------------------------------------- - ------------------------------------------------ -fsx.1 : -d -N numops -S 0 -x ------------------------------------------------ - ------------------------------------------------ -fsx.2 : -d -N numops -l filelen -S 0 ------------------------------------------------ - ------------------------------------------------ -fsx.3 : -d -N numops -l filelen -S 0 -x ------------------------------------------------ +Seed set to 277337 +lxb-------------main:2877 +fsx: main:2909 +: No such file or directory +fsx: main:2963 +: No such file or directory +fsx: main:2977 +: No such file or directory +main: filesystem does not support fallocate mode 0, disabling! +main: filesystem does not support fallocate mode FALLOC_FL_KEEP_SIZE, disabling! +main: filesystem does not support fallocate mode FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, disabling! +main: filesystem does not support fallocate mode FALLOC_FL_ZERO_RANGE, disabling! +main: filesystem does not support fallocate mode FALLOC_FL_COLLAPSE_RANGE, disabling! +main: filesystem does not support fallocate mode FALLOC_FL_INSERT_RANGE, disabling! +main: filesystem does not support clone range, disabling! +main: filesystem does not support dedupe range, disabling! +truncating to largest ever: 0x18697 +4 trunc from 0x0 to 0x18697 +5 read 0x7ec5 thru 0x11a9f (0x9bdb bytes) +7 mapread 0x70b9 thru 0x88ec (0x1834 bytes) +10 mapwrite 0x77df thru 0xb16c (0x398e bytes) +11 trunc from 0x18697 to 0x4cbc +13 mapwrite 0x3b24 thru 0x8abc (0x4f99 bytes) +15 read 0x1ae3 thru 0x8abc (0x6fda bytes) +16 trunc from 0x8abd to 0xb032 +copying to largest ever: 0x2d08e +17 copy from 0xaa32 to 0xb032, (0x600 bytes) at 0x2ca8e +21 read 0x2649f thru 0x2d08d (0x6bef bytes) +24 mapread 0x21e6a thru 0x2d08d (0xb224 bytes) +26 mapread 0x11c76 thru 0x18093 (0x641e bytes) +27 mapwrite 0x537 thru 0x99c4 (0x948e bytes) +30 mapwrite 0x2c090 thru 0x35b42 (0x9ab3 bytes) +31 copy from 0x6e7a to 0xf76e, (0x88f4 bytes) at 0x24093 +copying to largest ever: 0x32c05 +32 copy from 0x1f995 to 0x26af6, (0x7161 bytes) at 0x2baa4 +35 read 0x1b53a thru 0x1b629 (0xf0 bytes) +37 trunc from 0x35b43 to 0x44b4 +38 copy from 0x116f to 0x44b4, (0x3345 bytes) at 0x20547 +41 mapread 0x1f3be thru 0x2388b (0x44ce bytes) +43 copy from 0x304c to 0xff96, (0xcf4a bytes) at 0x1af3b +44 read 0x3cbc thru 0xd016 (0x935b bytes) +READ BAD DATA: offset = 0x3cbc, size = 0x935b, fname = 075.0 +OFFSET GOOD BAD RANGE +0x044b4 0x0000 0x1bea 0x00000 +operation# (mod 256) for the bad data may be 27 +0x044b5 0x0000 0xea1b 0x00001 +operation# (mod 256) for the bad data may be 27 +0x044b6 0x0000 0x1b99 0x00002 +operation# (mod 256) for the bad data may be 27 +0x044b7 0x0000 0x991b 0x00003 +operation# (mod 256) for the bad data may be 27 +0x044b8 0x0000 0x1b67 0x00004 +operation# (mod 256) for the bad data may be 27 +0x044b9 0x0000 0x671b 0x00005 +operation# (mod 256) for the bad data may be 27 +0x044ba 0x0000 0x1b18 0x00006 +operation# (mod 256) for the bad data may be 27 +0x044bb 0x0000 0x181b 0x00007 +operation# (mod 256) for the bad data may be 27 +0x044bc 0x0000 0x1b92 0x00008 +operation# (mod 256) for the bad data may be 27 +0x044bd 0x0000 0x921b 0x00009 +operation# (mod 256) for the bad data may be 27 +0x044be 0x0000 0x1bf4 0x0000a +operation# (mod 256) for the bad data may be 27 +0x044bf 0x0000 0xf41b 0x0000b +operation# (mod 256) for the bad data may be 27 +0x044c0 0x0000 0x1b05 0x0000c +operation# (mod 256) for the bad data may be 27 +0x044c1 0x0000 0x051b 0x0000d +operation# (mod 256) for the bad data may be 27 +0x044c2 0x0000 0x1b55 0x0000e +operation# (mod 256) for the bad data may be 27 +0x044c3 0x0000 0x551b 0x0000f +operation# (mod 256) for the bad data may be 27 +LOG DUMP (44 total operations): +1( 1 mod 256): SKIPPED (no operation) +2( 2 mod 256): SKIPPED (no operation) +3( 3 mod 256): SKIPPED (no operation) +4( 4 mod 256): TRUNCATE UP from 0x0 to 0x18697 ******WWWW +5( 5 mod 256): READ 0x7ec5 thru 0x11a9f (0x9bdb bytes) +6( 6 mod 256): SKIPPED (no operation) +7( 7 mod 256): MAPREAD 0x70b9 thru 0x88ec (0x1834 bytes) +8( 8 mod 256): SKIPPED (no operation) +9( 9 mod 256): SKIPPED (no operation) +10( 10 mod 256): MAPWRITE 0x77df thru 0xb16c (0x398e bytes) +11( 11 mod 256): TRUNCATE DOWN from 0x18697 to 0x4cbc +12( 12 mod 256): SKIPPED (no operation) +13( 13 mod 256): MAPWRITE 0x3b24 thru 0x8abc (0x4f99 bytes) ******WWWW +14( 14 mod 256): SKIPPED (no operation) +15( 15 mod 256): READ 0x1ae3 thru 0x8abc (0x6fda bytes) ***RRRR*** +16( 16 mod 256): TRUNCATE UP from 0x8abd to 0xb032 +17( 17 mod 256): COPY 0xaa32 thru 0xb031 (0x600 bytes) to 0x2ca8e thru 0x2d08d +18( 18 mod 256): SKIPPED (no operation) +19( 19 mod 256): SKIPPED (no operation) +20( 20 mod 256): SKIPPED (no operation) +21( 21 mod 256): READ 0x2649f thru 0x2d08d (0x6bef bytes) +22( 22 mod 256): SKIPPED (no operation) +23( 23 mod 256): SKIPPED (no operation) +24( 24 mod 256): MAPREAD 0x21e6a thru 0x2d08d (0xb224 bytes) +25( 25 mod 256): SKIPPED (no operation) +26( 26 mod 256): MAPREAD 0x11c76 thru 0x18093 (0x641e bytes) +27( 27 mod 256): MAPWRITE 0x537 thru 0x99c4 (0x948e bytes) ******WWWW +28( 28 mod 256): SKIPPED (no operation) +29( 29 mod 256): SKIPPED (no operation) +30( 30 mod 256): MAPWRITE 0x2c090 thru 0x35b42 (0x9ab3 bytes) +31( 31 mod 256): COPY 0x6e7a thru 0xf76d (0x88f4 bytes) to 0x24093 thru 0x2c986 +32( 32 mod 256): COPY 0x1f995 thru 0x26af5 (0x7161 bytes) to 0x2baa4 thru 0x32c04 +33( 33 mod 256): SKIPPED (no operation) +34( 34 mod 256): SKIPPED (no operation) +35( 35 mod 256): READ 0x1b53a thru 0x1b629 (0xf0 bytes) +36( 36 mod 256): SKIPPED (no operation) +37( 37 mod 256): TRUNCATE DOWN from 0x35b43 to 0x44b4 ******WWWW +38( 38 mod 256): COPY 0x116f thru 0x44b3 (0x3345 bytes) to 0x20547 thru 0x2388b +39( 39 mod 256): SKIPPED (no operation) +40( 40 mod 256): SKIPPED (no operation) +41( 41 mod 256): MAPREAD 0x1f3be thru 0x2388b (0x44ce bytes) +42( 42 mod 256): SKIPPED (no operation) +43( 43 mod 256): COPY 0x304c thru 0xff95 (0xcf4a bytes) to 0x1af3b thru 0x27e84 EEEE****** +44( 44 mod 256): READ 0x3cbc thru 0xd016 (0x935b bytes) ***RRRR*** +Log of operations saved to "/data/xfstests-dev/results//generic/075.0.fsxops"; replay with --replay-ops +Correct content saved for comparison +(maybe hexdump "075.0" vs "/data/xfstests-dev/results//generic/075.0.fsxgood") + fsx (-d -N 1000 -S 0) failed, 110 - compare /data/xfstests-dev/results//generic/075.0.{good,bad,fsxlog} Ran: generic/075 Failures: generic/075 Failed 1 of 1 tests
Updated by Xiubo Li about 1 year ago
It's the TRUNCATE DOWN make the data corrupted:
+37( 37 mod 256): TRUNCATE DOWN from 0x35b43 to 0x44b4 ******WWWW
1013 004450 e31b b51b 0c1b 4f1b ec1b b21b 3b1b 7c1b |1013 004450 e31b b51b 0c1b 4f1b ec1b b21b 3b1b 7c1b 1014 004460 ea1b 861b f11b b01b a01b ac1b d61b 651b |1014 004460 ea1b 861b f11b b01b a01b ac1b d61b 651b 1015 004470 381b 541b 671b 3e1b b31b 2b1b f41b d81b |1015 004470 381b 541b 671b 3e1b b31b 2b1b f41b d81b 1016 004480 7b1b a61b 0e1b 3f1b 241b 711b ee1b 9d1b |1016 004480 7b1b a61b 0e1b 3f1b 241b 711b ee1b 9d1b 1017 004490 be1b 9d1b 121b 591b d81b 911b 771b 621b |1017 004490 be1b 9d1b 121b 591b d81b 911b 771b 621b 1018 0044a0 971b cc1b 2c1b 841b 531b d71b aa1b 461b |1018 0044a0 971b cc1b 2c1b 841b 531b d71b aa1b 461b 1019 0044b0 971b 301b ea1b 991b 671b 181b 921b f41b |1019 0044b0 971b 301b 0000 0000 0000 0000 0000 0000 1020 0044c0 051b 551b fc1b 301b 961b 801b be1b 3d1b |1020 0044c0 0000 0000 0000 0000 0000 0000 0000 0000 1021 0044d0 a21b eb1b 751b 241b 761b 861b 5e1b 0f1b |1021 * 1022 0044e0 4d1b 251b 721b b31b f51b 6e1b 481b 921b |1022 01af30 0000 0000 0000 0000 0000 1b00 1b5c 1b61 1023 0044f0 1b1b ce1b 821b 4a1b 7e1b d91b 321b ca1b |1023 01af40 1bb0 1bc7 1b67 1b46 1bee 1bc3 1b86 1ba3 /data/075.1/075.0.bad, , unix ASCII=57,HEX=0x39 1023,34 63% /data/075.1/075.0.good, , unix ASCII=48,HEX=0x30 1023,1 75%
Updated by Xiubo Li about 1 year ago
It's racy between two size truncate operations when the new sizes are both located in the same CEPH_FSCRYPT_BLOCK_SIZE block the second one will skip invalidating the corresponding pagecaches in kclient.
Updated by Xiubo Li about 1 year ago
- Status changed from New to Fix Under Review
Updated by Xiubo Li about 1 year ago
Xiubo Li wrote:
Fixed it in: https://patchwork.kernel.org/project/ceph-devel/patch/20230307080714.122306-1-xiubli@redhat.com/
Locally test this for more than 100 times, didn't see it with the kclient fix:
[root@lxbceph1 xfstests-dev]# ./check generic/075 generic/112 FSTYP -- ceph PLATFORM -- Linux/x86_64 lxbceph1 6.2.0-00075-g058d472b4402-dirty #224 SMP PREEMPT_DYNAMIC Mon Mar 6 17:09:18 CST 2023 MKFS_OPTIONS -- 10.72.47.117:40107:/testB MOUNT_OPTIONS -- -o test_dummy_encryption,name=admin,nowsync,copyfrom,rasize=4096 10.72.47.117:40107:/testB /mnt/kcephfs.B generic/075 40s ... 44s generic/112 41s ... 41s Ran: generic/075 generic/112 Passed all 2 tests
Actions