Project

General

Profile

Actions

Bug #58834

open

kclient: xfstests-dev generic/075 failed

Added by Xiubo Li about 1 year ago. Updated about 1 year ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/xiubli-2023-02-21_02:17:23-fs:fscrypt-wip-fscrypt-20230216-1802-distro-default-smithi/7182255/teuthology.log

generic/075       [failed, exit status 1]- output mismatch (see /tmp/tmp.PzLBGAOIcXxfstests-dev/results//generic/075.out.bad)
    --- tests/generic/075.out   2023-02-21 02:51:57.079244980 +0000
    +++ /tmp/tmp.PzLBGAOIcXxfstests-dev/results//generic/075.out.bad    2023-02-21 03:05:46.103612075 +0000
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
    (Run 'diff -u /tmp/tmp.PzLBGAOIcXxfstests-dev/tests/generic/075.out /tmp/tmp.PzLBGAOIcXxfstests-dev/results//generic/075.out.bad'  to see the entire diff)

Only saw this once.

Actions #1

Updated by Xiubo Li about 1 year ago

http://qa-proxy.ceph.com/teuthology/xiubli-2023-02-23_03:24:22-fs:fscrypt-wip-fscrypt-20230215-0834-distro-default-smithi/7185216/teuthology.log

generic/112       [failed, exit status 1]- output mismatch (see /tmp/tmp.QUxmoUObcrxfstests-dev/results//generic/112.out.bad)
    --- tests/generic/112.out    2023-02-23 03:49:18.580164190 +0000
    +++ /tmp/tmp.QUxmoUObcrxfstests-dev/results//generic/112.out.bad    2023-02-23 04:13:59.529829597 +0000
    @@ -4,15 +4,4 @@
     -----------------------------------------------
     fsx.0 : -A -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -A -d -N numops -S 0 -x
    ------------------------------------------------
    ...
    (Run 'diff -u /tmp/tmp.QUxmoUObcrxfstests-dev/tests/generic/112.out /tmp/tmp.QUxmoUObcrxfstests-dev/results//generic/112.out.bad'  to see the entire diff)
generic/113        131s
Actions #2

Updated by Xiubo Li about 1 year ago

A more detailed debug logs:

http://qa-proxy.ceph.com/teuthology/xiubli-2023-03-01_01:03:16-fs:fscrypt-wip-fscrypt-20230228-1026-distro-default-smithi/7189793/teuthology.log

2023-03-01T01:37:45.435 INFO:tasks.cephfs.test_fscrypt:Command stdout -
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 smithi047 6.2.0-ceph-g4d47d81cd6a6 #1 SMP PREEMPT_DYNAMIC Tue Feb 28 13:46:13 UTC 2023
MKFS_OPTIONS  -- 172.21.15.12:6789:/scratch
MOUNT_OPTIONS -- -o name=admin,secret=AQCLqv5ju9isIxAAW9waXCZCxCle15GJmhMWdA==,test_dummy_encryption -o context=system_u:object_r:root_t:s0 172.21.15.12:6789:/scratch /tmp/tmp.bg0HVqUGdoscratch

generic/075       [failed, exit status 1]- output mismatch (see /tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.out.bad)
    --- tests/generic/075.out    2023-03-01 01:31:44.916121724 +0000
    +++ /tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.out.bad    2023-03-01 01:37:22.632170100 +0000
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    -
    ------------------------------------------------
    -fsx.2 : -d -N numops -l filelen -S 0
    ------------------------------------------------
    -
    ------------------------------------------------
    -fsx.3 : -d -N numops -l filelen -S 0 -x
    ------------------------------------------------
    +    fsx (-d -N 1000 -S 0) failed, 0 - compare /tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.0.{good,bad,fsxlog}
    +mv: '/tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.0.fsxlog' and '/tmp/tmp.a8dwrLqOIvxfstests-dev/results//generic/075.0.fsxlog' are the same file
generic/112        23s
Ran: generic/075 generic/112
Failures: generic/075
Failed 1 of 2 tests
Actions #3

Updated by Xiubo Li about 1 year ago

Reproduced it locally:

[root@lxbceph1 xfstests-dev]# ./check generic/075 
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 6.2.0-00462-g015be187259d-dirty #215 SMP PREEMPT_DYNAMIC Tue Feb 28 21:50:43 CST 2023
MKFS_OPTIONS  -- 10.72.47.117:40262:/testB
MOUNT_OPTIONS -- -o test_dummy_encryption,name=admin,nowsync,copyfrom,rasize=4096 -o context=system_u:object_r:root_t:s0 10.72.47.117:40262:/testB /mnt/kcephfs.B

generic/075 46s ... [failed, exit status 1]- output mismatch (see /data/xfstests-dev/results//generic/075.out.bad)
    --- tests/generic/075.out    2022-04-07 21:42:09.618007240 +0800
    +++ /data/xfstests-dev/results//generic/075.out.bad    2023-03-01 14:49:25.641633437 +0800
    @@ -4,15 +4,127 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    -
    ------------------------------------------------
    -fsx.2 : -d -N numops -l filelen -S 0
    ------------------------------------------------
    -
    ------------------------------------------------
    -fsx.3 : -d -N numops -l filelen -S 0 -x
    ------------------------------------------------
    +Seed set to 277337
    +lxb-------------main:2877
    +fsx: main:2909
    +: No such file or directory
    +fsx: main:2963
    +: No such file or directory
    +fsx: main:2977
    +: No such file or directory
    +main: filesystem does not support fallocate mode 0, disabling!
    +main: filesystem does not support fallocate mode FALLOC_FL_KEEP_SIZE, disabling!
    +main: filesystem does not support fallocate mode FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, disabling!
    +main: filesystem does not support fallocate mode FALLOC_FL_ZERO_RANGE, disabling!
    +main: filesystem does not support fallocate mode FALLOC_FL_COLLAPSE_RANGE, disabling!
    +main: filesystem does not support fallocate mode FALLOC_FL_INSERT_RANGE, disabling!
    +main: filesystem does not support clone range, disabling!
    +main: filesystem does not support dedupe range, disabling!
    +truncating to largest ever: 0x18697
    +4 trunc    from 0x0 to 0x18697
    +5 read    0x7ec5 thru    0x11a9f    (0x9bdb bytes)
    +7 mapread    0x70b9 thru    0x88ec    (0x1834 bytes)
    +10 mapwrite    0x77df thru    0xb16c    (0x398e bytes)
    +11 trunc    from 0x18697 to 0x4cbc
    +13 mapwrite    0x3b24 thru    0x8abc    (0x4f99 bytes)
    +15 read    0x1ae3 thru    0x8abc    (0x6fda bytes)
    +16 trunc    from 0x8abd to 0xb032
    +copying to largest ever: 0x2d08e
    +17 copy    from 0xaa32 to 0xb032, (0x600 bytes) at 0x2ca8e
    +21 read    0x2649f thru    0x2d08d    (0x6bef bytes)
    +24 mapread    0x21e6a thru    0x2d08d    (0xb224 bytes)
    +26 mapread    0x11c76 thru    0x18093    (0x641e bytes)
    +27 mapwrite    0x537 thru    0x99c4    (0x948e bytes)
    +30 mapwrite    0x2c090 thru    0x35b42    (0x9ab3 bytes)
    +31 copy    from 0x6e7a to 0xf76e, (0x88f4 bytes) at 0x24093
    +copying to largest ever: 0x32c05
    +32 copy    from 0x1f995 to 0x26af6, (0x7161 bytes) at 0x2baa4
    +35 read    0x1b53a thru    0x1b629    (0xf0 bytes)
    +37 trunc    from 0x35b43 to 0x44b4
    +38 copy    from 0x116f to 0x44b4, (0x3345 bytes) at 0x20547
    +41 mapread    0x1f3be thru    0x2388b    (0x44ce bytes)
    +43 copy    from 0x304c to 0xff96, (0xcf4a bytes) at 0x1af3b
    +44 read    0x3cbc thru    0xd016    (0x935b bytes)
    +READ BAD DATA: offset = 0x3cbc, size = 0x935b, fname = 075.0
    +OFFSET    GOOD    BAD    RANGE
    +0x044b4    0x0000    0x1bea    0x00000
    +operation# (mod 256) for the bad data may be 27
    +0x044b5    0x0000    0xea1b    0x00001
    +operation# (mod 256) for the bad data may be 27
    +0x044b6    0x0000    0x1b99    0x00002
    +operation# (mod 256) for the bad data may be 27
    +0x044b7    0x0000    0x991b    0x00003
    +operation# (mod 256) for the bad data may be 27
    +0x044b8    0x0000    0x1b67    0x00004
    +operation# (mod 256) for the bad data may be 27
    +0x044b9    0x0000    0x671b    0x00005
    +operation# (mod 256) for the bad data may be 27
    +0x044ba    0x0000    0x1b18    0x00006
    +operation# (mod 256) for the bad data may be 27
    +0x044bb    0x0000    0x181b    0x00007
    +operation# (mod 256) for the bad data may be 27
    +0x044bc    0x0000    0x1b92    0x00008
    +operation# (mod 256) for the bad data may be 27
    +0x044bd    0x0000    0x921b    0x00009
    +operation# (mod 256) for the bad data may be 27
    +0x044be    0x0000    0x1bf4    0x0000a
    +operation# (mod 256) for the bad data may be 27
    +0x044bf    0x0000    0xf41b    0x0000b
    +operation# (mod 256) for the bad data may be 27
    +0x044c0    0x0000    0x1b05    0x0000c
    +operation# (mod 256) for the bad data may be 27
    +0x044c1    0x0000    0x051b    0x0000d
    +operation# (mod 256) for the bad data may be 27
    +0x044c2    0x0000    0x1b55    0x0000e
    +operation# (mod 256) for the bad data may be 27
    +0x044c3    0x0000    0x551b    0x0000f
    +operation# (mod 256) for the bad data may be 27
    +LOG DUMP (44 total operations):
    +1(  1 mod 256): SKIPPED (no operation)
    +2(  2 mod 256): SKIPPED (no operation)
    +3(  3 mod 256): SKIPPED (no operation)
    +4(  4 mod 256): TRUNCATE UP    from 0x0 to 0x18697    ******WWWW
    +5(  5 mod 256): READ     0x7ec5 thru 0x11a9f    (0x9bdb bytes)
    +6(  6 mod 256): SKIPPED (no operation)
    +7(  7 mod 256): MAPREAD  0x70b9 thru 0x88ec    (0x1834 bytes)
    +8(  8 mod 256): SKIPPED (no operation)
    +9(  9 mod 256): SKIPPED (no operation)
    +10( 10 mod 256): MAPWRITE 0x77df thru 0xb16c    (0x398e bytes)
    +11( 11 mod 256): TRUNCATE DOWN    from 0x18697 to 0x4cbc
    +12( 12 mod 256): SKIPPED (no operation)
    +13( 13 mod 256): MAPWRITE 0x3b24 thru 0x8abc    (0x4f99 bytes)    ******WWWW
    +14( 14 mod 256): SKIPPED (no operation)
    +15( 15 mod 256): READ     0x1ae3 thru 0x8abc    (0x6fda bytes)    ***RRRR***
    +16( 16 mod 256): TRUNCATE UP    from 0x8abd to 0xb032
    +17( 17 mod 256): COPY 0xaa32 thru 0xb031    (0x600 bytes) to 0x2ca8e thru 0x2d08d
    +18( 18 mod 256): SKIPPED (no operation)
    +19( 19 mod 256): SKIPPED (no operation)
    +20( 20 mod 256): SKIPPED (no operation)
    +21( 21 mod 256): READ     0x2649f thru 0x2d08d    (0x6bef bytes)
    +22( 22 mod 256): SKIPPED (no operation)
    +23( 23 mod 256): SKIPPED (no operation)
    +24( 24 mod 256): MAPREAD  0x21e6a thru 0x2d08d    (0xb224 bytes)
    +25( 25 mod 256): SKIPPED (no operation)
    +26( 26 mod 256): MAPREAD  0x11c76 thru 0x18093    (0x641e bytes)
    +27( 27 mod 256): MAPWRITE 0x537 thru 0x99c4    (0x948e bytes)    ******WWWW
    +28( 28 mod 256): SKIPPED (no operation)
    +29( 29 mod 256): SKIPPED (no operation)
    +30( 30 mod 256): MAPWRITE 0x2c090 thru 0x35b42    (0x9ab3 bytes)
    +31( 31 mod 256): COPY 0x6e7a thru 0xf76d    (0x88f4 bytes) to 0x24093 thru 0x2c986
    +32( 32 mod 256): COPY 0x1f995 thru 0x26af5    (0x7161 bytes) to 0x2baa4 thru 0x32c04
    +33( 33 mod 256): SKIPPED (no operation)
    +34( 34 mod 256): SKIPPED (no operation)
    +35( 35 mod 256): READ     0x1b53a thru 0x1b629    (0xf0 bytes)
    +36( 36 mod 256): SKIPPED (no operation)
    +37( 37 mod 256): TRUNCATE DOWN    from 0x35b43 to 0x44b4    ******WWWW
    +38( 38 mod 256): COPY 0x116f thru 0x44b3    (0x3345 bytes) to 0x20547 thru 0x2388b
    +39( 39 mod 256): SKIPPED (no operation)
    +40( 40 mod 256): SKIPPED (no operation)
    +41( 41 mod 256): MAPREAD  0x1f3be thru 0x2388b    (0x44ce bytes)
    +42( 42 mod 256): SKIPPED (no operation)
    +43( 43 mod 256): COPY 0x304c thru 0xff95    (0xcf4a bytes) to 0x1af3b thru 0x27e84    EEEE******
    +44( 44 mod 256): READ     0x3cbc thru 0xd016    (0x935b bytes)    ***RRRR***
    +Log of operations saved to "/data/xfstests-dev/results//generic/075.0.fsxops"; replay with --replay-ops
    +Correct content saved for comparison
    +(maybe hexdump "075.0" vs "/data/xfstests-dev/results//generic/075.0.fsxgood")
    +    fsx (-d -N 1000 -S 0) failed, 110 - compare /data/xfstests-dev/results//generic/075.0.{good,bad,fsxlog}
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests
Actions #4

Updated by Xiubo Li about 1 year ago

It's the TRUNCATE DOWN make the data corrupted:

 +37( 37 mod 256): TRUNCATE DOWN    from 0x35b43 to 0x44b4    ******WWWW
1013 004450 e31b b51b 0c1b 4f1b ec1b b21b 3b1b 7c1b                        |1013 004450 e31b b51b 0c1b 4f1b ec1b b21b 3b1b 7c1b                        
1014 004460 ea1b 861b f11b b01b a01b ac1b d61b 651b                        |1014 004460 ea1b 861b f11b b01b a01b ac1b d61b 651b                        
1015 004470 381b 541b 671b 3e1b b31b 2b1b f41b d81b                        |1015 004470 381b 541b 671b 3e1b b31b 2b1b f41b d81b                        
1016 004480 7b1b a61b 0e1b 3f1b 241b 711b ee1b 9d1b                        |1016 004480 7b1b a61b 0e1b 3f1b 241b 711b ee1b 9d1b                        
1017 004490 be1b 9d1b 121b 591b d81b 911b 771b 621b                        |1017 004490 be1b 9d1b 121b 591b d81b 911b 771b 621b                        
1018 0044a0 971b cc1b 2c1b 841b 531b d71b aa1b 461b                        |1018 0044a0 971b cc1b 2c1b 841b 531b d71b aa1b 461b                        
1019 0044b0 971b 301b ea1b 991b 671b 181b 921b f41b                        |1019 0044b0 971b 301b 0000 0000 0000 0000 0000 0000                        
1020 0044c0 051b 551b fc1b 301b 961b 801b be1b 3d1b                        |1020 0044c0 0000 0000 0000 0000 0000 0000 0000 0000                        
1021 0044d0 a21b eb1b 751b 241b 761b 861b 5e1b 0f1b                        |1021 *                                                                     
1022 0044e0 4d1b 251b 721b b31b f51b 6e1b 481b 921b                        |1022 01af30 0000 0000 0000 0000 0000 1b00 1b5c 1b61
1023 0044f0 1b1b ce1b 821b 4a1b 7e1b d91b 321b ca1b                        |1023 01af40 1bb0 1bc7 1b67 1b46 1bee 1bc3 1b86 1ba3                        
/data/075.1/075.0.bad, , unix   ASCII=57,HEX=0x39   1023,34  63%            /data/075.1/075.0.good, , unix   ASCII=48,HEX=0x30   1023,1  75%           

Actions #5

Updated by Xiubo Li about 1 year ago

It's racy between two size truncate operations when the new sizes are both located in the same CEPH_FSCRYPT_BLOCK_SIZE block the second one will skip invalidating the corresponding pagecaches in kclient.

Actions #6

Updated by Xiubo Li about 1 year ago

  • Status changed from New to Fix Under Review
Actions #7

Updated by Xiubo Li about 1 year ago

Xiubo Li wrote:

Fixed it in: https://patchwork.kernel.org/project/ceph-devel/patch/20230307080714.122306-1-xiubli@redhat.com/

Locally test this for more than 100 times, didn't see it with the kclient fix:

[root@lxbceph1 xfstests-dev]# ./check generic/075 generic/112
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 6.2.0-00075-g058d472b4402-dirty #224 SMP PREEMPT_DYNAMIC Mon Mar  6 17:09:18 CST 2023
MKFS_OPTIONS  -- 10.72.47.117:40107:/testB
MOUNT_OPTIONS -- -o test_dummy_encryption,name=admin,nowsync,copyfrom,rasize=4096 10.72.47.117:40107:/testB /mnt/kcephfs.B

generic/075 40s ...  44s
generic/112 41s ...  41s
Ran: generic/075 generic/112
Passed all 2 tests

Actions

Also available in: Atom PDF