Bug #56695
[RHEL stock] pjd test failures (a bug that needs to wait for the unlink to finish)
Status: Closed
% Done: 0%
Description
pdonnell@teuthology /ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi$ grep 'not ok' {'6945847','6945828','6945853','6945837','6945822','6945804','6945862','6945812'}/teu*
6945847/teuthology.log:2022-07-22T20:52:25.833 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 4'
6945847/teuthology.log:2022-07-22T20:52:25.838 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 6'
6945847/teuthology.log:2022-07-22T20:52:25.851 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 4
6945847/teuthology.log:2022-07-22T20:52:25.852 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 6
6945847/teuthology.log:2022-07-22T20:52:26.141 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 8'
6945847/teuthology.log:2022-07-22T20:52:26.143 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 9'
6945847/teuthology.log:2022-07-22T20:52:26.175 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 8
6945847/teuthology.log:2022-07-22T20:52:26.175 INFO:tasks.workunit.client.0.smithi135.stdout:not ok 9
6945828/teuthology.log:2022-07-22T20:32:24.810 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 4'
6945828/teuthology.log:2022-07-22T20:32:24.815 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 6'
6945828/teuthology.log:2022-07-22T20:32:24.829 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 4
6945828/teuthology.log:2022-07-22T20:32:24.829 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 6
6945828/teuthology.log:2022-07-22T20:32:25.129 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 8'
6945828/teuthology.log:2022-07-22T20:32:25.132 INFO:tasks.workunit.client.0.smithi100.stderr:+ echo 'not ok 9'
6945828/teuthology.log:2022-07-22T20:32:25.162 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 8
6945828/teuthology.log:2022-07-22T20:32:25.162 INFO:tasks.workunit.client.0.smithi100.stdout:not ok 9
6945853/teuthology.log:2022-07-22T21:04:37.135 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 4'
6945853/teuthology.log:2022-07-22T21:04:37.141 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 6'
6945853/teuthology.log:2022-07-22T21:04:37.154 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 4
6945853/teuthology.log:2022-07-22T21:04:37.155 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 6
6945853/teuthology.log:2022-07-22T21:04:37.453 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 8'
6945853/teuthology.log:2022-07-22T21:04:37.456 INFO:tasks.workunit.client.0.smithi107.stderr:+ echo 'not ok 9'
6945853/teuthology.log:2022-07-22T21:04:37.481 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 8
6945853/teuthology.log:2022-07-22T21:04:37.481 INFO:tasks.workunit.client.0.smithi107.stdout:not ok 9
6945837/teuthology.log:2022-07-22T20:37:53.954 INFO:tasks.workunit.client.0.smithi017.stderr:+ echo 'not ok 8'
6945837/teuthology.log:2022-07-22T20:37:53.960 INFO:tasks.workunit.client.0.smithi017.stderr:+ echo 'not ok 9'
6945837/teuthology.log:2022-07-22T20:37:53.987 INFO:tasks.workunit.client.0.smithi017.stdout:not ok 8
6945837/teuthology.log:2022-07-22T20:37:53.987 INFO:tasks.workunit.client.0.smithi017.stdout:not ok 9
6945822/teuthology.log:2022-07-22T20:25:33.552 INFO:tasks.workunit.client.0.smithi029.stderr:+ echo 'not ok 8'
6945822/teuthology.log:2022-07-22T20:25:33.555 INFO:tasks.workunit.client.0.smithi029.stderr:+ echo 'not ok 9'
6945822/teuthology.log:2022-07-22T20:25:33.587 INFO:tasks.workunit.client.0.smithi029.stdout:not ok 8
6945822/teuthology.log:2022-07-22T20:25:33.587 INFO:tasks.workunit.client.0.smithi029.stdout:not ok 9
6945804/teuthology.log:2022-07-22T20:13:29.326 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 4'
6945804/teuthology.log:2022-07-22T20:13:29.331 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 6'
6945804/teuthology.log:2022-07-22T20:13:29.386 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 4
6945804/teuthology.log:2022-07-22T20:13:29.387 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 6
6945804/teuthology.log:2022-07-22T20:13:29.684 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 8'
6945804/teuthology.log:2022-07-22T20:13:29.687 INFO:tasks.workunit.client.0.smithi006.stderr:+ echo 'not ok 9'
6945804/teuthology.log:2022-07-22T20:13:29.714 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 8
6945804/teuthology.log:2022-07-22T20:13:29.714 INFO:tasks.workunit.client.0.smithi006.stdout:not ok 9
6945862/teuthology.log:2022-07-22T21:14:09.659 INFO:tasks.workunit.client.0.smithi027.stderr:+ echo 'not ok 8'
6945862/teuthology.log:2022-07-22T21:14:09.663 INFO:tasks.workunit.client.0.smithi027.stderr:+ echo 'not ok 9'
6945862/teuthology.log:2022-07-22T21:14:09.710 INFO:tasks.workunit.client.0.smithi027.stdout:not ok 8
6945862/teuthology.log:2022-07-22T21:14:09.710 INFO:tasks.workunit.client.0.smithi027.stdout:not ok 9
6945812/teuthology.log:2022-07-22T20:19:02.363 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 4'
6945812/teuthology.log:2022-07-22T20:19:02.376 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 6'
6945812/teuthology.log:2022-07-22T20:19:02.389 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 4
6945812/teuthology.log:2022-07-22T20:19:02.390 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 6
6945812/teuthology.log:2022-07-22T20:19:02.689 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 8'
6945812/teuthology.log:2022-07-22T20:19:02.692 INFO:tasks.workunit.client.0.smithi064.stderr:+ echo 'not ok 9'
6945812/teuthology.log:2022-07-22T20:19:06.580 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 8
6945812/teuthology.log:2022-07-22T20:19:06.581 INFO:tasks.workunit.client.0.smithi064.stdout:not ok 9
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945847/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945828/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945853/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945837/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945822/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945804/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945862/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945812/teuthology.log
Updated by Patrick Donnelly almost 2 years ago
- Related to Bug #55804: qa failure: pjd link tests failed added
Updated by Xiubo Li over 1 year ago
- Status changed from New to In Progress
Currently the kclient's testing branch has merged the fscrypt filename-related patches, which limit MAX_NAME to 240 instead of the default 256.
Will find a way to fix it.
Updated by Xiubo Li over 1 year ago
Xiubo Li wrote:
Currently the kclient's testing branch has merged the fscrypt filename-related patches, which limit MAX_NAME to 240 instead of the default 256.
Will find a way to fix it.
Sorry, it seems this was not introduced by that change:
Checked the first teuthology test link:
2022-07-22T20:31:40.242 INFO:teuthology.orchestra.run.smithi135.stdout:Package kernel-4.18.0-372.9.1.el8.x86_64 is already installed.
It was using the distro kernel.
Updated by Xiubo Li over 1 year ago
Tested this with the latest testing kclient branch; I couldn't reproduce it.
Will switch to the distro kernel and try again.
Updated by Xiubo Li over 1 year ago
Tried 4.18.0-348.20.1.el8_5.x86_64 and couldn't reproduce it.
Will try the exact same kernel version as in the QA test failures, kernel-4.18.0-372.9.1.el8.x86_64.
Updated by Xiubo Li over 1 year ago
Xiubo Li wrote:
Tried 4.18.0-348.20.1.el8_5.x86_64 and couldn't reproduce it.
Will try the exact same kernel version as in the QA test failures, kernel-4.18.0-372.9.1.el8.x86_64.
Still the same; couldn't reproduce it either.
Updated by Xiubo Li over 1 year ago
- Project changed from Linux kernel client to CephFS
- Component(FS) MDS added
Patrick Donnelly wrote:
[...]
/ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945847/teuthology.log
It's an MDS-side bug, not a kclient bug.
For example, in the following failure the errno is EXDEV:
2022-07-22T20:52:25.830 INFO:tasks.workunit.client.0.smithi135.stderr:++ expr 3 + 1
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ ntest=4
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ expect 0 link fstest_d22bbf65b5ea67dbd4f2534c56018c49 _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ e=0
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:+ shift
2022-07-22T20:52:25.831 INFO:tasks.workunit.client.0.smithi135.stderr:++ /home/ubuntu/cephtest/mnt.0/client.0/tmp/tmp/../pjd-fstest-20090130-RC/tests/link/../../fstest link fstest_d22bbf65b5ea67dbd4f2534c56018c49 _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:++ tail -1
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:+ r=EXDEV
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo EXDEV
2022-07-22T20:52:25.832 INFO:tasks.workunit.client.0.smithi135.stderr:+ egrep '^0$'
2022-07-22T20:52:25.833 INFO:tasks.workunit.client.0.smithi135.stderr:+ '[' 1 -eq 0 ']'
2022-07-22T20:52:25.833 INFO:tasks.workunit.client.0.smithi135.stderr:+ echo 'not ok 4'
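In other words, the failing sequence is: create a file under a max-length name, hard-link it to a short name, unlink the long name, then immediately re-create the long name as a hard link to the short one. A minimal repro sketch of roughly that sequence follows; the mount point /mnt/cephfs and the file names are assumptions for illustration, not the actual pjd fstest harness:

// Rough sketch of the pjd link-test sequence (illustrative paths/names only).
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

int main() {
    const std::string dir = "/mnt/cephfs/tmp/";                 // hypothetical CephFS mount
    const std::string longname = dir + std::string(255, 'x');   // max-length file name
    const std::string src = dir + "fstest_src";                 // hypothetical short name

    // Create the file under its long name first, so that is its primary dentry.
    int fd = open(longname.c_str(), O_CREAT | O_WRONLY, 0644);
    if (fd >= 0) close(fd);
    link(longname.c_str(), src.c_str());   // give the same inode a second, short name

    unlink(longname.c_str());              // pjd test #3: drop the long (primary) name;
                                           // the MDS parks the inode in its stray dir
                                           // until the remaining link is reintegrated

    // pjd test #4: immediately re-create the long name as a hard link to src.
    // If the MDS has not finished handling the unlink, the link target still
    // resolves under the stray directory and the request fails with EXDEV.
    if (link(src.c_str(), longname.c_str()) != 0)
        std::fprintf(stderr, "link: %s\n", std::strerror(errno));
    return 0;
}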
And in remote/smithi141/log/a190153a-09fe-11ed-842f-001a4aab830c/ceph-mds.b.log.gz, before pjd test #4 runs, test #3 unlinks the file _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
teuthology logs:
2022-07-22T20:52:25.828 INFO:tasks.workunit.client.0.smithi135.stderr:+ ntest=3
2022-07-22T20:52:25.828 INFO:tasks.workunit.client.0.smithi135.stderr:+ expect 0 unlink _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
mds.b logs:
2022-07-22T20:52:25.348+0000 7fe2bf6d4700 1 -- [v2:172.21.15.141:6838/1279389197,v1:172.21.15.141:6839/1279389197] <== client.24956 172.21.15.135:0/377103332 1199 ==== client_request(client.24956:592 unlink #0x10000000220/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 2022-07-22T20:52:25.348771+0000 caller_uid=0, caller_gid=0{0,}) v4 ==== 443+0+0 (crc 0 0 0) 0x55b102d34b00 con 0x55b10295e400
...
2022-07-22T20:52:25.348+0000 7fe2bf6d4700 10 mds.0.locker process_request_cap_release client.24956 pAsXsFscr on [inode 0x100000000d3 [2,head] /volumes/_nogroup/sv_1/6ae107c1-dbbf-417a-8a51-99df917679be/client.0/tmp/tmp/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 auth v455 pv458 ap=3 s=0 n(v0 1=1+0)->n(v0 rc2022-07-22T20:52:25.343772+0000 1=1+0) (ilink xlockdone x=1) (iversion lock w=1 last_client=24956) caps={24956=pAsLsXsFscr/pFscr@3},l=24956 | request=2 lock=3 caps=1 authpin=1 0x55b103050000]
After that, inode 0x100000000d3 goes into the stray directory:
2022-07-22T20:52:25.349+0000 7fe2bf6d4700 10 mds.0.locker got xlock on (dn xlock x=1 by 0x55b1028aec00) [dentry #0x100/stray1/100000000d3 [2,head] auth NULL (dn xlock x=1 by 0x55b1028aec00) (dversion lock) pv=0 v=492 ap=2 ino=(nil) state=1342177281|new | request=1 lock=1 authpin=1 0x55b103059b80]
So when pjd test #4 begins later:
2022-07-22T20:52:25.353+0000 7fe2bf6d4700 1 -- [v2:172.21.15.141:6838/1279389197,v1:172.21.15.141:6839/1279389197] <== client.24956 172.21.15.135:0/377103332 1200 ==== client_request(client.24956:593 link #0x10000000220/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 #0x10000000220/fstest_d22bbf65b5ea67dbd4f2534c56018c49 2022-07-22T20:52:25.353771+0000 caller_uid=0, caller_gid=0{0,}) v4 ==== 438+0+0 (crc 0 0 0) 0x55b102b4d8c0 con 0x55b10295e400
And when traversing path2, the MDS finds the target inode in the stray directory:
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.cache traverse: opening base ino 0x10000000220 snap head
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 10 mds.0.locker try_rdlock_snap_layout request(client.24956:593 nref=3 cr=0x55b102b4d8c0) [inode 0x10000000220 [...2,head] /volumes/_nogroup/sv_1/6ae107c1-dbbf-417a-8a51-99df917679be/client.0/tmp/tmp/ auth{1=1,2=1,3=1,4=1} v199 pv209 ap=6 DIRTYPARENT f(v0 m2022-07-22T20:52:25.274773+0000 1=0+1)->f(v0 m2022-07-22T20:52:25.348771+0000 1=1+0) n(v3 rc2022-07-22T20:52:25.274773+0000 2=0+2)->n(v3 rc2022-07-22T20:52:25.348771+0000 1=0+1) (iauth sync r=1) (isnap sync r=5) (inest mix w=4 dirty) (ipolicy sync r=1) (ifile excl w=4) (iversion lock w=4 last_client=24956) caps={24956=pAsLsXsFsxcr/pAsLsXsFsxcral@52},l=24956 | dirtyscattered=1 request=0 lock=6 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 replicated=1 dirty=1 authpin=1 scrubqueue=0 0x55b1030a2c00]
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 12 mds.0.cache traverse: path seg depth 0 'fstest_d22bbf65b5ea67dbd4f2534c56018c49' snapid head
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 20 mds.0.cache.dir(0x10000000220) lookup (fstest_d22bbf65b5ea67dbd4f2534c56018c49, 'head')
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 20 mds.0.cache.dir(0x10000000220) hit -> (fstest_d22bbf65b5ea67dbd4f2534c56018c49,head)
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.cache linking in remote in [inode 0x100000000d3 [2,head] ~mds0/stray1/100000000d3 auth v455 pv493 ap=4 s=0 n(v0 1=1+0)->n(v0 rc2022-07-22T20:52:25.348771+0000 1=1+0) (ilink xlockdone x=2) (iversion lock w=2 last_client=24956) caps={24956=pAsXsFscr/pFscr@3},l=24956 | request=3 lock=5 caps=1 authpin=1 0x55b103050000]
It then assumes the two paths are in two different subvolumes and fails the request with EXDEV:
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.server target is in different subvolume, failing...
2022-07-22T20:52:25.354+0000 7fe2bf6d4700 7 mds.0.server reply_client_request -18 ((18) Invalid cross-device link) client_request(client.24956:593 link #0x10000000220/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 #0x10000000220/fstest_d22bbf65b5ea67dbd4f2534c56018c49 2022-07-22T20:52:25.353771+0000 caller_uid=0, caller_gid=0{0,}) v4
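Put differently: the preceding unlink has only parked the target's primary dentry under ~mds0/stray1 and has not yet been fully processed, so the link handler's subvolume comparison sees a path outside the subvolume and bails out. A hypothetical, much simplified model of that check follows; all names and the path-based logic here are invented for illustration and are not the actual MDS code (the real fix is PR 47399, which, per the updated subject, waits for the unlink to finish):

// Simplified, hypothetical model of the subvolume check that misfires.
#include <cstdio>
#include <string>

struct Inode {
    std::string primary_path;   // where the inode's primary dentry currently lives
    std::string subvol;         // subvolume it belongs to ("" if outside any subvolume)
};

static int check_link_same_subvolume(const Inode &destdir, const Inode &target) {
    // A strayed inode no longer resolves under the subvolume, so this
    // comparison wrongly concludes "different subvolume".
    if (destdir.subvol != target.subvol)
        return -18;             // EXDEV: "target is in different subvolume, failing..."
    return 0;
}

int main() {
    Inode destdir{"/volumes/_nogroup/sv_1/client.0/tmp/tmp", "sv_1"};
    // Before the unlink: the target still resolves inside the subvolume.
    Inode target_before{"/volumes/_nogroup/sv_1/client.0/tmp/tmp/longname", "sv_1"};
    // After the unlink: the primary dentry sits in the stray directory, so the
    // subvolume lookup comes back empty and the comparison fails.
    Inode target_after{"~mds0/stray1/100000000d3", ""};
    std::printf("before unlink: %d\n", check_link_same_subvolume(destdir, target_before)); // 0
    std::printf("after  unlink: %d\n", check_link_same_subvolume(destdir, target_after));  // -18
    return 0;
}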
Updated by Xiubo Li over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 47399
Updated by Xiubo Li about 1 year ago
- Subject changed from [RHEL stock] pjd test failures to [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish)
Updated by Venky Shankar about 1 year ago
- Category set to Correctness/Safety
- Status changed from Fix Under Review to Pending Backport
- Target version set to v19.0.0
- Backport changed from quincy,pacific to reef,quincy,pacific
Updated by Backport Bot about 1 year ago
- Copied to Backport #59384: reef: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Updated by Backport Bot about 1 year ago
- Copied to Backport #59385: quincy: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Updated by Backport Bot about 1 year ago
- Copied to Backport #59386: pacific: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Updated by Kotresh Hiremath Ravishankar 11 months ago
The reef backport is not merged yet. Noting this for the record.
Updated by Patrick Donnelly 8 months ago
- Related to Bug #58340: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
Updated by Patrick Donnelly 8 months ago
- Related to Bug #61818: mds: deadlock between unlink and linkmerge added