For more detail, see https://tracker.ceph.com/issues/55240#note-4.
Another issue in this failure:
In mds.1, after it finds the inode for #0x1/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress, it retries the getattr client request:
2022-04-07T06:34:55.052+0000 7fe6f18dc700 7 mds.1.server dispatch_client_request client_request(client.14520:12001 getattr Fsr #0x1000000615a 2022-04-07T06:34:53.430871+0000 caller_uid=1000, caller_gid=1258{6,36,1000,1258,}) v4
2022-04-07T06:34:55.052+0000 7fe6f18dc700 10 mds.1.server rdlock_path_pin_ref request(client.14520:12001 nref=3 cr=0x55c755025b80) #0x1000000615a
2022-04-07T06:34:55.052+0000 7fe6f18dc700 10 mds.1.locker acquire_locks request(client.14520:12001 nref=3 cr=0x55c755025b80)
But it immediately gets stuck waiting for the rdlock, because the lock state in the replica mds.1 is LOCK:
2022-04-07T06:34:55.052+0000 7fe6f18dc700 10 mds.1.locker requesting rdlock from auth on (ifile lock) on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress rep@0.1 v18 s=0 n(v0 1=1+0) (iauth lock) (ifile lock) (ixattr lock) (iversion lock) cr={14520=0-4194304@1} | request=1 lock=1 0x55c75521cc00]
The lock state table entry for LOCK_LOCK in the source code:
// stable loner rep state r rp rd wr fwr l x caps(any,loner,xlocker,replica)
[LOCK_LOCK] = { 0, false, LOCK_LOCK, AUTH, 0, REQ, AUTH,0, 0, 0, CEPH_CAP_GCACHE|CEPH_CAP_GBUFFER,0,0,0 },
The rd column is REQ, so the replica needs to request a lock state change from the auth mds.0:
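The rd column decides whether an MDS may take the read lock in the current state. As a rough sketch (a hypothetical, simplified model; the real table in Ceph's src/mds/locks.c has many more columns and states), the check behaves like this:

```c
/* Hypothetical, stripped-down model of one row of the MDS lock state
 * machine table; only the "rd" column relevant to this bug is kept. */
enum who {
    NEVER = 0,
    ANY,        /* any MDS may take the lock in this state */
    AUTH_ONLY,  /* only the auth MDS may take it */
    REQ         /* a replica must first ask the auth for a state change */
};

struct sm_state {
    enum who can_rd;  /* who may take the rdlock in this state */
};

/* LOCK_LOCK: rd is REQ, as in the table entry above. */
static const struct sm_state lock_lock_state = { .can_rd = REQ };
/* For contrast, a sync-like state where reads are allowed anywhere. */
static const struct sm_state sync_state = { .can_rd = ANY };

/* Returns 1 if the rdlock can be taken immediately, 0 if the replica
 * has to send a reqrdlock message to the auth MDS and wait for the
 * state change, which is exactly what mds.1 does in the log below. */
static int can_rdlock_now(const struct sm_state *s, int is_auth)
{
    switch (s->can_rd) {
    case ANY:       return 1;
    case AUTH_ONLY: return is_auth;
    case REQ:       return 0;  /* must request from auth first */
    default:        return 0;
    }
}
```

With the inode's ifile lock in LOCK on the replica, `can_rdlock_now` fails and mds.1 sends the reqrdlock seen next.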
2022-04-07T06:34:55.052+0000 7fe6f18dc700 7 mds.1.locker rdlock_start on (ifile lock) on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress rep@0.1 v18 s=0 n(v0 1=1+0) (iauth lock) (ifile lock) (ixattr lock) (iversion lock) cr={14520=0-4194304@1} | request=1 lock=1 0x55c75521cc00]
2022-04-07T06:34:55.052+0000 7fe6f18dc700 10 mds.1.locker requesting rdlock from auth on (ifile lock) on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress rep@0.1 v18 s=0 n(v0 1=1+0) (iauth lock) (ifile lock) (ixattr lock) (iversion lock) cr={14520=0-4194304@1} | request=1 lock=1 0x55c75521cc00]
2022-04-07T06:34:55.052+0000 7fe6f18dc700 1 -- [v2:172.21.15.176:6824/2074966087,v1:172.21.15.176:6825/2074966087] send_to--> mds [v2:172.21.15.116:6826/4138533272,v1:172.21.15.116:6827/4138533272] -- lock(a=reqrdlock ifile 0x1000000615a.head) v1 -- ?+0 0x55c751f85340
2022-04-07T06:34:55.052+0000 7fe6f18dc700 1 -- [v2:172.21.15.176:6824/2074966087,v1:172.21.15.176:6825/2074966087] --> [v2:172.21.15.116:6826/4138533272,v1:172.21.15.116:6827/4138533272] -- lock(a=reqrdlock ifile 0x1000000615a.head) v1 -- 0x55c751f85340 con 0x55c750115000
2022-04-07T06:34:55.052+0000 7fe6f18dc700 7 mds.1.locker rdlock_start waiting on (ifile lock) on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress rep@0.1 v18 s=0 n(v0 1=1+0) (iauth lock) (ifile lock) (ixattr lock) (iversion lock) cr={14520=0-4194304@1} | request=1 lock=1 0x55c75521cc00]
2022-04-07T06:34:55.052+0000 7fe6f18dc700 10 mds.1.cache.ino(0x1000000615a) add_waiter tag 2000000040000000 0x55c7540bd680 !ambig 1 !frozen 1 !freezing 1
2022-04-07T06:34:55.052+0000 7fe6f18dc700 15 mds.1.cache.ino(0x1000000615a) taking waiter here
In mds.0, since mds.0 is the auth MDS and client.14520 is the loner client, the client was issued the pAsxLsXsxFsxcrwb caps:
2022-04-07T06:34:55.054+0000 7f73ae424700 1 -- [v2:172.21.15.116:6826/4138533272,v1:172.21.15.116:6827/4138533272] <== mds.1 v2:172.21.15.176:6824/2074966087 17387 ==== lock(a=reqrdlock ifile 0x1000000615a.head) v1 ==== 69+0+0 (unknown 0 0 0) 0x55b263062fc0 con 0x55b2619b7000
2022-04-07T06:34:55.054+0000 7f73ae424700 7 mds.0.locker handle_file_lock a=reqrdlock on (ifile excl) from mds.1 [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress auth{1=1} v18 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14520=0-4194304@1} caps={14520=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@2},l=14520 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x55b2658e6680]
When doing the filelock state change (ifile excl->sync), it needs to revoke the Fxwb caps from client.14520:
2022-04-07T06:34:55.054+0000 7f73ae424700 7 mds.0.locker simple_sync on (ifile excl) on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress auth{1=1} v18 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14520=0-4194304@1} caps={14520=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@2},l=14520 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x55b2658e6680]
2022-04-07T06:34:55.054+0000 7f73ae424700 7 mds.0.locker issue_caps loner client.14520 allowed=pAsxLsXsxFscr, xlocker allowed=pAsxLsXsxFscr, others allowed=pLs on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress auth{1=1} v18 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={14520=0-4194304@1} caps={14520=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@2},l=14520 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x55b2658e6680]
2022-04-07T06:34:55.054+0000 7f73ae424700 10 mds.0.5 send_message_client_counted client.14520 seq 25328 client_caps(revoke ino 0x1000000615a 1 seq 3 caps=pAsxLsXsxFscr dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2022-04-07T06:34:53.392872+0000) v11
2022-04-07T06:34:55.054+0000 7f73ae424700 1 -- [v2:172.21.15.116:6826/4138533272,v1:172.21.15.116:6827/4138533272] --> v1:192.168.0.1:0/3995310513 -- client_caps(revoke ino 0x1000000615a 1 seq 3 caps=pAsxLsXsxFscr dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2022-04-07T06:34:53.392872+0000) v11 -- 0x55b26607c900 con 0x55b2619b9400
2022-04-07T06:34:55.054+0000 7f73ae424700 10 mds.0.cache.ino(0x1000000615a) auth_pin by 0x55b2658e6990 on [inode 0x1000000615a [2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/fsstress/fsstress auth{1=1} v18 ap=2 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={14520=0-4194304@1} caps={14520=pAsxLsXsxFscr/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@3},l=14520 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x55b2658e6680] now 2
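The revoke mask in the log above follows from simple bit arithmetic: whatever is issued but no longer allowed by the target lock state gets revoked. A sketch (the bit values are illustrative, not the real CEPH_CAP_* encoding):

```c
/* Illustrative file-capability bits; the real CEPH_CAP_* constants use
 * a different per-class encoding, so treat these values as a sketch. */
enum {
    F_SHARED = 1 << 0,  /* Fs */
    F_EXCL   = 1 << 1,  /* Fx */
    F_CACHE  = 1 << 2,  /* Fc */
    F_RD     = 1 << 3,  /* Fr */
    F_WR     = 1 << 4,  /* Fw */
    F_BUFFER = 1 << 5,  /* Fb */
};

/* The MDS revokes the caps the client holds that the new lock state
 * no longer allows: issued minus allowed. */
static int caps_to_revoke(int issued, int allowed)
{
    return issued & ~allowed;
}
```

With issued = Fsxcrwb and allowed = Fscr (what issue_caps computes for the loner once ifile moves toward sync), the difference is exactly Fxwb, matching the revoke message sent to client.14520.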
About 35 seconds later the cap update message arrived, but only the Fx cap was released:
2022-04-07T06:35:30.002+0000 7f73ae424700 1 -- [v2:172.21.15.116:6826/4138533272,v1:172.21.15.116:6827/4138533272] <== client.14520 v1:192.168.0.1:0/3995310513 24236 ==== client_caps(update ino 0x1000000615a 1 seq 3 tid 11103 caps=pAsxLsXsxFscrwb dirty=Fw wanted=pAsxXsxFsxcrwb follows 1 size 51656/0 mtime 2022-04-07T06:34:53.430871+0000 xattrs(v=18446744071603341568 l=0)) v10 ==== 236+0+0 (unknown 1066047011 0 0) 0x55b2643dac00 con 0x55b2619b9400
2022-04-07T06:35:30.002+0000 7f73ae424700 7 mds.0.locker handle_client_caps on 0x1000000615a tid 11103 follows 1 op update flags 0x0
That means the kclient kept holding the Fwb caps until around 3 hours later:
2022-04-07T09:34:23.874+0000 7f73ae424700 1 -- [v2:172.21.15.116:6826/4138533272,v1:172.21.15.116:6827/4138533272] <== client.14520 v1:192.168.0.1:0/3995310513 40893 ==== client_caps(update ino 0x1000000615a 1 seq 3 caps=pAsLsXsFsc dirty=- wanted=- follows 0 size 51656/0 mtime 2022-04-07T06:34:53.430871+0000 xattrs(v=18446744071603341568 l=0)) v10 ==== 236+0+0 (unknown 2446509421 0 0) 0x55b26413b800 con 0x55b2619b9400
The kernel code seems buggy: when it receives the Fxwb revoke request it queues a CEPH_I_WORK_WRITEBACK work item for the inode, then writes back the dirty pages and clears the CEPH_I_WORK_WRITEBACK queued bit.
But even after the dirty pages have been written back, if ci->i_wrbuffer_ref is not 0 the client won't release the Fwb caps, and after that nothing re-triggers the writeback. That means mds.0 may wait a very long time for the kclient to release the Fwb caps, until something else happens to trigger a dirty-page writeback.
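The stall can be sketched with a minimal model (hypothetical struct and field names loosely modeled on the kernel client's ceph_inode_info; the real logic lives in fs/ceph in the kernel tree):

```c
/* Hypothetical, stripped-down model of the kclient behavior described
 * above; only the state needed to show the stall is kept. */
struct ceph_inode_model {
    int i_wrbuffer_ref;  /* buffered-write references still held */
    int dirty_pages;     /* pages waiting for writeback */
    int holds_Fwb;       /* client still holds the Fwb caps */
};

/* What the queued CEPH_I_WORK_WRITEBACK work item effectively does. */
static void writeback_work(struct ceph_inode_model *ci)
{
    ci->dirty_pages = 0;           /* dirty pages get written back...  */
    if (ci->i_wrbuffer_ref == 0)   /* ...but Fwb is released only when */
        ci->holds_Fwb = 0;         /* no wrbuffer references remain.   */
    /* If i_wrbuffer_ref != 0 here, nothing re-queues this work, so the
     * MDS revoke stalls until some unrelated event triggers another
     * writeback, which is what kept mds.0 waiting for ~3 hours. */
}
```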