Bug #65630
openmds: rename request was deadlocked between two different MDSs
0%
Description
This is reported by Nigel, more detail please see https://www.mail-archive.com/ceph-users@ceph.io/msg24587.html
In mds.1 there was a rename slow request:
12034252 2024-04-16T04:18:12.195+0000 7fec2202f700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 34.998834 secs 12034253 2024-04-16T04:18:12.195+0000 7fec2202f700 0 log_channel(cluster) log [WRN] : slow request 34.998833 seconds old, received at 2024-04-16T04:17:37.200989+0000: client_request(client.380112553:122090977 rename #0x20008d973a4/2024-04-09-05:59:47.ar #0x20008d6a5a9/2024-04-09-05:59:47.ar 2024-04-16T04:17:37.195206+0000 caller_uid=1000, caller_gid=1000{1000,}) currently waiting for remote wrlocks
This rename client request just sent a peer request to mds.0 to get wrlock of inest for inode 0x20008d6a5a9:
12032731 2024-04-16T04:17:37.198+0000 7fec23832700 4 mds.1.server handle_client_request client_request(client.380112553:122090977 rename #0x20008d973a4/2024-04-09-05:59:47.ar #0x20008d6a5a9/2024-04-09-05:59:47.ar 2024-04-16T04:17:37.195206+0000 caller_uid=1000, caller_gid=1000{1000,}) v6 12032732 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.906085 get_session have 0x5643c8736f00 client.380112553 172.27.51.133:0/2603325972 state open 12032733 2024-04-16T04:17:37.198+0000 7fec23832700 15 mds.1.server oldest_client_tid=122090977 12032734 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.cache request_start request(client.380112553:122090977 nref=2 cr=0x56450f4de000) ... 12032924 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker client.380112553 pending pAsLsXs allowed pAsLsXs wanted pAsLsXsFsxcral 12032925 2024-04-16T04:17:37.198+0000 7fec23832700 4 mds.1.server handle_peer_request client.380112553:122090977 from mds.0 12032926 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.cache request_get client.380112553:122090977 request(client.380112553:122090977 nref=1 cr=0x56450f4de000) 12032927 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.server got remote wrlock on (ifile mix) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps= {380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032928 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.server dispatch_client_request client_request(client.380112553:122090977 rename #0x20008d973a4/2024-04-09-05:59:47.ar #0x20008d6a5a9/2024-04-09-05:59:47.ar 2024-04-16T04:17:37.195206+0000 caller_uid=1000, caller_gid=1000{1000,}) v6 12032929 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.server handle_client_rename client_request(client.380112553:122090977 rename #0x20008d973a4/2024-04-09-05:59:47.ar #0x20008d6a5a9/2024-04-09-05:59:47.ar 2024-04-16T04:17:37.195206+0000 caller_uid=1000, caller_gid=1000{1000,}) v6 12032930 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.server rdlock_two_paths_xlock_destdn request(client.380112553:122090977 nref=2 cr=0x56450f4de000) #0x20008d973a4/2024-04-09-05:59:47.ar #0x20008d6a5a9/2024-04-09-05:59:47.ar 12032931 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.cache traverse: opening base ino 0x20008d973a4 snap head 12032932 2024-04-16T04:17:37.198+0000 7fec23832700 12 mds.1.cache traverse: path seg depth 0 '2024-04-09-05:59:47.ar' snapid head 12032933 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.cache.dir(0x20008d973a4) lookup (2024-04-09-05:59:47.ar, 'head') 12032934 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.cache.dir(0x20008d973a4) hit -> (2024-04-09-05:59:47.ar,head) 12032935 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.cache traverse: null+readable dentry at [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/2024-04-09-05:59:47.ar [2,head] auth NULL (dversion lock) pv=0 v=740487 ap=1 ino=(nil) state=1073741824 | authpin=1 0x56446737f900] 12032936 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.cache path_traverse finish on snapid head 12032937 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.cache traverse: opening base ino 0x20008d6a5a9 snap head 12032938 2024-04-16T04:17:37.198+0000 7fec23832700 12 mds.1.cache traverse: path seg depth 0 '2024-04-09-05:59:47.ar' snapid head 12032939 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.cache.dir(0x20008d6a5a9.001*) lookup (2024-04-09-05:59:47.ar, 'head') 12032940 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.cache.dir(0x20008d6a5a9.001*) hit -> (2024-04-09-05:59:47.ar,head) 12032941 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.cache found target [inode 0x20008d76e3d [2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-05:59:47.ar rep@0.1 v740487 s=116644185 n(v0 rc2024-04-10T13:56:40.732383+0000 b116644185 1=1+0) (iversion lock) | request=1 0x5643d471ab00] 12032942 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.cache path_traverse finish on snapid head 12032943 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker acquire_locks request(client.380112553:122090977 nref=3 cr=0x56450f4de000) 12032944 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must remote_wrlock on mds.0 (ifile mix) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:3 7.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032945 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must remote_wrlock on mds.0 (inest mix->lock w=21 dirty) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032946 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must xlock (dn sync) [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-05:59:47.ar [2,head] rep@0.1 (dversion lock) pv=0 v=778809 ino=0x20008d76e3d state=0 | request=1 inodepin=1 0x56446737f680] 12032947 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must wrlock (ifile excl) [inode 0x20008d973a4 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/ auth{0=1} v794227 ap=1 DIRTYPARENT f(v0 m2024-04-16T04:17:37.187206+000042=42+0) n(v0 rc2024-04-16T04:17:37.187206+0000 b4935834161 43=42+1) (isnap sync r=1) (inest mix) (ifile excl) (iversion lock) caps={380112553=pAsLsXsFsx/pAsLsXsFsxcral@50},l=380112553 | request=0 lock=1 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56450ca09b80] 12032948 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must wrlock (inest mix) [inode 0x20008d973a4 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/ auth{0=1} v794227 ap=1 DIRTYPARENT f(v0 m2024-04-16T04:17:37.187206+000042=42+0) n(v0 rc2024-04-16T04:17:37.187206+0000 b4935834161 43=42+1) (isnap sync r=1) (inest mix) (ifile excl) (iversion lock) caps={380112553=pAsLsXsFsx/pAsLsXsFsxcral@50},l=380112553 | request=0 lock=1 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56450ca09b80] 12032949 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must xlock (dn sync) [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/2024-04-09-05:59:47.ar [2,head] auth NULL (dversion lock) pv=0 v=740487 ap=1 ino=(nil) state=1073741824 | authpin=1 0x56446737f900] 12032950 2024-04-16T04:17:37.198+0000 7fec23832700 20 mds.1.locker must wrlock (dversion lock) [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/2024-04-09-05:59:47.ar [2,head] auth NULL (dversion lock) pv=0 v=740487 ap=1 ino=(nil) state=1073741824 | authpin=1 0x56446737f900] 12032951 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker must authpin [inode 0x20008d76e3d [2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-05:59:47.ar rep@0.1 v740487 s=116644185 n(v0 rc2024-04-10T13:56:40.732383+0000 b116644185 1=1+0) (iversion lock) | request=1 0x5643d471ab00] 12032952 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker must authpin [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-05:59:47.ar [2,head] rep@0.1 (dversion lock) pv=0 v=778809 ino=0x20008d76e3d state=0 | request=1 inodepin=1 0x56446737f680] 12032953 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker must authpin [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/2024-04-09-05:59:47.ar [2,head] auth NULL (dversion lock) pv=0 v=740487 ap=1 ino=(nil) state=1073741824 | authpin=1 0x56446737f900] 12032954 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker must authpin [inode 0x20008d973a4 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/ auth{0=1} v794227 ap=1 DIRTYPARENT f(v0 m2024-04-16T04:17:37.187206+0000 42=42+0) n(v0 rc2024-04-16T04:17:37.187206+0000 b4935834161 43=42+1) (isnap sync r=1) (inest mix) (ifile excl) (iversion lock) caps={380112553=pAsLsXsFsx/pAsLsXsFsxcral@50},l=380112553 | request=0 lock=1 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56450ca09b80] 12032955 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker must authpin [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032956 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker already auth_pinned [inode 0x20008d76e3d [2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-05:59:47.ar rep@0.1 v740487 s=116644185 n(v0 rc2024-04-10T13:56:40.732383+0000 b116644185 1=1+0) (iversion lock) | request=1 0x5643d471ab00] 12032957 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker already auth_pinned [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-05:59:47.ar [2,head] rep@0.1 (dversion lock) pv=0 v=778809 ino=0x20008d76e3d state=0 | request=1 inodepin=1 0x56446737f680] 12032958 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker already auth_pinned [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/2024-04-09-05:59:47.ar [2,head] auth NULL (dversion lock) pv=0 v=740487 ap=1 ino=(nil) state=1073741824 | authpin=1 0x56446737f900] 12032959 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker already auth_pinned [inode 0x20008d973a4 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ar/ auth{0=1} v794227 ap=1 DIRTYPARENT f(v0 m2024-04-16T04:17:37.187206+0000 42=42+0) n(v0 rc2024-04-16T04:17:37.187206+0000 b4935834161 43=42+1) (isnap sync r=1) (inest mix) (ifile excl) (iversion lock) caps={380112553=pAsLsXsFsx/pAsLsXsFsxcral@50},l=380112553 | request=0 lock=1 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56450ca09b80] 12032960 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker already auth_pinned [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032961 2024-04-16T04:17:37.198+0000 7fec23832700 10 mds.1.locker already remote_wrlocked (ifile mix) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps ={380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032962 2024-04-16T04:17:37.198+0000 7fec23832700 7 mds.1.locker remote_wrlock_start mds.0 on (inest mix->lock w=21 dirty) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v976696 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:29.006958+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (isnap sync r=1) (inest mix->lock w=21 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@33096} | dirtyscattered=1 request=1 lock=2 importing=0 dirfrag=8 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 12032963 2024-04-16T04:17:37.634+0000 7fec1b822700 20 update_rank0: mds.metrics 12032964 2024-04-16T04:17:37.634+0000 7fec1b822700 20 update_rank0: mds.metrics: sending metric updates for 4 clients to rank 0 (address: [v2:172.27.51.100:6800/3771994695,v1:172.27.51.100:6801/3771994695]) with sequence number 139671, last updated sequence number 349169
While in mds.0 the peer request was stuck and waiting the inode 0x20008d6a5a9 to release the wrlock by other requests:
19955486 2024-04-16T04:17:37.200+0000 7f6deb7df700 4 mds.0.server handle_peer_request client.380112553:122090977 from mds.1 19955487 2024-04-16T04:17:37.200+0000 7f6deb7df700 7 mds.0.cache request_get client.380112553:122090977 request(client.380112553:122090977 nref=1 peer_to mds.1) 19955488 2024-04-16T04:17:37.200+0000 7f6deb7df700 7 mds.0.server dispatch_peer_request request(client.380112553:122090977 nref=2 peer_to mds.1 sr=0x5614024f0000) peer_request(client.380112553:122090977.0 wrlock) v1 19955489 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.locker acquire_locks request(client.380112553:122090977 nref=2 peer_to mds.1 sr=0x5614024f0000) 19955490 2024-04-16T04:17:37.200+0000 7f6deb7df700 20 mds.0.locker must wrlock (ifile mix w=1) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955491 2024-04-16T04:17:37.200+0000 7f6deb7df700 20 mds.0.locker must wrlock (inest mix->lock(2) g=1 w=4704 dirty) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955492 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.locker must authpin [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955493 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.locker already auth_pinned [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+00003981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955494 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.locker already wrlocked (ifile mix w=1) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955495 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.locker wrlock_start (inest mix->lock(2) g=1 w=4704 dirty) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955496 2024-04-16T04:17:37.200+0000 7f6deb7df700 7 mds.0.locker wrlock_start waiting on (inest mix->lock(2) g=1 w=4704 dirty) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 19955497 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.cache.ino(0x20008d6a5a9) add_waiter tag 2000400000000000 0x561385419980 !ambig 1 !frozen 1 !freezing 1 19955498 2024-04-16T04:17:37.200+0000 7f6deb7df700 15 mds.0.cache.ino(0x20008d6a5a9) taking waiter here 19955499 2024-04-16T04:17:37.200+0000 7f6deb7df700 10 mds.0.locker nudge_log (inest mix->lock(2) g=1 w=4704 dirty) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v976696 ap=2 f(v19171 m2024-04-16T04:17:37.187206+0000 3981=3977+4) n(v2752 rc2024-04-16T04:17:37.187206+0000 b3041518500280 20424=20418+6) (inest mix->lock(2) g=1 w=4704 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@32779} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=1 authpin=1 discoverbase=0 0x5613daffc000] 19955500 2024-04-16T04:17:37.248+0000 7f6dee7e5700 20 mds.0.906081 get_session have 0x561259e30800 client.373790596 172.27.51.135:0/331975208 state open 19955501 2024-04-16T04:17:37.248+0000 7f6dee7e5700 20 handle_client_metrics: mds.metrics: session=0x561259e30800 19955502 2024-04-16T04:17:37.248+0000 7f6dee7e5700 20 handle_payload: mds.metrics: type=CAP_INFO, session=0x561259e30800, hits=9752063234, misses=861451
And in mds.0 logs we can see that the wrlock reference is very big, which was g=1 w=4704 dirty). This is abnormal.
Just went through the mds.0 and mds.1 logs, found that the wrlock reference was not correctly released, such as for client.380112553:122063467 rename request, just before it the reference was (inest mix w=1255 dirty):
13105523 2024-04-16T04:16:11.813+0000 7f6deb7df700 4 mds.0.server handle_client_request client_request(client.380112553:122063467 rename #0x1000949728a/2024-04-09-07:40:16.ar.done #0x20008d6a5a9/2024-04-09-07:40:16.ar.done 2024-04-16T04:16:11.812624+0000 caller_uid=1000, caller_gid=1000{1000,}) v6 13105524 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.906081 get_session have 0x561259e30300 client.380112553 172.27.51.133:0/2603325972 state open 13105525 2024-04-16T04:16:11.813+0000 7f6deb7df700 15 mds.0.server oldest_client_tid=122063467 13105526 2024-04-16T04:16:11.813+0000 7f6deb7df700 7 mds.0.cache request_start request(client.380112553:122063467 nref=2 cr=0x56138e5b0600) 13105527 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.locker process_request_cap_release client.380112553 pAsLsXs on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (inest mix w=1255 dirty) (iversion lock) caps={380112553=pAsLsXsFs/pAsLsXsFsxcral@21005} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=1 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=0 discoverbase=0 0x5613daffc000] 13105528 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.locker wanted pAsLsXsFsxcral -> pAsLsXsFsxcral 13105529 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.locker eval 3648 [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (inest mix w=1255 dirty) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@21005} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=1 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=0 discoverbase=0 0x5613daffc000] ...
But after it finished the reference because (inest mix w=1256 dirty):
13106075 2024-04-16T04:16:11.821+0000 7f6de57d3700 7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.380112553:122063467 rename #0x1000949728a/2024-04-09-07:40:16.ar.done #0x20008d6a5a9/2024-04-09-07:40:16.ar.done 2024-04-16T04:1 6:11.812624+0000 caller_uid=1000, caller_gid=1000{1000,}) v6 13106076 2024-04-16T04:16:11.821+0000 7f6de57d3700 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0 13106077 2024-04-16T04:16:11.821+0000 7f6de57d3700 20 mds.0.server lat 0.006232 ... 13106116 2024-04-16T04:16:11.821+0000 7f6de57d3700 20 mds.0.server set_trace_dist added dir [dir 0x1000949728a /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/raw/ [2,head] auth{1=1} v=669641 cv=0/0 ap=1+1 state=1610874881|complete f(v0 m2024-04-16T04:16:11.812624+0000 4640=4640+0) n(v0 rc2024-04-16T04:16:11.812624+0000 b2571600461824 4640=4640+0) hs=4640+0,ss=0+0 dirty=4640 | child=1 replicated=1 dirty=1 authpin=1 0x5612ac030d00] 13106117 2024-04-16T04:16:11.821+0000 7f6de57d3700 20 mds.0.locker issue_client_lease no/null lease on [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/raw/2024-04-09-07:40:16.ar.done [2,head] auth{1=1} (dversion lock) v=669640 ap=1 ino=0x10009481397 state=1610612736 | request=1 lock=0 inodepin=1 replicated=1 dirty=1 authpin=1 0x5613e0abd400] 13106118 2024-04-16T04:16:11.821+0000 7f6de57d3700 20 mds.0.cache.ino(0x10009481397) pfile 0 pauth 0 plink 0 pxattr 0 plocal 0 mtime 2024-04-09T20:44:23.934114+0000 ctime 2024-04-16T04:16:11.812624+0000 change_attr 2 valid=1 13106119 2024-04-16T04:16:11.821+0000 7f6de57d3700 10 mds.0.cache.ino(0x10009481397) encode_inodestat issuing pAsLsXsFscr seq 2 13106120 2024-04-16T04:16:11.821+0000 7f6de57d3700 10 mds.0.cache.ino(0x10009481397) encode_inodestat caps pAsLsXsFscr seq 2 mseq 1 xattrv 0 13106121 2024-04-16T04:16:11.821+0000 7f6de57d3700 20 mds.0.server set_trace_dist added snap head in [inode 0x10009481397 [2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/raw/2024-04-09-07:40:16.ar.done auth{1=1} v669640 DIRTYPARENT s=0 n(v0 rc2024-04-16T04:16:11.812624+0000 1=1+0) (iversion lock) caps={380112553=pAsLsXsFscr/-@2} | request=1 lock=0 importingcaps=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=0 0x56137a7b0b00] 13106122 2024-04-16T04:16:11.821+0000 7f6de57d3700 10 mds.0.906081 send_message_client client.380112553 172.27.51.133:0/2603325972 client_reply(???:122063467 = 0 (0) Success safe) v1 13106123 2024-04-16T04:16:11.821+0000 7f6de57d3700 7 mds.0.cache request_finish request(client.380112553:122063467 nref=2 cr=0x56138e5b0600) 13106124 2024-04-16T04:16:11.821+0000 7f6de57d3700 15 mds.0.cache request_cleanup request(client.380112553:122063467 nref=2 cr=0x56138e5b0600) ... 13106199 2024-04-16T04:16:11.821+0000 7f6deb7df700 10 mds.0.locker nudge_log (ifile mix->sync g=1) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961733 ap=2 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (isnap sync r=1) (inest mix w=1256 dirty) (ifile mix->sync g=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@21005} mcw={1=AsLsXsFs} | dirtyscattered=1 request=1 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=1 authpin=1 discoverbase=0 0x5613daffc000] 13106200 2024-04-16T04:16:11.821+0000 7f6deb7df700 4 mds.0.server handle_peer_request client.380112553:122063467 from mds.1 13106201 2024-04-16T04:16:11.821+0000 7f6deb7df700 10 mds.0.cache committed_leader_peer mds.1 on client.380112553:122063467 13106202 2024-04-16T04:16:11.821+0000 7f6deb7df700 10 mds.0.cache log_leader_commit client.380112553:122063467 13106203 2024-04-16T04:16:11.821+0000 7f6de47d1700 5 mds.0.log _submit_thread 28695230979639~45 : ECommitted client.380112553:122063467
It was because the inest lock was force acquired in the _rename_prepare()->predirty_journal_parents() locally, but forgot directly because it was found a remote_wrlock:
13106011 2024-04-16T04:16:11.817+0000 7f6deb7df700 7 mds.0.locker wrlock_force on (inest mix w=1255 dirty) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961733 ap=1 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (isnap sync r=1) (inest mix w=1255 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@21005} mcw={1=AsLsXsFs}| dirtyscattered=1 request=1 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] ... 13107053 2024-04-16T04:16:11.829+0000 7f6de57d3700 10 mds.0.cache request_drop_foreign_locks forgetting remote_wrlock (inest mix w=1257 dirty) on mds.1 on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961735 ap=1 f(v15699 m2024-04-16T04:16:11.812624+0000 11439=11436+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (isnap sync r=1) (inest mix w=1257 dirty) (ifile mix) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@21006} mcw={1=AsLsXsFs} | dirtyscattered=1 request=1 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000]
In mds.1 it was acquired the inest wrlock as remote from mds.0:
8133769 2024-04-16T04:16:11.811+0000 7fec23832700 4 mds.1.server handle_peer_request client.380112553:122063467 from mds.0 8133770 2024-04-16T04:16:11.811+0000 7fec23832700 7 mds.1.cache request_get client.380112553:122063467 request(client.380112553:122063467 nref=1 peer_to mds.0) 8133771 2024-04-16T04:16:11.811+0000 7fec23832700 7 mds.1.server dispatch_peer_request request(client.380112553:122063467 nref=2 peer_to mds.0 sr=0x5645f96f5c00) peer_request(client.380112553:122063467.0 wrlock) v1 8133772 2024-04-16T04:16:11.811+0000 7fec23832700 10 mds.1.locker acquire_locks request(client.380112553:122063467 nref=2 peer_to mds.0 sr=0x5645f96f5c00) 8133773 2024-04-16T04:16:11.811+0000 7fec23832700 20 mds.1.locker must wrlock (ifile mix w=1) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:07.192484+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (inest mix dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@14141} | dirtyscattered=1 request=0 lock=1 importing=0 dirfrag=4 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 8133774 2024-04-16T04:16:11.811+0000 7fec23832700 20 mds.1.locker must wrlock (inest mix dirty) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:07.192484+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (inest mix dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@14141} | dirtyscattered=1 request=0 lock=1 importing=0 dirfrag=4 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 8133775 2024-04-16T04:16:11.811+0000 7fec23832700 10 mds.1.locker already wrlocked (ifile mix w=1) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:07.192484+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (inest mix dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@14141} | dirtyscattered=1 request=0 lock=1 importing=0 dirfrag=4 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 8133776 2024-04-16T04:16:11.811+0000 7fec23832700 10 mds.1.locker wrlock_start (inest mix dirty) on [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:07.192484+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (inest mix dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@14141} | dirtyscattered=1 request=0 lock=1 importing=0 dirfrag=4 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680] 8133777 2024-04-16T04:16:11.811+0000 7fec23832700 10 mds.1.locker got wrlock on (inest mix w=1 dirty) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ rep@0.1 fragtree_t(*^3) v961733 f(v15698 m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:07.192484+0000 b3043987993586 20171=20165+6)/n(v2734 rc2024-04-11T22:38:32.637569+0000 b3043987993586 20170=20165+5) (inest mix w=1 dirty) (ifile mix w=1) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@14141} | dirtyscattered=1 request=0 lock=2 importing=0 dirfrag=4 caps=1 dirwaiter=0 waiter=0 discoverbase=0 0x56463bcc6680]
This was because mds.0 was the auth of inode 0x20008d6a5a9, but mds.1 was the auth of part of the dir frags:
13105312 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 000* fragstat f(v15698 m2024-04-16T04:16:11.692621+0000 1399=1399+0) 13105313 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 000* accounted_fragstat f(v15698 m2024-04-16T04:16:11.692621+0000 1399=1399+0) 13105314 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 001* [dir 0x20008d6a5a9.001* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] auth{1=1} v=781238 cv=778809/778809 REP state=1610874881|complete f(v15698 m2024-04-16T04:16:11.788623+0000 1478=1478+0) n(v2741 rc2024-04-16T04:16:11.788623+0000 b57567567833 1478=1478+0) hs=1478+583,ss=0+0 dirty=583 | ptrwaiter=0 request=0 child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 waiter=0 authpin=0 tempexporting=0 0x5613e0a19b00] 13105315 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 001* fragstat f(v15698 m2024-04-16T04:16:11.788623+0000 1478=1478+0) 13105316 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 001* accounted_fragstat f(v15698 m2024-04-16T04:16:11.788623+0000 1478=1478+0) 13105317 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 010* [dir 0x20008d6a5a9.010* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] auth{1=1} v=781394 cv=778929/778929 REP state=1610874881|complete f(v15698 m2024-04-16T04:16:11.624618+0000 1444=1444+0) n(v2741 rc2024-04-16T04:16:11.624618+0000 b56518820987 1444=1444+0) hs=1444+601,ss=0+0 dirty=601 | ptrwaiter=0 request=0 child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 waiter=0 authpin=0 tempexporting=0 0x561289be7a80] 13105318 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 010* fragstat f(v15698 m2024-04-16T04:16:11.624618+0000 1444=1444+0) 13105319 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 010* accounted_fragstat f(v15698 m2024-04-16T04:16:11.624618+0000 1444=1444+0) 13105320 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 011* [dir 0x20008d6a5a9.011* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] auth{1=1} v=781414 cv=778985/778985 REP ap=1+1 state=1610874881|complete f(v15698 m2024-04-16T04:16:11.808624+0000 1420=1420+0) n(v2741 rc2024-04-16T04:16:11.808624+0000 b63102385077 1420=1420+0) hs=1420+583,ss=0+0 dirty=583 | ptrwaiter=0 request=0 child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1dirty=1 waiter=0 authpin=1 tempexporting=0 0x5612f85d7200] 13105321 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 011* fragstat f(v15698 m2024-04-16T04:16:11.808624+0000 1420=1420+0) 13105322 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 011* accounted_fragstat f(v15698 m2024-04-16T04:16:11.808624+0000 1420=1420+0) 13105323 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 100* [dir 0x20008d6a5a9.100* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] auth{1=1} v=790508 cv=778844/778844 REP ap=1+1 state=1610874881|complete f(v15698 m2024-04-16T04:16:11.800624+0000 1403=1402+1) n(v2741 rc2024-04-16T04:16:11.808624+0000 b2626597654305 6042=6041+1)/n(v2741 rc2024-04-16T04:16:11.800624+0000 b2626597654305 6040=6039+1) hs=1403+560,ss=0+0 dirty=561 | ptrwaiter=0 child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 authpin=1 0x5612bff4c400] 13105324 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 100* fragstat f(v15698 m2024-04-16T04:16:11.800624+0000 1403=1402+1) 13105325 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 100* accounted_fragstat f(v15698 m2024-04-16T04:16:11.800624+0000 1403=1402+1) 13105326 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 101* [dir 0x20008d6a5a9.101* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] rep@1.1 REP dir_auth=1 state=0 f(v15698 m2024-04-16T04:16:11.764623+0000 1386= 1385+1) n(v2740 rc2024-04-16T04:16:07.132483+0000 b58476501383 5501=5500+1)/n(v2740 rc2024-04-16T04:15:55.788140+0000 b66156550535 5520=5519+1) hs=0+5,ss=0+0 | ptrwaiter=0 request=0 dnwaiter=0 child=1 frozen=0 subtree=1 importing=0 importbound=0 sticky=0 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x561275fde900] 13105327 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 101* fragstat f(v15698 m2024-04-16T04:16:11.764623+0000 1386=1385+1) 13105328 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 101* accounted_fragstat f(v15698 m2024-04-16T04:16:11.764623+0000 1386=1385+1) 13105329 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 110* [dir 0x20008d6a5a9.110* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] rep@1.1 REP dir_auth=1 state=0 f(v15698 m2024-04-16T04:16:11.720621+0000 1452=1452+0) n(v2740 rc2024-04-16T04:16:07.076481+0000 b57820821639 1527=1527+0)/n(v2740 rc2024-04-16T04:15:54.460099+0000 b63580858503 1545=1545+0) hs=0+7,ss=0+0 | ptrwaiter=0 request=0 dnwaiter=0 child=1 frozen=0 subtree=1 sticky=0 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x56139ce2e480] 13105330 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 110* fragstat f(v15698 m2024-04-16T04:16:11.720621+0000 1452=1452+0) 13105331 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 110* accounted_fragstat f(v15698 m2024-04-16T04:16:11.720621+0000 1452=1452+0) 13105332 2024-04-16T04:16:11.809+0000 7f6de57d3700 15 mds.0.cache.ino(0x20008d6a5a9) 111* [dir 0x20008d6a5a9.111* /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ [2,head] rep@1.1 REP dir_auth=1 state=0 f(v15698 m2024-04-16T04:16:11.648619+0000 1458=1457+1) n(v2740 rc2024-04-16T04:16:07.180484+0000 b63730568091 1583=1580+3)/n(v2740 rc2024-04-16T04:15:54.580103+0000 b68850600859 1596=1593+3) hs=0+3,ss=0+0 | ptrwaiter=0 request=0 dnwaiter=0 child=1 frozen=0 subtree=1 importing=0 sticky=0 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x561290ae2480] 13105333 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 111* fragstat f(v15698 m2024-04-16T04:16:11.648619+0000 1458=1457+1) 13105334 2024-04-16T04:16:11.809+0000 7f6de57d3700 20 mds.0.cache.ino(0x20008d6a5a9) 111* accounted_fragstat f(v15698 m2024-04-16T04:16:11.648619+0000 1458=1457+1)
And in mds.0 in rdlock_two_paths_xlock_destdn() it will try to acquire the wrlock for the srcdir:
13105654 2024-04-16T04:16:11.813+0000 7f6deb7df700 7 mds.0.server handle_client_rename client_request(client.380112553:122063467 rename #0x1000949728a/2024-04-09-07:40:16.ar.done #0x20008d6a5a9/2024-04-09-07:40:16.ar.done 2024-04-16T04:16:11.812624+0000 caller_uid=1000, caller_gid=1000{1000,}) v6 13105655 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.server rdlock_two_paths_xlock_destdn request(client.380112553:122063467 nref=2 cr=0x56138e5b0600) #0x1000949728a/2024-04-09-07:40:16.ar.done #0x20008d6a5a9/2024-04-09-07:40:16.ar.done 13105656 2024-04-16T04:16:11.813+0000 7f6deb7df700 7 mds.0.cache traverse: opening base ino 0x1000949728a snap head 13105657 2024-04-16T04:16:11.813+0000 7f6deb7df700 12 mds.0.cache traverse: path seg depth 0 '2024-04-09-07:40:16.ar.done' snapid head 13105658 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.cache.dir(0x1000949728a) lookup (2024-04-09-07:40:16.ar.done, 'head') 13105659 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.cache.dir(0x1000949728a) hit -> (2024-04-09-07:40:16.ar.done,head) 13105660 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.cache traverse: null+readable dentry at [dentry #0x1/rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/raw/2024-04-09-07:40:16.ar.done [2,head] auth NULL (dversion lock) pv=0 v=669639 ap=1 ino=(nil) state=1073741824 | authpin=1 0x5613e0abd400] 13105661 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.cache path_traverse finish on snapid head 13105662 2024-04-16T04:16:11.813+0000 7f6deb7df700 7 mds.0.cache traverse: opening base ino 0x20008d6a5a9 snap head 13105663 2024-04-16T04:16:11.813+0000 7f6deb7df700 12 mds.0.cache traverse: path seg depth 0 '2024-04-09-07:40:16.ar.done' snapid head 13105664 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.cache.dir(0x20008d6a5a9.111*) lookup (2024-04-09-07:40:16.ar.done, 'head') 13105665 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.cache.dir(0x20008d6a5a9.111*) hit -> (2024-04-09-07:40:16.ar.done,head) 13105666 2024-04-16T04:16:11.813+0000 7f6deb7df700 7 mds.0.cache found target [inode 0x10009481397 [2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/2024-04-09-07:40:16.ar.done rep@1.1 v135717 s=0 n(v0 rc2024-04-09T20:44:23.934114+0000 1=1+0) (iversion lock) | request=1 0x56137a7b0b00] 13105667 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.cache path_traverse finish on snapid head 13105668 2024-04-16T04:16:11.813+0000 7f6deb7df700 10 mds.0.locker acquire_locks request(client.380112553:122063467 nref=3 cr=0x56138e5b0600) 13105669 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.locker must remote_wrlock on mds.1 (ifile sync) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961733 ap=1 f(v15698 m2024-04-16 T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (isnap sync r=1) (inest mix w=1255 dirty) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@21005} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000] 13105670 2024-04-16T04:16:11.813+0000 7f6deb7df700 20 mds.0.locker must remote_wrlock on mds.1 (inest mix w=1255 dirty) [inode 0x20008d6a5a9 [...2,head] /rd1/vela/PRIVATE/MTP26M/J0835-4510_S/2024_100/2250/ auth{1=1} fragtree_t(*^3) v961733 ap=1 f(v15698m2024-04-16T04:16:11.808624+0000 11440=11437+3) n(v2741 rc2024-04-16T04:16:11.808624+0000 b3043987993586 20393=20387+6) (isnap sync r=1) (inest mix w=1255 dirty) (iversion lock) caps={380112553=pAsLsXs/pAsLsXsFsxcral@21005} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=2 importingcaps=0 importing=0 dirfrag=8 caps=1 stickydirs=0 dirtyrstat=0 dirwaiter=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 0x5613daffc000]
The code:
... 3991 if (xlock_srcdn) { 3992 mds_rank_t srcdir_auth = srcdir->authority().first; 3993 if (srcdir_auth == mds->get_nodeid()) { 3994 lov.add_wrlock(&srcdir->inode->filelock); 3995 lov.add_wrlock(&srcdir->inode->nestlock); 3996 } else { 3997 lov.add_remote_wrlock(&srcdir->inode->filelock, srcdir_auth); 3998 lov.add_remote_wrlock(&srcdir->inode->nestlock, srcdir_auth); 3999 } 4000 lov.add_xlock(&srcdn->lock); 4001 } else { 4002 lov.add_rdlock(&srcdn->lock); 4003 } 4004 ...
Since the srcdir_auth was not the local mds, it would send a peer request to mds.1 and mark the inest as a remote wrlock in inode 0x20008d6a5a9.
It's buggy.