Bug #50223
openqa: "client.4737 isn't responding to mclientcaps(revoke)"
% Done: 0%
Description
2021-04-07T06:14:51.658 INFO:teuthology.orchestra.run.smithi029.stdout:2021-04-07T05:38:28.731361+0000 mds.c (mds.1) 2 : cluster [WRN] client.4737 isn't responding to mclientcaps(revoke), ino 0x200000001fd pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 303.593892 seconds ago
From: /ceph/teuthology-archive/pdonnell-2021-04-07_02:12:41-fs-wip-pdonnell-testing-20210406.213012-distro-basic-smithi/6025722/teuthology.log
With MDS thrashing and FUSE.
Updated by Patrick Donnelly about 3 years ago
osd thrashing and kclient: /ceph/teuthology-archive/pdonnell-2021-04-08_22:42:24-fs-wip-pdonnell-testing-20210408.192301-distro-basic-smithi/6030618/teuthology.log
Updated by Patrick Donnelly about 3 years ago
- Status changed from New to Triaged
- Assignee set to Patrick Donnelly
Probably need to lengthen some session timeouts so this goes away. The tests still succeed otherwise.
Updated by Patrick Donnelly over 2 years ago
/ceph/teuthology-archive/pdonnell-2021-09-14_01:17:08-fs-wip-pdonnell-testing-20210910.181451-distro-basic-smithi/6388586/teuthology.log
Updated by Kotresh Hiremath Ravishankar over 2 years ago
Updated by Rishabh Dave almost 2 years ago
Found the same error message during a weekly QA run:
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-testing-2022Jul08-1820-testing-default-smithi/6921042
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-testing-2022Jul08-1820-testing-default-smithi/6921117
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-testing-2022Jul08-1820-testing-default-smithi/6921125
Updated by Xiubo Li 8 months ago
Rishabh Dave wrote:
The revocation request was stuck for more than 300s, ino 0x20000001095 pending pFc issued pFcb:
2023-06-23T19:15:33.186+0000 7f972af56640 10 mds.1.log _trim_expired_segments waiting for 1/4194304 to expire
2023-06-23T19:15:33.186+0000 7f972af56640 10 mds.1.server find_idle_sessions. last cleared laggy state 2773.67s ago
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.server laggiest active session is client.4966 v1:172.21.15.45:0/229335070 and renewed caps recently (63.4994s ago)
2023-06-23T19:15:33.186+0000 7f972af56640 10 mds.1.locker scatter_tick
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick 7 revoking caps
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick age = 300.003951 client.4966.0x20000001095
2023-06-23T19:15:33.186+0000 7f972af56640 0 log_channel(cluster) log [WRN] : client.4966 isn't responding to mclientcaps(revoke), ino 0x20000001095 pending pFc issued pFcb, sent 300.003951 seconds ago
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick client.4966 isn't responding to mclientcaps(revoke), ino 0x20000001095 pending pFc issued pFcb, sent 300.003951 seconds ago
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick age = 290.003490 client.4966.0x2000000109e
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick age below timeout 300.000000
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.bal handle_export_pins export_pin_queue size=0
2023-06-23T19:15:33.186+0000 7f972af56640 15 mds.1.bal handle_export_pins number of auth trees = 47; not printing auth trees
2023-06-23T19:15:33.186+0000 7f972af56640 15 mds.1.bal tick tick last_sample now 2773.667067s ...
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick 12 revoking caps
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick age = 400.006155 client.4966.0x20000001095
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick silencing log message (backoff) for client.4966.0x20000001095
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick age = 85.001210 client.4966.0x20000000caf
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick age below timeout 300.000000
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.bal handle_export_pins export_pin_queue size=0
2023-06-23T19:17:13.189+0000 7f972af56640 15 mds.1.bal handle_export_pins number of auth trees = 47; not printing auth trees
2023-06-23T19:17:13.189+0000 7f972af56640 15 mds.1.bal tick tick last_sample now 2873.670095s
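The caps_tick behavior visible in these log lines (warn once the revoke age crosses the 300s timeout, then rate-limit repeated warnings via backoff) can be sketched as a simplified model. This is an illustration only, not the actual MDS code; the function, dictionary shapes, and the exponential-backoff choice are assumptions:

```python
TIMEOUT = 300.0  # assumed revoke-warning threshold, matching the logs

def caps_tick(revoking, now, warned):
    """revoking: {(client, ino): sent_time}; warned: {(client, ino): next_warn_age}.
    Returns the warnings to log on this tick; mutates `warned` for backoff."""
    warnings = []
    for key, sent in revoking.items():
        age = now - sent
        if age < TIMEOUT:
            continue                      # "caps_tick age below timeout"
        next_warn = warned.get(key, TIMEOUT)
        if age < next_warn:
            continue                      # "silencing log message (backoff)"
        warnings.append(
            f"client.{key[0]} isn't responding to mclientcaps(revoke), "
            f"ino {key[1]:#x}, sent {age:.6f} seconds ago")
        warned[key] = next_warn * 2       # back off before warning again
    return warnings
```

With a revoke sent at t=0, this model warns at ~300s, stays silent at 400s (matching the "silencing log message (backoff)" line above), and warns again at 600s.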
And the above warning disappeared just after the flushsnap request came:
2023-06-23T19:17:17.105+0000 7f972c759640 1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] <== client.4966 v1:172.21.15.45:0/229335070 39306 ==== client_caps(flushsnap ino 0x20000001095 0 seq 0 tid 75364 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 247 size 11344/0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:16.584371+0000 change_attr 1) v12 ==== 268+0+0 (unknown 1851060495 0 0) 0x55dc739d3500 con 0x55dc71e04800 2023-06-23T19:17:17.105+0000 7f972c759640 7 mds.1.locker handle_client_caps on 0x20000001095 tid 75364 follows 247 op flushsnap flags 0x0 2023-06-23T19:17:17.105+0000 7f972c759640 20 mds.1.9 get_session have 0x55dc71d7ea00 client.4966 v1:172.21.15.45:0/229335070 state open 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker head inode [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFc/pFcb/-@4},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache.snaprealm(0x20000001095 seq 584 0x55dc72db3d40) get_snaps 248 (seq 248 cached_seq 248) 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker flushsnap follows 247 -> snap 248 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache pick_inode_snap follows 247 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFc/pFcb/-@4},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache 
pick_inode_snap found [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700] 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker snapped inode [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700] 2023-06-23T19:17:17.105+0000 7f972c759640 7 mds.1.locker flushsnap snap 248 client.4966 on [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700] 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker _do_snap_update dirty Fw follows 247 snap 248 on [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700] 2023-06-23T19:17:17.105+0000 7f972c759640 15 mds.1.cache.ino(0x20000001095) project_inode 0x20000001095 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache.dir(0x10000001701.001*) pre_dirty 7102 2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache.den(0x10000001701.001* .organizationmap) pre_dirty [dentry #0x1/client.0/tmp/ceph/.organizationmap [248,248] auth (dversion lock) pv=7102 v=6899 ino=0x20000001095 state=1610612736 | request=0 inodepin=1 dirty=1 0x55dc788c1180] ...
Since nl=0, the inode 0x20000001095 was later finally removed:
2023-06-23T20:10:09.171+0000 7f972c759640 15 mds.1.cache trim_inode [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap rep@0.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700] 2023-06-23T20:10:09.171+0000 7f972c759640 12 mds.1.cache sending expire to mds.0 on [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap rep@0.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700] 2023-06-23T20:10:09.171+0000 7f972c759640 12 mds.1.cache.dir(0x10000001701.001*) unlink_inode [dentry #0x1/client.0/tmp/ceph/.organizationmap [248,248] rep@0.1 (dversion lock) v=7102 ino=0x20000001095 state=0 0x55dc788c1180] [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap rep@0.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700] 2023-06-23T20:10:09.171+0000 7f972c759640 14 mds.1.cache remove_inode [inode 0x20000001095 [248,248] #20000001095 rep@-2.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700]
Why was the revocation of Fb for inode 0x20000001095 stuck for so long?
The client first sent an unlink request, and at the same time the unlink request released the ~pFxcwb caps:
2023-06-23T19:10:22.089+0000 7f972c759640 1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] <== client.4966 v1:172.21.15.45:0/229335070 37723 ==== client_request(client.4966:89809 unlink #0x10000001701/.organizationmap 2023-06-23T19:10:21.936301+0000 caller_uid=1000, caller_gid=1271{6,27,108,1271,10102,}) v6 ==== 308+0+0 (unknown 3314630587 0 0) 0x55dc76a23500 con 0x55dc71e04800 ... 2023-06-23T19:10:22.089+0000 7f972c759640 4 mds.1.server handle_client_request client_request(client.4966:89809 unlink #0x10000001701/.organizationmap 2023-06-23T19:10:21.936301+0000 caller_uid=1000, caller_gid=1271{6,27,108,1271,10102,}) v6 2023-06-23T19:10:22.089+0000 7f972c759640 20 mds.1.9 get_session have 0x55dc71d7ea00 client.4966 v1:172.21.15.45:0/229335070 state open 2023-06-23T19:10:22.089+0000 7f972c759640 15 mds.1.server oldest_client_tid=89803 2023-06-23T19:10:22.089+0000 7f972c759640 7 mds.1.cache request_start request(client.4966:89809 nref=2 cr=0x55dc76a23500) 2023-06-23T19:10:22.089+0000 7f972c759640 10 mds.1.locker process_request_cap_release client.4966 pFxcwb on [inode 0x20000001095 [248,head] /client.0/tmp/ceph/.organizationmap auth v6871 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=4966 | request=0 caps=1 dirtyparent=1 dirty=1 0x55dc740fe580] 2023-06-23T19:10:22.089+0000 7f972c759640 10 mds.1.locker wanted pAsxXsxFxwb -> Fxcb ...
I think the client still held the Fb caps because there was dirty data not yet flushed on the client side, and the inode was marked needsnapflush=1:
2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.cache.ino(0x20000001095) add_need_snapflush client.4966 snapid 248 on 0x55dc7425f700
2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.cache.ino(0x20000001095) auth_pin by 0x55dc740fe580 on [inode 0x20000001095 [249,head] ~mds1/stray6/20000001095 auth v6871 pv15589 ap=4 DIRTYPARENT s=0 n(v0 1=1+0)->n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) (ilink xlock x=1 by 0x55dc72913c00) (ifile excl) (iversion lock w=1 last_client=4966) cr={4966=0-4194304@247} caps={4966=pFxcwb/Fxcb@2},l=4966 | request=1 lock=3 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] now 4
And then the MDS sent an early reply:
2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.locker set_xlocks_done on (dn xlock x=1 by 0x55dc72913c00) [dentry #0x101/stray6/20000001095 [249,head] auth NULL (dn xlock x=1 by 0x55dc72913c00) (dversion lock w=1 last_client=4966) pv=15589 v=15588 ap=2 ino=(nil) state=1342177281|new | request=1 lock=2 authpin=1 0x55dc788c0f00]
2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.server early_reply 0 ((0) Success) client_request(client.4966:89809 unlink #0x10000001701/.organizationmap 2023-06-23T19:10:21.936301+0000 caller_uid=1000, caller_gid=1271{6,27,108,1271,10102,}) v6
And then the MDS sent the safe reply, and at the same time made the client the loner by issuing the pAsLsXsFsxcrwb caps:
2023-06-23T19:10:22.129+0000 7f972674d640 7 mds.1.locker get_allowed_caps loner client.4966 allowed=pAsLsXsFsxcrwb, xlocker allowed=pAsLsXsFsxcrwb, others allowed=pAsLsXs on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFxcwb/Fxcb@2},l=4966 | request=1 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:22.129+0000 7f972674d640 20 mds.1.locker client.4966 pending pFxcwb allowed pAsLsXsFsxcrwb wanted Fxcb 2023-06-23T19:10:22.129+0000 7f972674d640 7 mds.1.locker sending MClientCaps to client.4966 seq 3 new pending pAsLsXsFsxcrwb was pFxcwb 2023-06-23T19:10:22.129+0000 7f972674d640 20 mds.1.cache.ino(0x20000001095) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1 2023-06-23T19:10:22.129+0000 7f972674d640 10 mds.1.9 send_message_client_counted client.4966 seq 36035 client_caps(grant ino 0x20000001095 1 seq 3 caps=pAsLsXsFsxcrwb dirty=- wanted=LsFxcb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12 2023-06-23T19:10:22.129+0000 7f972674d640 1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] --> v1:172.21.15.45:0/229335070 -- client_caps(grant ino 0x20000001095 1 seq 3 caps=pAsLsXsFsxcrwb dirty=- wanted=LsFxcb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12 -- 0x55dc72410700 con 0x55dc71e04800 2023-06-23T19:10:22.129+0000 7f972674d640 10 mds.1.9 send_message_client client.4966 v1:172.21.15.45:0/229335070 client_reply(???:89809 = 0 (0) Success safe) v1 2023-06-23T19:10:22.129+0000 7f972674d640 1 -- 
[v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] --> v1:172.21.15.45:0/229335070 -- client_reply(???:89809 = 0 (0) Success safe) v1 -- 0x55dc749a6540 con 0x55dc71e04800
And then the client released all the caps except Fcb:
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker eval done 2023-06-23T19:10:23.669+0000 7f972c759640 1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] <== client.4966 v1:172.21.15.45:0/229335070 37792 ==== client_caps(update ino 0x20000001095 1 seq 3 tid 74186 caps=pFcb dirty=Fw wanted=- follows 248 size 11344/0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1 xattrs(v=4655440983702587648 l=0)) v12 ==== 268+0+0 (unknown 4105346077 0 0) 0x55dc72cc4700 con 0x55dc71e04800 2023-06-23T19:10:23.669+0000 7f972c759640 7 mds.1.locker handle_client_caps on 0x20000001095 tid 74186 follows 248 op update flags 0x4 2023-06-23T19:10:23.669+0000 7f972c759640 20 mds.1.9 get_session have 0x55dc71d7ea00 client.4966 v1:172.21.15.45:0/229335070 state open 2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker head inode [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pAsLsXsFsxcrwb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.cache pick_inode_snap follows 248 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pAsLsXsFsxcrwb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker follows 248 retains pFcb dirty Fw on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) 
need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker revocation in progress, not making any conclusions about null snapflushes 2023-06-23T19:10:23.669+0000 7f972c759640 7 mds.1.locker flush client.4966 dirty Fw seq 3 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker wanted Fxcb -> - 2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker _do_cap_update dirty Fw issued pFcb wanted - on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/-@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
And then, in the tick() --> scatter_tick() thread, the MDS tried to revoke the Fb caps for inode 0x20000001095:
2023-06-23T19:10:33.185+0000 7f972af56640 7 mds.1.locker simple_sync on (ifile excl) on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/-@3},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:33.185+0000 7f972af56640 7 mds.1.locker get_allowed_caps loner client.4966 allowed=pAsLsXsFscr, xlocker allowed=pAsLsXsFscr, others allowed=pAsLsXs on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/-@3},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] 2023-06-23T19:10:33.185+0000 7f972af56640 20 mds.1.locker client.4966 pending pFcb allowed pAsLsXsFscr wanted - 2023-06-23T19:10:33.185+0000 7f972af56640 7 mds.1.locker sending MClientCaps to client.4966 seq 4 new pending pFc was pFcb 2023-06-23T19:10:33.185+0000 7f972af56640 20 mds.1.cache.ino(0x20000001095) encode_cap_message pfile 0 pauth 0 plink 0 pxattr 0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1 2023-06-23T19:10:33.185+0000 7f972af56640 10 mds.1.9 send_message_client_counted client.4966 seq 36318 client_caps(revoke ino 0x20000001095 1 seq 4 caps=pFc dirty=- wanted=Ls follows 0 size 11344/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12 2023-06-23T19:10:33.185+0000 7f972af56640 1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] --> v1:172.21.15.45:0/229335070 -- client_caps(revoke ino 
0x20000001095 1 seq 4 caps=pFc dirty=- wanted=Ls follows 0 size 11344/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12 -- 0x55dc73af0380 con 0x55dc71e04800 2023-06-23T19:10:33.185+0000 7f972af56640 10 mds.1.cache.ino(0x20000001095) auth_pin by 0x55dc740fe890 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFc/pFcb/-@4},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] now 2
And then it was stuck until the MDS received the flushsnap request, as mentioned at the beginning.
So, when unlinking a file that still had dirty data, why was the client stuck for so long? Was the client trying to write the dirty data to RADOS and getting stuck, or was it just waiting for some periodic tick to trigger the flush? I suspect the second case.
Need to go through the kclient code to dig it out.
Updated by Xiubo Li 8 months ago
The above failures are mostly from the kclient, but looking at the libcephfs user-space clients, they seem to have the same issue.
Going through the kclient code: when unlinking a file, the client will try to queue a cap release, but the cap check for the corresponding inode can be delayed by up to 10s, which is two periods of the tick(). And even when it is successfully triggered 10s later, check_caps() could miss flushing the dirty buffer to RADOS, which is what would finally release the Fb caps. This can leave the MDS cap revocation stuck for a long time.
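The worst-case delay described above can be illustrated with a small model. This is a sketch under assumptions (a 5s tick period for the delayed work, and a minimum delay of one tick before the queued cap check may run), not the kernel client's actual scheduling code:

```python
import math

TICK = 5.0  # assumed period of the client's delayed_work tick

def next_check_delay(queued_at, min_delay=TICK):
    """Time until the delayed cap check runs for an inode queued at
    `queued_at`: it must wait at least `min_delay`, and then can only
    fire on the next periodic tick boundary."""
    earliest = queued_at + min_delay
    ticks = math.ceil(earliest / TICK)
    return ticks * TICK - queued_at
```

An inode queued exactly on a tick boundary waits one period (5s), but one queued just after a tick waits almost two full periods (~10s), which matches the "two times of the tick()" worst case described above.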
Updated by Xiubo Li 7 months ago
- Status changed from In Progress to Fix Under Review
The kclient patchwork https://patchwork.kernel.org/project/ceph-devel/list/?series=787107.
Updated by Xiubo Li 4 months ago
Kotresh Hiremath Ravishankar wrote:
Thanks Kotresh for reporting this.
From the mds.0 log file: mds.0 sent a cap revoke for the Fr caps at 2023-12-28T18:25:35:
2023-12-28T18:25:35.289+0000 15c84700 7 mds.0.locker simple_sync on (inest lock w=6 dirty) on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsxr/pAsLsXsFsxcral@37},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070] 2023-12-28T18:25:35.289+0000 15c84700 10 mds.0.locker invalidate_lock_caches (inest lock->sync w=6 dirty) on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock->sync w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsxr/pAsLsXsFsxcral@37},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070] 2023-12-28T18:25:35.289+0000 15c84700 7 mds.0.locker get_allowed_caps loner client.4524 allowed=pAsLsXsFsxcrwb, xlocker allowed=pAsLsXsFsxcrwb, others allowed=pAsLsXs on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock->sync w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) 
caps={4524=pAsLsXsFsxr/pAsLsXsFsxcral@37},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070] 2023-12-28T18:25:35.289+0000 15c84700 20 mds.0.locker client.4524 pending pAsLsXsFsxr allowed pAsLsXsFsxwb wanted pAsLsXsFsxcral 2023-12-28T18:25:35.289+0000 15c84700 7 mds.0.locker sending MClientCaps to client.4524 seq 38 new pending pAsLsXsFsx was pAsLsXsFsxr 2023-12-28T18:25:35.289+0000 15c84700 20 mds.0.cache.ino(0x10000003158) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2023-12-28T18:25:30.909738+0000 ctime 2023-12-28T18:25:30.909738+0000 change_attr 27 2023-12-28T18:25:35.289+0000 15c84700 10 mds.0.4 send_message_client_counted client.4524 seq 32637 client_caps(revoke ino 0x10000003158 1 seq 38 caps=pAsLsXsFsx dirty=- wanted=pAsLsXsFsxcral follows 0 size 0/0 ts 1/18446744073709551615 mtime 2023-12-28T18:25:30.909738+0000 ctime 2023-12-28T18:25:30.909738+0000 change_attr 27) v11 2023-12-28T18:25:35.289+0000 15c84700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] --> v1:192.168.0.1:0/1895760544 -- client_caps(revoke ino 0x10000003158 1 seq 38 caps=pAsLsXsFsx dirty=- wanted=pAsLsXsFsxcral follows 0 size 0/0 ts 1/18446744073709551615 mtime 2023-12-28T18:25:30.909738+0000 ctime 2023-12-28T18:25:30.909738+0000 change_attr 27) v11 -- 0x239378c0 con 0x1c5b5a60 2023-12-28T18:25:35.289+0000 15c84700 10 mds.0.cache.ino(0x10000003158) auth_pin by 0x21d00400 on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=6 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock->sync w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsx/pAsLsXsFsxr/pAsLsXsFsxcral@38},l=4524 | dirtyscattered=1 request=0 lock=5 
dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070] now 6
But after 300 seconds mds.0 still had not received the reply:
2023-12-28T18:30:39.739+0000 15c84700 20 mds.0.locker caps_tick 4 revoking caps
2023-12-28T18:30:39.739+0000 15c84700 20 mds.0.locker caps_tick age = 304.446289 client.4524.0x10000003158
2023-12-28T18:30:39.743+0000 15c84700 0 log_channel(cluster) log [WRN] : client.4524 isn't responding to mclientcaps(revoke), ino 0x10000003158 pending pAsLsXsFsx issued pAsLsXsFsxr, sent 304.446289 seconds ago
2023-12-28T18:30:39.751+0000 15c84700 20 mds.0.locker caps_tick client.4524 isn't responding to mclientcaps(revoke), ino 0x10000003158 pending pAsLsXsFsx issued pAsLsXsFsxr, sent 304.446289 seconds ago
2023-12-28T18:30:39.751+0000 15c84700 20 mds.0.locker caps_tick age = 299.297461 client.4524.0x10000003212
2023-12-28T18:30:39.751+0000 15c84700 20 mds.0.locker caps_tick age below timeout 300.000000
And about one minute later the cap revoke reply came:
2023-12-28T18:31:46.926+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 50014 ==== client_caps(update ino 0x10000003158 1 seq 47 caps=pAsXsFsx dirty=- wanted=pAsLsXsFsxcral follows 0 size 88/0 mtime 2023-12-28T18:31:46.238134+0000 ctime 2023-12-28T18:31:46.238134+0000 change_attr 30 xattrs(v=11132956415498150912 l=0)) v12 ==== 268+0+0 (unknown 216644255 0 0) 0xe8728b0 con 0x1c5b5a60
And during those 300s the kclient detected that mds0 was hung:
2023-12-28T17:52:28.225073+00:00 smithi064 kernel: [ 1591.647840] Key type ceph registered
2023-12-28T17:52:28.225092+00:00 smithi064 kernel: [ 1591.648056] libceph: loaded (mon/osd proto 15/24)
2023-12-28T17:52:28.256994+00:00 smithi064 kernel: [ 1591.679391] ceph: loaded (mds proto 32)
2023-12-28T17:52:28.310256+00:00 smithi064 kernel: [ 1591.733832] libceph: mon0 (1)172.21.15.64:6789 session established
2023-12-28T17:52:28.372749+00:00 smithi064 kernel: [ 1591.794791] libceph: client4524 fsid 19544f59-b57e-4062-87f4-e2575e5a1849
2023-12-28T18:29:27.781034+00:00 smithi064 kernel: [ 3811.243063] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 hung
2023-12-28T18:29:35.949034+00:00 smithi064 kernel: [ 3819.411222] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps stale
2023-12-28T18:31:07.089073+00:00 smithi064 kernel: [ 3910.553822] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 came back
2023-12-28T18:31:31.256995+00:00 smithi064 kernel: [ 3934.722305] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps renewed
2023-12-28T19:07:13.329289+00:00 smithi064 kernel: [ 6076.841204] ceph-brx: port 1(brx.0) entered disabled state
2023-12-28T19:07:13.333011+00:00 smithi064 kernel: [ 6076.844775] brx.0 (unregistering): left allmulticast mode
2023-12-28T19:07:13.333026+00:00 smithi064 kernel: [ 6076.844789] brx.0 (unregistering): left promiscuous mode
2023-12-28T19:07:13.333029+00:00 smithi064 kernel: [ 6076.844799] ceph-brx: port 1(brx.0) entered disabled state
The warning is harmless, but it is annoying and confusing to see. I will check the kclient code to see whether I can improve it.
Updated by Xiubo Li 4 months ago
When the unlink request finished, the MDS appended inode 0x10000003158 to the reply trace and issued caps including Fr:
6204272 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.locker set_xlocks_done on (dn xlock x=1 by 0x20e13b50) [dentry #0x100/stray4/1000000315c [2,head] auth NULL (dn xlock x=1 by 0x20e13b50) (dversion lock w=1 last_client=4524) pv=509 v=508 ap=2 ino=(nil) state=1342177345|new|bottomlru | request=1 lock=2 authpin=1 0x20c90790] 6204273 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.locker set_xlocks_done on (dn xlock x=1 by 0x20e13b50) [dentry #0x1/client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/l1e [2,head] auth (dn xlock x=1 by 0x20e13b50) (dversion lock w=1 last_client=4524) pv=254 v=2 ap=2 ino=0x1000000315c state=1610612736 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x21d18d40] 6204274 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.server early_reply 0 ((0) Success) client_request(client.4524:14606 unlink #0x10000003158/l1e 2023-12-28T18:18:05.938140+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 6204275 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.server set_trace_dist snapid head 6204276 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.server set_trace_dist snaprealm snaprealm(0x1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x16dc1e90) len=48 6204277 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.cache.ino(0x10000003158) pfile 1 pauth 1 plink 1 pxattr 1 plocal 1 mtime 2023-12-28T18:18:05.938140+0000 ctime 2023-12-28T18:18:05.938140+0000 change_attr 11 valid=1 6204278 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.cache.ino(0x10000003158) encode_inodestat issuing pAsLsXsFsxcr seq 11 6204279 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.cache.ino(0x10000003158) encode_inodestat caps pAsLsXsFsxcr seq 11 mseq 0 xattrv 0 6204280 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.server set_trace_dist added diri [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v821 pv842 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:17:30.774796+0000 6=3+3)->f(v0 m2023-12-28T18:18:05.938140+0000 5=2+3) n(v19 rc2023-12-28T18:17:54.846347+0000 b45886949 33=28+5)->n(v19 
rc2023-12-28T18:18:05.938140+0000 b45886185 33=28+5)/n(v19 rc2023-12-28T18:18:01.378225+0000 b45889288 34=29+5) (iauth sync r=1) (isnap sync r=7) (inest lock w=3 dirty) (ipolicy sync r=4) (ifile excl w=2) (iversion lock w=2 last_client=4524) caps={4524=pAsLsXsFsxcr/pAsLsXsFsxcral@11},l=4524 | dirtyscattered=1 request=0 lock=6 dirfrag=1 caps=1 dirtyrstat=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070] 6204281 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.server set_trace_dist added dir [dir 0x10000003158 /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ [2,head] auth pv=255 v=251 cv=0/0 ap=4+8 state=1610874881|complete f(v0 m2023-12-28 T18:17:30.774796+0000 6=3+3)->f(v0 m2023-12-28T18:18:05.938140+0000 5=2+3) n(v19 rc2023-12-28T18:17:54.846347+0000 b45886949 32=28+4)->n(v19 rc2023-12-28T18:18:05.938140+0000 b45886185 32=28+4) hs=6+2,ss=0+0 dirty=8 | child=1 dirty=1 authpin=1 0x203b6990]
Updated by Xiubo Li 4 months ago
In the kclient, when releasing requests it skips checking the dirop caps, which means that if any of the 'Fsxr' caps are under revocation, it also skips releasing those caps in time. Only the next time the kclient checks caps can they actually be released.
I sent out a kernel patch to fix it: https://patchwork.kernel.org/project/ceph-devel/list/?series=814262.
But I still haven't figured out why the delayed_work thread was stuck for more than 300s without sending any renew caps request, which left the corresponding session in the HUNG state:
2023-12-28T17:52:28.225073+00:00 smithi064 kernel: [ 1591.647840] Key type ceph registered
2023-12-28T17:52:28.225092+00:00 smithi064 kernel: [ 1591.648056] libceph: loaded (mon/osd proto 15/24)
2023-12-28T17:52:28.256994+00:00 smithi064 kernel: [ 1591.679391] ceph: loaded (mds proto 32)
2023-12-28T17:52:28.310256+00:00 smithi064 kernel: [ 1591.733832] libceph: mon0 (1)172.21.15.64:6789 session established
2023-12-28T17:52:28.372749+00:00 smithi064 kernel: [ 1591.794791] libceph: client4524 fsid 19544f59-b57e-4062-87f4-e2575e5a1849
2023-12-28T18:29:27.781034+00:00 smithi064 kernel: [ 3811.243063] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 hung
2023-12-28T18:29:35.949034+00:00 smithi064 kernel: [ 3819.411222] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps stale
2023-12-28T18:31:07.089073+00:00 smithi064 kernel: [ 3910.553822] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 came back
2023-12-28T18:31:31.256995+00:00 smithi064 kernel: [ 3934.722305] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps renewed
2023-12-28T19:07:13.329289+00:00 smithi064 kernel: [ 6076.841204] ceph-brx: port 1(brx.0) entered disabled state
2023-12-28T19:07:13.333011+00:00 smithi064 kernel: [ 6076.844775] brx.0 (unregistering): left allmulticast mode
2023-12-28T19:07:13.333026+00:00 smithi064 kernel: [ 6076.844789] brx.0 (unregistering): left promiscuous mode
2023-12-28T19:07:13.333029+00:00 smithi064 kernel: [ 6076.844799] ceph-brx: port 1(brx.0) entered disabled state
Meanwhile, during those 300s the kclient kept sending client requests frequently.
Then, 300s later, the kclient suddenly sent 5 renew caps requests within 30s, whereas delayed_work should normally send one renew caps request every 75s:
2023-12-28T18:24:26.990+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 40455 ==== client_session(request_renewcaps seq 25) v7 ==== 175+0+0 (unknown 1820982459 0 0) 0x1f9e6e90 con 0x1c5b5a60
2023-12-28T18:31:07.050+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49400 ==== client_session(request_renewcaps seq 26) v7 ==== 175+0+0 (unknown 1600306678 0 0) 0x1f972790 con 0x1c5b5a60
2023-12-28T18:31:07.362+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49491 ==== client_session(request_renewcaps seq 27) v7 ==== 175+0+0 (unknown 1075391854 0 0) 0x2298a410 con 0x1c5b5a60
2023-12-28T18:31:07.378+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49572 ==== client_session(request_renewcaps seq 28) v7 ==== 175+0+0 (unknown 2804469824 0 0) 0x2155edd0 con 0x1c5b5a60
2023-12-28T18:31:31.006+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49742 ==== client_session(request_renewcaps seq 29) v7 ==== 175+0+0 (unknown 3092463832 0 0) 0x2246a2d0 con 0x1c5b5a60
2023-12-28T18:31:31.210+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49822 ==== client_session(request_renewcaps seq 30) v7 ==== 175+0+0 (unknown 2816952903 0 0) 0x214025d0 con 0x1c5b5a60
2023-12-28T18:32:10.097+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 50381 ==== client_session(request_renewcaps seq 31) v7 ==== 175+0+0 (unknown 769413265 0 0) 0xe9994c0 con 0x1c5b5a60
2023-12-28T18:33:25.120+0000 14481700 1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 51657 ==== client_session(request_renewcaps seq 32) v7 ==== 175+0+0 (unknown 927630548 0 0) 0xe6ae3d0 con 0x1c5b5a60
It seems the renew caps requests were stuck somewhere.
Updated by Xiubo Li 4 months ago
Xiubo Li wrote:
In the kclient, when releasing requests it skips checking the dirop caps, which means that if any of the 'Fsxr' caps are under revocation, it also skips releasing those caps in time. Only the next time the kclient checks caps can they actually be released.
I sent out a kernel patch to fix it: https://patchwork.kernel.org/project/ceph-devel/list/?series=814262.
But I still haven't figured out why the delayed_work thread was stuck for more than 300s without sending any renew caps request, which left the corresponding session in the HUNG state:
[...]
Meanwhile, during those 300s the kclient kept sending client requests frequently.
Then, 300s later, the kclient suddenly sent 5 renew caps requests within 30s, whereas delayed_work should normally send one renew caps request every 75s:
[...]
It seems the renew caps requests were stuck somewhere.
During those 300 seconds there were around 8000 client_caps update requests from the kclient, and no other normal client requests except the client_metric updates:
$ grep 'client_caps(update ino ' b.log | wc -l
8176
On the kclient side, the renewcaps and the delayed client_caps update requests are handled in the same work thread, while client_metric requests are handled in a dedicated work thread. Since we can see client_metric requests arriving periodically among the 8176 client_caps update requests, the finisher queue on the ceph MDS side shouldn't be so large as to defer the renewcaps requests. The stall must be on the kclient side, in the work thread itself.
For example, when there are too many delayed client_caps update requests in the list, the work thread may be stuck for seconds handling them one by one:
static void delayed_work(struct work_struct *work)
{
	struct ceph_mds_client *mdsc =
		container_of(work, struct ceph_mds_client, delayed_work.work);
	unsigned long delay;
	int renew_interval;
	int renew_caps;
	int i;

	dout("mdsc delayed_work\n");

	if (mdsc->stopping >= CEPH_MDSC_STOPPING_FLUSHED)
		return;

	mutex_lock(&mdsc->mutex);
	renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
	renew_caps = time_after_eq(jiffies, HZ*renew_interval +
				   mdsc->last_renew_caps);
	if (renew_caps)
		mdsc->last_renew_caps = jiffies;

	for (i = 0; i < mdsc->max_sessions; i++) {
		struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
		if (!s)
			continue;

		if (!check_session_state(s)) {
			ceph_put_mds_session(s);
			continue;
		}
		mutex_unlock(&mdsc->mutex);

		mutex_lock(&s->s_mutex);
		if (renew_caps)
			send_renew_caps(mdsc, s);	/* ---> renew caps */
		else
			ceph_con_keepalive(&s->s_con);
		if (s->s_state == CEPH_MDS_SESSION_OPEN ||
		    s->s_state == CEPH_MDS_SESSION_HUNG)
			ceph_send_cap_releases(mdsc, s);
		mutex_unlock(&s->s_mutex);
		ceph_put_mds_session(s);

		mutex_lock(&mdsc->mutex);
	}
	mutex_unlock(&mdsc->mutex);

	delay = ceph_check_delayed_caps(mdsc);	/* ----> handles the delayed
						 * client_caps updates one by
						 * one; can stall here for
						 * seconds */

	ceph_queue_cap_reclaim_work(mdsc);

	ceph_trim_snapid_map(mdsc);

	maybe_recover_session(mdsc);

	schedule_delayed(mdsc, delay);
}
Updated by Xiubo Li about 2 months ago
This has been fixed upstream since kernel v6.8-rc5.