Bug #50223

open

qa: "client.4737 isn't responding to mclientcaps(revoke)"

Added by Patrick Donnelly about 3 years ago. Updated 2 months ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2021-04-07T06:14:51.658 INFO:teuthology.orchestra.run.smithi029.stdout:2021-04-07T05:38:28.731361+0000 mds.c (mds.1) 2 : cluster [WRN] client.4737 isn't responding to mclientcaps(revoke), ino 0x200000001fd pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 303.593892 seconds ago

From: /ceph/teuthology-archive/pdonnell-2021-04-07_02:12:41-fs-wip-pdonnell-testing-20210406.213012-distro-basic-smithi/6025722/teuthology.log

With MDS thrashing and FUSE.

Actions #1

Updated by Patrick Donnelly about 3 years ago

osd thrashing and kclient: /ceph/teuthology-archive/pdonnell-2021-04-08_22:42:24-fs-wip-pdonnell-testing-20210408.192301-distro-basic-smithi/6030618/teuthology.log

Actions #2

Updated by Patrick Donnelly about 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Patrick Donnelly

Probably need to lengthen some session timeouts so this goes away. The tests still succeed otherwise.

Actions #3

Updated by Patrick Donnelly over 2 years ago

/ceph/teuthology-archive/pdonnell-2021-09-14_01:17:08-fs-wip-pdonnell-testing-20210910.181451-distro-basic-smithi/6388586/teuthology.log

Actions #5

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)
Actions #11

Updated by Xiubo Li 9 months ago

Rishabh Dave wrote:

https://pulpito.ceph.com/rishabh-2023-06-23_17:37:30-fs-wip-rishabh-improvements-authmon-distro-default-smithi/7313892

The revocation request was stuck for more than 300s, ino 0x20000001095 pending pFc issued pFcb:

2023-06-23T19:15:33.186+0000 7f972af56640 10 mds.1.log _trim_expired_segments waiting for 1/4194304 to expire
2023-06-23T19:15:33.186+0000 7f972af56640 10 mds.1.server find_idle_sessions. last cleared laggy state 2773.67s ago
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.server laggiest active session is client.4966 v1:172.21.15.45:0/229335070 and renewed caps recently (63.4994s ago)
2023-06-23T19:15:33.186+0000 7f972af56640 10 mds.1.locker scatter_tick
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick 7 revoking caps
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick age = 300.003951 client.4966.0x20000001095
2023-06-23T19:15:33.186+0000 7f972af56640  0 log_channel(cluster) log [WRN] : client.4966 isn't responding to mclientcaps(revoke), ino 0x20000001095 pending pFc issued pFcb, sent 300.003951 seconds ago
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick client.4966 isn't responding to mclientcaps(revoke), ino 0x20000001095 pending pFc issued pFcb, sent 300.003951 seconds ago
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick age = 290.003490 client.4966.0x2000000109e
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.locker caps_tick age below timeout 300.000000
2023-06-23T19:15:33.186+0000 7f972af56640 20 mds.1.bal handle_export_pins export_pin_queue size=0
2023-06-23T19:15:33.186+0000 7f972af56640 15 mds.1.bal handle_export_pins number of auth trees = 47; not printing auth trees
2023-06-23T19:15:33.186+0000 7f972af56640 15 mds.1.bal tick tick last_sample now 2773.667067s
...
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick 12 revoking caps
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick age = 400.006155 client.4966.0x20000001095
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick silencing log message (backoff) for client.4966.0x20000001095
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick age = 85.001210 client.4966.0x20000000caf
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.locker caps_tick age below timeout 300.000000
2023-06-23T19:17:13.189+0000 7f972af56640 20 mds.1.bal handle_export_pins export_pin_queue size=0
2023-06-23T19:17:13.189+0000 7f972af56640 15 mds.1.bal handle_export_pins number of auth trees = 47; not printing auth trees
2023-06-23T19:17:13.189+0000 7f972af56640 15 mds.1.bal tick tick last_sample now 2873.670095s
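The caps_tick pattern in the excerpts above (warn once an entry passes the 300 s timeout, stop at the first entry still under the timeout, silence repeats with a backoff) can be modelled with a small Python sketch. This is a hypothetical illustration, not the actual mds/Locker.cc code; in particular, the doubling backoff and the oldest-first ordering are assumptions inferred from the log lines:

```python
TIMEOUT = 300.0  # matches "timeout 300.000000" in the log


def caps_tick(revoking, now):
    """revoking: dicts with 'sent' (time the revoke was sent) and
    'warnings' (warnings already logged), assumed ordered oldest-first.
    Returns the log lines this tick would emit."""
    out = []
    for cap in revoking:
        age = now - cap["sent"]
        if age < TIMEOUT:
            out.append("caps_tick age below timeout")
            break  # assumed ordering: every later entry is younger still
        # warn at 300 s, then double the threshold after each warning,
        # so an entry warned at 300 s is still silent at 400 s
        if age >= TIMEOUT * (2 ** cap["warnings"]):
            cap["warnings"] += 1
            out.append("client isn't responding to mclientcaps(revoke)")
        else:
            out.append("silencing log message (backoff)")
    return out
```

Fed the ages from the log, this model warns at 300.003951 s and silences the same entry at 400.006155 s, matching the two excerpts.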

And the above warning disappeared just after the flushsnap request came:

2023-06-23T19:17:17.105+0000 7f972c759640  1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] <== client.4966 v1:172.21.15.45:0/229335070 39306 ==== client_caps(flushsnap ino 0x20000001095 0 seq 0 tid 75364 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 247 size 11344/0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:16.584371+0000 change_attr 1) v12 ==== 268+0+0 (unknown 1851060495 0 0) 0x55dc739d3500 con 0x55dc71e04800
2023-06-23T19:17:17.105+0000 7f972c759640  7 mds.1.locker handle_client_caps  on 0x20000001095 tid 75364 follows 247 op flushsnap flags 0x0
2023-06-23T19:17:17.105+0000 7f972c759640 20 mds.1.9 get_session have 0x55dc71d7ea00 client.4966 v1:172.21.15.45:0/229335070 state open
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker  head inode [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFc/pFcb/-@4},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:17:17.105+0000 7f972c759640 10  mds.1.cache.snaprealm(0x20000001095 seq 584 0x55dc72db3d40) get_snaps 248 (seq 248 cached_seq 248)
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker   flushsnap follows 247 -> snap 248
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache pick_inode_snap follows 247 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFc/pFcb/-@4},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache pick_inode_snap found [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700]
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker  snapped inode [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700]
2023-06-23T19:17:17.105+0000 7f972c759640  7 mds.1.locker  flushsnap snap 248 client.4966 on [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700]
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.locker _do_snap_update dirty Fw follows 247 snap 248 on [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap auth v6871 ap=5 s=0 n(v0 1=1+0) (iauth snap->sync w=1) (ilink snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) cr={4966=0-4194304@247} | request=0 lock=4 dirty=1 authpin=1 0x55dc7425f700]
2023-06-23T19:17:17.105+0000 7f972c759640 15 mds.1.cache.ino(0x20000001095) project_inode 0x20000001095
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache.dir(0x10000001701.001*) pre_dirty 7102
2023-06-23T19:17:17.105+0000 7f972c759640 10 mds.1.cache.den(0x10000001701.001* .organizationmap) pre_dirty [dentry #0x1/client.0/tmp/ceph/.organizationmap [248,248] auth (dversion lock) pv=7102 v=6899 ino=0x20000001095 state=1610612736 | request=0 inodepin=1 dirty=1 0x55dc788c1180]
...

Since nl=0, the inode 0x20000001095 was finally removed later:

2023-06-23T20:10:09.171+0000 7f972c759640 15 mds.1.cache trim_inode [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap rep@0.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700]
2023-06-23T20:10:09.171+0000 7f972c759640 12 mds.1.cache   sending expire to mds.0 on [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap rep@0.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700]
2023-06-23T20:10:09.171+0000 7f972c759640 12 mds.1.cache.dir(0x10000001701.001*) unlink_inode [dentry #0x1/client.0/tmp/ceph/.organizationmap [248,248] rep@0.1 (dversion lock) v=7102 ino=0x20000001095 state=0 0x55dc788c1180] [inode 0x20000001095 [248,248] /client.0/tmp/ceph/.organizationmap rep@0.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700]
2023-06-23T20:10:09.171+0000 7f972c759640 14 mds.1.cache remove_inode [inode 0x20000001095 [248,248] #20000001095 rep@-2.1 v7102 s=11344 n(v0 b11344 1=1+0) (iversion lock) 0x55dc7425f700]

Why was the revocation of Fb stuck for so long for inode 0x20000001095?

The client first sent an unlink request, and at the same time the unlink request released the ~pFxcwb caps:

2023-06-23T19:10:22.089+0000 7f972c759640  1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] <== client.4966 v1:172.21.15.45:0/229335070 37723 ==== client_request(client.4966:89809 unlink #0x10000001701/.organizationmap 2023-06-23T19:10:21.936301+0000 caller_uid=1000, caller_gid=1271{6,27,108,1271,10102,}) v6 ==== 308+0+0 (unknown 3314630587 0 0) 0x55dc76a23500 con 0x55dc71e04800
...
2023-06-23T19:10:22.089+0000 7f972c759640  4 mds.1.server handle_client_request client_request(client.4966:89809 unlink #0x10000001701/.organizationmap 2023-06-23T19:10:21.936301+0000 caller_uid=1000, caller_gid=1271{6,27,108,1271,10102,}) v6
2023-06-23T19:10:22.089+0000 7f972c759640 20 mds.1.9 get_session have 0x55dc71d7ea00 client.4966 v1:172.21.15.45:0/229335070 state open
2023-06-23T19:10:22.089+0000 7f972c759640 15 mds.1.server  oldest_client_tid=89803
2023-06-23T19:10:22.089+0000 7f972c759640  7 mds.1.cache request_start request(client.4966:89809 nref=2 cr=0x55dc76a23500)
2023-06-23T19:10:22.089+0000 7f972c759640 10 mds.1.locker process_request_cap_release client.4966 pFxcwb on [inode 0x20000001095 [248,head] /client.0/tmp/ceph/.organizationmap auth v6871 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=4966 | request=0 caps=1 dirtyparent=1 dirty=1 0x55dc740fe580]
2023-06-23T19:10:22.089+0000 7f972c759640 10 mds.1.locker  wanted pAsxXsxFxwb -> Fxcb
...

I think the reason it still held the Fb caps was that there was still dirty data not yet flushed on the client side, and the inode was marked needsnapflush=1:

2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.cache.ino(0x20000001095) add_need_snapflush client.4966 snapid 248 on 0x55dc7425f700
2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.cache.ino(0x20000001095) auth_pin by 0x55dc740fe580 on [inode 0x20000001095 [249,head] ~mds1/stray6/20000001095 auth v6871 pv15589 ap=4 DIRTYPARENT s=0 n(v0 1=1+0)->n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) (ilink xlock x=1 by 0x55dc72913c00) (ifile excl) (iversion lock w=1 last_client=4966) cr={4966=0-4194304@247} caps={4966=pFxcwb/Fxcb@2},l=4966 | request=1 lock=3 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] now 4

And then the MDS early-replied to it:

2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.locker set_xlocks_done on (dn xlock x=1 by 0x55dc72913c00) [dentry #0x101/stray6/20000001095 [249,head] auth NULL (dn xlock x=1 by 0x55dc72913c00) (dversion lock w=1 last_client=4966) pv=15589 v=15588 ap=2 ino=(nil) state=1342177281|new | request=1 lock=2 authpin=1 0x55dc788c0f00]
2023-06-23T19:10:22.093+0000 7f972c759640 10 mds.1.server early_reply 0 ((0) Success) client_request(client.4966:89809 unlink #0x10000001701/.organizationmap 2023-06-23T19:10:21.936301+0000 caller_uid=1000, caller_gid=1271{6,27,108,1271,10102,}) v6

And then it sent the safe reply and at the same time set the client as the loner by issuing the pAsLsXsFsxcrwb caps:

2023-06-23T19:10:22.129+0000 7f972674d640  7 mds.1.locker get_allowed_caps loner client.4966 allowed=pAsLsXsFsxcrwb, xlocker allowed=pAsLsXsFsxcrwb, others allowed=pAsLsXs on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFxcwb/Fxcb@2},l=4966 | request=1 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:22.129+0000 7f972674d640 20 mds.1.locker  client.4966 pending pFxcwb allowed pAsLsXsFsxcrwb wanted Fxcb
2023-06-23T19:10:22.129+0000 7f972674d640  7 mds.1.locker    sending MClientCaps to client.4966 seq 3 new pending pAsLsXsFsxcrwb was pFxcwb
2023-06-23T19:10:22.129+0000 7f972674d640 20 mds.1.cache.ino(0x20000001095) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1
2023-06-23T19:10:22.129+0000 7f972674d640 10 mds.1.9 send_message_client_counted client.4966 seq 36035 client_caps(grant ino 0x20000001095 1 seq 3 caps=pAsLsXsFsxcrwb dirty=- wanted=LsFxcb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12
2023-06-23T19:10:22.129+0000 7f972674d640  1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] --> v1:172.21.15.45:0/229335070 -- client_caps(grant ino 0x20000001095 1 seq 3 caps=pAsLsXsFsxcrwb dirty=- wanted=LsFxcb follows 0 size 0/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12 -- 0x55dc72410700 con 0x55dc71e04800
2023-06-23T19:10:22.129+0000 7f972674d640 10 mds.1.9 send_message_client client.4966 v1:172.21.15.45:0/229335070 client_reply(???:89809 = 0 (0) Success safe) v1
2023-06-23T19:10:22.129+0000 7f972674d640  1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] --> v1:172.21.15.45:0/229335070 -- client_reply(???:89809 = 0 (0) Success safe) v1 -- 0x55dc749a6540 con 0x55dc71e04800

And then the client released all the caps except Fcb:

2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker eval done
2023-06-23T19:10:23.669+0000 7f972c759640  1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] <== client.4966 v1:172.21.15.45:0/229335070 37792 ==== client_caps(update ino 0x20000001095 1 seq 3 tid 74186 caps=pFcb dirty=Fw wanted=- follows 248 size 11344/0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1 xattrs(v=4655440983702587648 l=0)) v12 ==== 268+0+0 (unknown 4105346077 0 0) 0x55dc72cc4700 con 0x55dc71e04800
2023-06-23T19:10:23.669+0000 7f972c759640  7 mds.1.locker handle_client_caps  on 0x20000001095 tid 74186 follows 248 op update flags 0x4
2023-06-23T19:10:23.669+0000 7f972c759640 20 mds.1.9 get_session have 0x55dc71d7ea00 client.4966 v1:172.21.15.45:0/229335070 state open
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker  head inode [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pAsLsXsFsxcrwb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.cache pick_inode_snap follows 248 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pAsLsXsFsxcrwb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker  follows 248 retains pFcb dirty Fw on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker  revocation in progress, not making any conclusions about null snapflushes
2023-06-23T19:10:23.669+0000 7f972c759640  7 mds.1.locker  flush client.4966 dirty Fw seq 3 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/Fxcb@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker  wanted Fxcb -> -
2023-06-23T19:10:23.669+0000 7f972c759640 10 mds.1.locker _do_cap_update dirty Fw issued pFcb wanted - on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15589 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=0 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/-@3},l=4966 | request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]

And then in the tick() --> scatter_tick() thread it tried to revoke the Fb caps for inode 0x20000001095:

2023-06-23T19:10:33.185+0000 7f972af56640  7 mds.1.locker simple_sync on (ifile excl) on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/-@3},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:33.185+0000 7f972af56640  7 mds.1.locker get_allowed_caps loner client.4966 allowed=pAsLsXsFscr, xlocker allowed=pAsLsXsFscr, others allowed=pAsLsXs on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=1 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFcb/-@3},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580]
2023-06-23T19:10:33.185+0000 7f972af56640 20 mds.1.locker  client.4966 pending pFcb allowed pAsLsXsFscr wanted -
2023-06-23T19:10:33.185+0000 7f972af56640  7 mds.1.locker    sending MClientCaps to client.4966 seq 4 new pending pFc was pFcb
2023-06-23T19:10:33.185+0000 7f972af56640 20 mds.1.cache.ino(0x20000001095) encode_cap_message pfile 0 pauth 0 plink 0 pxattr 0 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1
2023-06-23T19:10:33.185+0000 7f972af56640 10 mds.1.9 send_message_client_counted client.4966 seq 36318 client_caps(revoke ino 0x20000001095 1 seq 4 caps=pFc dirty=- wanted=Ls follows 0 size 11344/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12
2023-06-23T19:10:33.185+0000 7f972af56640  1 -- [v2:172.21.15.193:6836/819541402,v1:172.21.15.193:6837/819541402] --> v1:172.21.15.45:0/229335070 -- client_caps(revoke ino 0x20000001095 1 seq 4 caps=pFc dirty=- wanted=Ls follows 0 size 11344/4194304 ts 1/18446744073709551615 mtime 2023-06-23T19:10:16.584371+0000 ctime 2023-06-23T19:10:21.936301+0000 change_attr 1) v12 -- 0x55dc73af0380 con 0x55dc71e04800
2023-06-23T19:10:33.185+0000 7f972af56640 10 mds.1.cache.ino(0x20000001095) auth_pin by 0x55dc740fe890 on [inode 0x20000001095 [...249,head] ~mds1/stray6/20000001095 auth v15635 ap=2 snaprealm=0x55dc72db3d40 DIRTYPARENT s=11344 nl=0 n(v0 rc2023-06-23T19:10:21.936301+0000 b11344 1=1+0) need_snapflush={248=4966} (ifile excl->sync) (iversion lock) cr={4966=0-4194304@247} caps={4966=pFc/pFcb/-@4},l=4966(-1) | ptrwaiter=0 request=0 lock=0 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 authpin=1 0x55dc740fe580] now 2

And then it was stuck until the MDS received the flushsnap request, as mentioned at the beginning.

So when unlinking the file, which was still dirty, why was the client stuck for so long? Was the client stuck writing the dirty data out to RADOS? Or was it just waiting for some periodic tick to trigger the flush? I suspect the second case.

Need to go through the kclient code to dig it out.

Actions #12

Updated by Xiubo Li 9 months ago

  • Assignee changed from Patrick Donnelly to Xiubo Li

Will take it as discussed with Patrick.

Actions #13

Updated by Xiubo Li 9 months ago

  • Status changed from Triaged to In Progress
Actions #14

Updated by Xiubo Li 9 months ago

  • Project changed from CephFS to Linux kernel client
Actions #15

Updated by Xiubo Li 9 months ago

The above failures are mostly with the kclient. But checking the libcephfs user-space clients, they seem to have the same issue.

Going through the kclient code: when unlinking a file it will try to queue a cap release, BUT the cap check for the corresponding inode could be delayed by up to 10s, which is two tick() periods. And even when it is successfully triggered 10s later, check_caps() could miss flushing the dirty buffer to RADOS, which is what finally releases the Fb caps. This could leave the MDS cap revocation stuck for a long time.
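The worst-case delay described above can be sketched with a toy timing model. This is an illustration only, not kernel code; the 5 s tick period and the skip-one-period scheduling are assumptions chosen so the worst case comes out at two tick periods (10 s):

```python
import math

TICK = 5.0  # assumed kclient delayed-work period, seconds


def first_cap_check(unlink_time):
    """Toy model: the inode queued at unlink_time misses the tick
    already pending and is only handled one full period after the next
    tick, so the first check_caps() lags the unlink by up to 2 * TICK."""
    next_tick = math.ceil(unlink_time / TICK) * TICK
    return next_tick + TICK


# queued just after a tick fired: checked almost 10 s later
delay = first_cap_check(0.1) - 0.1
```

Even once that check runs, per the comment above, check_caps() could still miss flushing the dirty buffer, so the Fb release is pushed out further and the MDS revoke keeps waiting.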

Actions #16

Updated by Xiubo Li 8 months ago

  • Status changed from In Progress to Fix Under Review
Actions #18

Updated by Xiubo Li 5 months ago

Kotresh Hiremath Ravishankar wrote:

https://pulpito.ceph.com/yuriw-2023-12-28_15:45:00-fs-wip-yuri7-testing-2023-12-27-1008-pacific-distro-default-smithi/7503019

Thanks Kotresh for reporting this.

From the mds.0 log file, mds.0 sent a cap revoke to take back the Fr caps at 2023-12-28T18:25:35:

2023-12-28T18:25:35.289+0000 15c84700  7 mds.0.locker simple_sync on (inest lock w=6 dirty) on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsxr/pAsLsXsFsxcral@37},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070]
2023-12-28T18:25:35.289+0000 15c84700 10 mds.0.locker invalidate_lock_caches (inest lock->sync w=6 dirty) on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock->sync w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsxr/pAsLsXsFsxcral@37},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070]
2023-12-28T18:25:35.289+0000 15c84700  7 mds.0.locker get_allowed_caps loner client.4524 allowed=pAsLsXsFsxcrwb, xlocker allowed=pAsLsXsFsxcrwb, others allowed=pAsLsXs on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock->sync w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsxr/pAsLsXsFsxcral@37},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070]
2023-12-28T18:25:35.289+0000 15c84700 20 mds.0.locker  client.4524 pending pAsLsXsFsxr allowed pAsLsXsFsxwb wanted pAsLsXsFsxcral
2023-12-28T18:25:35.289+0000 15c84700  7 mds.0.locker    sending MClientCaps to client.4524 seq 38 new pending pAsLsXsFsx was pAsLsXsFsxr
2023-12-28T18:25:35.289+0000 15c84700 20 mds.0.cache.ino(0x10000003158) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2023-12-28T18:25:30.909738+0000 ctime 2023-12-28T18:25:30.909738+0000 change_attr 27
2023-12-28T18:25:35.289+0000 15c84700 10 mds.0.4 send_message_client_counted client.4524 seq 32637 client_caps(revoke ino 0x10000003158 1 seq 38 caps=pAsLsXsFsx dirty=- wanted=pAsLsXsFsxcral follows 0 size 0/0 ts 1/18446744073709551615 mtime 2023-12-28T18:25:30.909738+0000 ctime 2023-12-28T18:25:30.909738+0000 change_attr 27) v11
2023-12-28T18:25:35.289+0000 15c84700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] --> v1:192.168.0.1:0/1895760544 -- client_caps(revoke ino 0x10000003158 1 seq 38 caps=pAsLsXsFsx dirty=- wanted=pAsLsXsFsxcral follows 0 size 0/0 ts 1/18446744073709551615 mtime 2023-12-28T18:25:30.909738+0000 ctime 2023-12-28T18:25:30.909738+0000 change_attr 27) v11 -- 0x239378c0 con 0x1c5b5a60
2023-12-28T18:25:35.289+0000 15c84700 10 mds.0.cache.ino(0x10000003158) auth_pin by 0x21d00400 on [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v1793 pv1801 ap=6 DIRTYPARENT f(v0 m2023-12-28T18:25:29.921757+0000 5=2+3)->f(v0 m2023-12-28T18:25:30.909738+0000 3=1+2) n(v49 rc2023-12-28T18:25:29.921757+0000 b85188569 61=51+10)->n(v49 rc2023-12-28T18:25:30.909738+0000 b80994265 57=48+9) (isnap sync r=5) (inest lock->sync w=6 dirty) (ipolicy sync r=3) (ifile excl w=2) (iversion lock w=4) caps={4524=pAsLsXsFsx/pAsLsXsFsxr/pAsLsXsFsxcral@38},l=4524 | dirtyscattered=1 request=0 lock=5 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070] now 6

But after 300 seconds mds.0 still hadn't received the reply:

2023-12-28T18:30:39.739+0000 15c84700 20 mds.0.locker caps_tick 4 revoking caps
2023-12-28T18:30:39.739+0000 15c84700 20 mds.0.locker caps_tick age = 304.446289 client.4524.0x10000003158
2023-12-28T18:30:39.743+0000 15c84700  0 log_channel(cluster) log [WRN] : client.4524 isn't responding to mclientcaps(revoke), ino 0x10000003158 pending pAsLsXsFsx issued pAsLsXsFsxr, sent 304.446289 seconds ago
2023-12-28T18:30:39.751+0000 15c84700 20 mds.0.locker caps_tick client.4524 isn't responding to mclientcaps(revoke), ino 0x10000003158 pending pAsLsXsFsx issued pAsLsXsFsxr, sent 304.446289 seconds ago
2023-12-28T18:30:39.751+0000 15c84700 20 mds.0.locker caps_tick age = 299.297461 client.4524.0x10000003212
2023-12-28T18:30:39.751+0000 15c84700 20 mds.0.locker caps_tick age below timeout 300.000000

And around 1 minute later the cap revoke reply came:

2023-12-28T18:31:46.926+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 50014 ==== client_caps(update ino 0x10000003158 1 seq 47 caps=pAsXsFsx dirty=- wanted=pAsLsXsFsxcral follows 0 size 88/0 mtime 2023-12-28T18:31:46.238134+0000 ctime 2023-12-28T18:31:46.238134+0000 change_attr 30 xattrs(v=11132956415498150912 l=0)) v12 ==== 268+0+0 (unknown 216644255 0 0) 0xe8728b0 con 0x1c5b5a60

And during those 300s the kclient detected that mds0 was hung:

2023-12-28T17:52:28.225073+00:00 smithi064 kernel: [ 1591.647840] Key type ceph registered
2023-12-28T17:52:28.225092+00:00 smithi064 kernel: [ 1591.648056] libceph: loaded (mon/osd proto 15/24)
2023-12-28T17:52:28.256994+00:00 smithi064 kernel: [ 1591.679391] ceph: loaded (mds proto 32)
2023-12-28T17:52:28.310256+00:00 smithi064 kernel: [ 1591.733832] libceph: mon0 (1)172.21.15.64:6789 session established
2023-12-28T17:52:28.372749+00:00 smithi064 kernel: [ 1591.794791] libceph: client4524 fsid 19544f59-b57e-4062-87f4-e2575e5a1849
2023-12-28T18:29:27.781034+00:00 smithi064 kernel: [ 3811.243063] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 hung
2023-12-28T18:29:35.949034+00:00 smithi064 kernel: [ 3819.411222] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps stale
2023-12-28T18:31:07.089073+00:00 smithi064 kernel: [ 3910.553822] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 came back
2023-12-28T18:31:31.256995+00:00 smithi064 kernel: [ 3934.722305] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps renewed
2023-12-28T19:07:13.329289+00:00 smithi064 kernel: [ 6076.841204] ceph-brx: port 1(brx.0) entered disabled state
2023-12-28T19:07:13.333011+00:00 smithi064 kernel: [ 6076.844775] brx.0 (unregistering): left allmulticast mode
2023-12-28T19:07:13.333026+00:00 smithi064 kernel: [ 6076.844789] brx.0 (unregistering): left promiscuous mode
2023-12-28T19:07:13.333029+00:00 smithi064 kernel: [ 6076.844799] ceph-brx: port 1(brx.0) entered disabled state

This is harmless, but the warning is annoying and confusing to see. I will check the kclient code to see whether I can improve it.

Actions #19

Updated by Xiubo Li 5 months ago

When the unlink request finished, the reply for 0x10000003158 was appended and the Fr caps were issued:

 6204272 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.locker set_xlocks_done on (dn xlock x=1 by 0x20e13b50) [dentry #0x100/stray4/1000000315c [2,head] auth NULL (dn xlock x=1 by 0x20e13b50) (dversion lock w=1 last_client=4524) pv=509 v=508 ap=2 ino=(nil) state=1342177345|new|bottomlru | request=1 lock=2 authpin=1 0x20c90790]
 6204273 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.locker set_xlocks_done on (dn xlock x=1 by 0x20e13b50) [dentry #0x1/client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/l1e [2,head] auth (dn xlock x=1 by 0x20e13b50) (dversion lock w=1 last_client=4524) pv=254 v=2 ap=2 ino=0x1000000315c state=1610612736 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x21d18d40]
 6204274 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.server early_reply 0 ((0) Success) client_request(client.4524:14606 unlink #0x10000003158/l1e 2023-12-28T18:18:05.938140+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6
 6204275 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.server set_trace_dist snapid head                                                                                                                                              
 6204276 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.server set_trace_dist snaprealm snaprealm(0x1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x16dc1e90) len=48
 6204277 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.cache.ino(0x10000003158)  pfile 1 pauth 1 plink 1 pxattr 1 plocal 1 mtime 2023-12-28T18:18:05.938140+0000 ctime 2023-12-28T18:18:05.938140+0000 change_attr 11 valid=1
 6204278 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.cache.ino(0x10000003158) encode_inodestat issuing pAsLsXsFsxcr seq 11
 6204279 2023-12-28T18:18:06.578+0000 14481700 10 mds.0.cache.ino(0x10000003158) encode_inodestat caps pAsLsXsFsxcr seq 11 mseq 0 xattrv 0
 6204280 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.server set_trace_dist added diri [inode 0x10000003158 [...2,head] /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ auth v821 pv842 ap=5 DIRTYPARENT f(v0 m2023-12-28T18:17:30.774796+0000 6=3+3)->f(v0 m2023-12-28T18:18:05.938140+0000 5=2+3) n(v19 rc2023-12-28T18:17:54.846347+0000 b45886949 33=28+5)->n(v19 rc2023-12-28T18:18:05.938140+0000 b45886185 33=28+5)/n(v19 rc2023-12-28T18:18:01.378225+0000 b45889288 34=29+5) (iauth sync r=1) (isnap sync r=7) (inest lock w=3 dirty) (ipolicy sync r=4) (ifile excl w=2) (iversion lock w=2 last_client=4524) caps={4524=pAsLsXsFsxcr/pAsLsXsFsxcral@11},l=4524 | dirtyscattered=1 request=0 lock=6 dirfrag=1 caps=1 dirtyrstat=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x21d00070]
 6204281 2023-12-28T18:18:06.578+0000 14481700 20 mds.0.server set_trace_dist added dir  [dir 0x10000003158 /client.0/tmp/tmp.5FKrhWTDrX/p7/d0/d1d/ [2,head] auth pv=255 v=251 cv=0/0 ap=4+8 state=1610874881|complete f(v0 m2023-12-28T18:17:30.774796+0000 6=3+3)->f(v0 m2023-12-28T18:18:05.938140+0000 5=2+3) n(v19 rc2023-12-28T18:17:54.846347+0000 b45886949 32=28+4)->n(v19 rc2023-12-28T18:18:05.938140+0000 b45886185 32=28+4) hs=6+2,ss=0+0 dirty=8 | child=1 dirty=1 authpin=1 0x203b6990]
Actions #20

Updated by Xiubo Li 5 months ago

In the kclient, when releasing requests it skips checking the dirop caps, which means that if any 'Fsxr' caps are under revoking it will also skip releasing those caps in time. Only the next time the kclient runs its caps check will it be able to release them.
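The skipped-release behavior can be modeled with a toy sketch. Everything here is illustrative only: the bit values, the `fixed_behavior` flag, and the helper name are made up, and the real logic lives in the kclient's cap-release path, not in a function like this:

```c
#include <assert.h>

/* Toy model of the cap-release decision described above; the bit
 * values and the helper are illustrative, not the kernel's own. */
#define CAP_Fs 0x1   /* file shared    */
#define CAP_Fx 0x2   /* file exclusive */
#define CAP_Fr 0x4   /* file read      */
#define CAP_Fb 0x8   /* file buffer    */

/* Returns nonzero if an unused cap can be released right away.
 * Old behavior: any cap with revoking bits pending was skipped, so the
 * MDS revoke only completed on the next periodic caps check.
 * Fixed behavior: unused caps are released even while under revoke. */
static int can_release_now(int used, int revoking, int fixed_behavior)
{
    if (used)
        return 0;                    /* still in use locally */
    if (revoking && !fixed_behavior)
        return 0;                    /* old code: skip caps under revoke */
    return 1;
}
```

With the old behavior, an unused cap under revoke (e.g. Fb being revoked) is left pending until the next caps check; with the fix it is dropped immediately, which is what lets the MDS revoke complete in time.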

I sent out a kernel patch to fix it: https://patchwork.kernel.org/project/ceph-devel/list/?series=814262.

But I still haven't figured out why the delayed_work thread was stuck for more than 300s without sending any renew caps request, which left the corresponding session in the HUNG state:

2023-12-28T17:52:28.225073+00:00 smithi064 kernel: [ 1591.647840] Key type ceph registered
2023-12-28T17:52:28.225092+00:00 smithi064 kernel: [ 1591.648056] libceph: loaded (mon/osd proto 15/24)
2023-12-28T17:52:28.256994+00:00 smithi064 kernel: [ 1591.679391] ceph: loaded (mds proto 32)
2023-12-28T17:52:28.310256+00:00 smithi064 kernel: [ 1591.733832] libceph: mon0 (1)172.21.15.64:6789 session established
2023-12-28T17:52:28.372749+00:00 smithi064 kernel: [ 1591.794791] libceph: client4524 fsid 19544f59-b57e-4062-87f4-e2575e5a1849
2023-12-28T18:29:27.781034+00:00 smithi064 kernel: [ 3811.243063] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 hung
2023-12-28T18:29:35.949034+00:00 smithi064 kernel: [ 3819.411222] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps stale
2023-12-28T18:31:07.089073+00:00 smithi064 kernel: [ 3910.553822] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 came back
2023-12-28T18:31:31.256995+00:00 smithi064 kernel: [ 3934.722305] ceph: [19544f59-b57e-4062-87f4-e2575e5a1849 4524]: mds0 caps renewed
2023-12-28T19:07:13.329289+00:00 smithi064 kernel: [ 6076.841204] ceph-brx: port 1(brx.0) entered disabled state
2023-12-28T19:07:13.333011+00:00 smithi064 kernel: [ 6076.844775] brx.0 (unregistering): left allmulticast mode
2023-12-28T19:07:13.333026+00:00 smithi064 kernel: [ 6076.844789] brx.0 (unregistering): left promiscuous mode
2023-12-28T19:07:13.333029+00:00 smithi064 kernel: [ 6076.844799] ceph-brx: port 1(brx.0) entered disabled state

Meanwhile, during those 300s the kclient kept sending client requests frequently.

Then, roughly 300s later, the kclient suddenly sent 5 renew caps requests within 30s, while normally delayed_work should send one renew caps request every 75s:

2023-12-28T18:24:26.990+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 40455 ==== client_session(request_renewcaps seq 25) v7 ==== 175+0+0 (unknown 1820982459 0 0) 0x1f9e6e90 con 0x1c5b5a60
2023-12-28T18:31:07.050+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49400 ==== client_session(request_renewcaps seq 26) v7 ==== 175+0+0 (unknown 1600306678 0 0) 0x1f972790 con 0x1c5b5a60
2023-12-28T18:31:07.362+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49491 ==== client_session(request_renewcaps seq 27) v7 ==== 175+0+0 (unknown 1075391854 0 0) 0x2298a410 con 0x1c5b5a60
2023-12-28T18:31:07.378+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49572 ==== client_session(request_renewcaps seq 28) v7 ==== 175+0+0 (unknown 2804469824 0 0) 0x2155edd0 con 0x1c5b5a60
2023-12-28T18:31:31.006+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49742 ==== client_session(request_renewcaps seq 29) v7 ==== 175+0+0 (unknown 3092463832 0 0) 0x2246a2d0 con 0x1c5b5a60
2023-12-28T18:31:31.210+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 49822 ==== client_session(request_renewcaps seq 30) v7 ==== 175+0+0 (unknown 2816952903 0 0) 0x214025d0 con 0x1c5b5a60
2023-12-28T18:32:10.097+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 50381 ==== client_session(request_renewcaps seq 31) v7 ==== 175+0+0 (unknown 769413265 0 0) 0xe9994c0 con 0x1c5b5a60
2023-12-28T18:33:25.120+0000 14481700  1 -- [v2:172.21.15.64:6836/3201717286,v1:172.21.15.64:6837/3201717286] <== client.4524 v1:192.168.0.1:0/1895760544 51657 ==== client_session(request_renewcaps seq 32) v7 ==== 175+0+0 (unknown 927630548 0 0) 0xe6ae3d0 con 0x1c5b5a60

It seems the renew caps requests were stuck somewhere.
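For reference, the 75s cadence falls out of delayed_work's arithmetic: the renewal interval is a quarter of the MDS session timeout (`m_session_timeout >> 2`). A minimal sketch of that arithmetic, assuming the default 300s session timeout (the helper name is made up; only the shift mirrors the kernel code):

```c
#include <assert.h>

/* renew_interval_secs() is a made-up helper; the shift itself mirrors
 * "renew_interval = mdsc->mdsmap->m_session_timeout >> 2" in
 * delayed_work(). */
static int renew_interval_secs(int session_timeout_secs)
{
    return session_timeout_secs >> 2;   /* timeout / 4 */
}
/* renew_interval_secs(300) == 75, so a 300s silent gap corresponds to
 * roughly four missed renewals. */
```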

Actions #21

Updated by Xiubo Li 5 months ago

Xiubo Li wrote:

In the kclient, when releasing requests it skips checking the dirop caps, which means that if any 'Fsxr' caps are under revoking it will also skip releasing those caps in time. Only the next time the kclient runs its caps check will it be able to release them.

I sent out one kernel patch to fix it https://patchwork.kernel.org/project/ceph-devel/list/?series=814262.

But I still haven't figured out why the delayed_work thread was stuck for more than 300s without sending any renew caps request, which left the corresponding session in the HUNG state:

[...]

Meanwhile, during those 300s the kclient kept sending client requests frequently.

Then, roughly 300s later, the kclient suddenly sent 5 renew caps requests within 30s, while normally delayed_work should send one renew caps request every 75s:

[...]

It seems the renew caps requests were stuck somewhere.

During those 300 seconds there were around 8000 client_caps update requests from the kclient, and no other normal client requests except the client_metric updates:

$ grep 'client_caps(update ino ' b.log | wc -l
8176

On the kclient side, the renewcaps and the delayed client_caps update requests are handled in the same work thread, while client_metric is handled in a dedicated work thread. Since we can see client_metric requests arriving periodically among the 8176 client_caps update requests, the finisher queue on the ceph-mds side shouldn't be large enough to defer the renewcaps requests. The stall must be on the kclient side, in the work thread itself.

For example, when there are too many delayed client_caps update requests in the list, the work thread may stall for several seconds handling them one by one:

static void delayed_work(struct work_struct *work)
{
        struct ceph_mds_client *mdsc =
                container_of(work, struct ceph_mds_client, delayed_work.work);
        unsigned long delay;
        int renew_interval;
        int renew_caps;
        int i;

        dout("mdsc delayed_work\n");

        if (mdsc->stopping >= CEPH_MDSC_STOPPING_FLUSHED)
                return;

        mutex_lock(&mdsc->mutex);
        renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
        renew_caps = time_after_eq(jiffies, HZ*renew_interval +
                                   mdsc->last_renew_caps);
        if (renew_caps)
                mdsc->last_renew_caps = jiffies;

        for (i = 0; i < mdsc->max_sessions; i++) {
                struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
                if (!s)
                        continue;

                if (!check_session_state(s)) {
                        ceph_put_mds_session(s);
                        continue;
                }
                mutex_unlock(&mdsc->mutex);

                mutex_lock(&s->s_mutex);
                if (renew_caps)
                        send_renew_caps(mdsc, s);       /* ---> renew caps */
                else
                        ceph_con_keepalive(&s->s_con);
                if (s->s_state == CEPH_MDS_SESSION_OPEN ||
                    s->s_state == CEPH_MDS_SESSION_HUNG)
                        ceph_send_cap_releases(mdsc, s);
                mutex_unlock(&s->s_mutex);
                ceph_put_mds_session(s);

                mutex_lock(&mdsc->mutex);
        }
        mutex_unlock(&mdsc->mutex);

        delay = ceph_check_delayed_caps(mdsc);  /* ----> handles the delayed
                                                 * client_caps updates one by one
                                                 * and can stall here for seconds */

        ceph_queue_cap_reclaim_work(mdsc);

        ceph_trim_snapid_map(mdsc);

        maybe_recover_session(mdsc);

        schedule_delayed(mdsc, delay);
}
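A back-of-the-envelope check of that suspicion: the next renewcaps cannot go out until the whole delayed list is drained, so with roughly 8000 queued client_caps updates even a modest per-item cost adds up. The 37ms/item figure below is an assumption chosen for illustration, not a measurement:

```c
#include <assert.h>

/* delayed_work only calls schedule_delayed() after
 * ceph_check_delayed_caps() has drained the whole list, so the next
 * renewcaps is pushed out by the full serial cost of the drain. */
static long serial_drain_ms(long n_items, long ms_per_item)
{
    return n_items * ms_per_item;
}
/* serial_drain_ms(8176, 37) == 302512 ms, i.e. ~302s -- on the order
 * of the observed ~300s renewal gap. */
```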
Actions #23

Updated by Xiubo Li 2 months ago

This has been fixed upstream since v6.8-rc5.

Actions

Also available in: Atom PDF