Project

General

Profile

Actions

Bug #64977

open

mds spinlock due to lock contention leading to memory exaustion

Added by Enrico Bocchi about 1 month ago. Updated 15 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have started to observe an MDS spinlock issue on two CephFS clusters running 16.2.9.

On both clusters, the offending pattern generates from backup jobs (done via restic).
The backup clients seem to trigger a stuck operation on the MDS side, which manifests as "failed to rdlock".
The MDS memory balloons (all allocated into buffer_anon) at the rate of ~1GB per second, till OOM.

The stuck operation manifests as the MDS heavily looping on:

2024-03-15T22:27:24.125+0100 7f062bcf7700  7 mds.0.locker    sending MClientCaps to client.372200149 seq 103681947 new pending pAsxLsXsxFsxcrwb was pAsLsXsxFsxcrwb
2024-03-15T22:27:24.125+0100 7f062bcf7700  7 mds.0.server dispatch_client_request client_request(client.372777796:588436 lookup #0x1012524ee02/414769741_1-3f4132de_320.tmp 2024-03-15T21:53:54.681734+0100 caller_uid=0, caller_gid=0{0,}) v4
...
2024-03-15T22:27:24.125+0100 7f062bcf7700  7 mds.0.locker    sending MClientCaps to client.372200149 seq 103681948 new pending pAsLsXsxFsxcrwb was pAsxLsXsxFsxcrwb
(edited)

client.372200149 is the legit client, serving the application.
client.372777796 is the backup client.

During the heavy loop triggered by lock contention, the MDS struggles to serve other requests, log segments are not trimmed, and the socket may becomes unresponsive.

Evicting the backup client failing to acquire lock releases pressure on the MDS, and the cluster goes back to HEALTH_OK. If the backup client comes back (we do not blocklist on eviction), then history repeats.
Evicting all clients with a lock on the file seems to resolve the situation.

The issue seems to have started with an upgrade of clients to 6.5.5-200.fc38.


Files

cap_revoke.png (165 KB) cap_revoke.png Abhishek Lekshmanan, 04/08/2024 03:35 PM
Actions #1

Updated by Enrico Bocchi about 1 month ago

Looking further into this reveled that:
- All the clients triggering the issue are kernel clients running 6.5.5-200.fc38.
- We have never seen this happening with the previous kernel client -- 6.4.15.

We diff'ed the kernel sources.
There appear to be changes in
- caps.c, brought in by https://tracker.ceph.com/issues/61332
- dir.c also changed with https://tracker.ceph.com/issues/61584, which we believe causes even more contention due to mtime propagation to parent directory (kernel PR: https://github.com/ceph/ceph-client/commit/d9d00f71ab5a2b5a47b228f678a8817e8687387f)

The new kernel client also incorporates a change in VFS: https://github.com/ceph/ceph-client/commit/3e3271549670783be20e233a2b78a87a0b04c715:

WRAP_DIR_ITER(ceph_readdir) // FIXME!
const struct file_operations ceph_dir_fops = {
    .read = ceph_read_dir,
    .iterate = ceph_readdir,
    .iterate_shared = shared_ceph_readdir,
    .llseek = ceph_dir_llseek,
    .open = ceph_open,
    .release = ceph_release,

Could the MDS spin-lock issue be related to any of the above?

Actions #2

Updated by Enrico Bocchi about 1 month ago

There's also a potential cap sequence mismatch issue tracked at https://tracker.ceph.com/issues/61782
The fix for that did not make it to the kernel client currently in use.

Actions #3

Updated by Enrico Bocchi about 1 month ago

Found a thread in list.ceph.io [1] reporting the problem we see following an upgrade to 6.5 kernel client.
MDS allocates memory and uses a lot of bandwidth with the offending clients increases (likely due to messenger?)

From the description in the kernel:

>  * Concurrency-managed per-cpu work items that hog CPU for longer than
>  * wq_cpu_intensive_thresh_us trigger the automatic CPU_INTENSIVE mechanism,
>  * which prevents them from stalling other concurrency-managed work items. If a
>  * work function keeps triggering this mechanism, it's likely that the work item
>  * should be using an unbound workqueue instead.

Is this slowing down the process of acquiring/returning caps to the MDS?

[1]:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YR5UNKBOKDHPL2PV4J75ZIUNI4HNMC2W/#YOULTM4S32QG7MMVOQ336XVLS5SU4AA3

Actions #4

Updated by Xiubo Li about 1 month ago

Hi Enrico,

Do you have more mds side logs ?
Thanks.

Actions #5

Updated by Xiubo Li about 1 month ago

Enrico Bocchi wrote:

We have started to observe an MDS spinlock issue on two CephFS clusters running 16.2.9.

On both clusters, the offending pattern generates from backup jobs (done via restic).
The backup clients seem to trigger a stuck operation on the MDS side, which manifests as "failed to rdlock".
The MDS memory balloons (all allocated into buffer_anon) at the rate of ~1GB per second, till OOM.

The stuck operation manifests as the MDS heavily looping on:
[...]

client.372200149 is the legit client, serving the application.
client.372777796 is the backup client.

During the heavy loop triggered by lock contention, the MDS struggles to serve other requests, log segments are not trimmed, and the socket may becomes unresponsive.

Evicting the backup client failing to acquire lock releases pressure on the MDS, and the cluster goes back to HEALTH_OK. If the backup client comes back (we do not blocklist on eviction), then history repeats.
Evicting all clients with a lock on the file seems to resolve the situation.

The issue seems to have started with an upgrade of clients to 6.5.5-200.fc38.

2024-03-15T22:27:24.125+0100 7f062bcf7700  7 mds.0.locker    sending MClientCaps to client.372200149 seq 103681947 new pending pAsxLsXsxFsxcrwb was pAsLsXsxFsxcrwb
2024-03-15T22:27:24.125+0100 7f062bcf7700  7 mds.0.server dispatch_client_request client_request(client.372777796:588436 lookup #0x1012524ee02/414769741_1-3f4132de_320.tmp 2024-03-15T21:53:54.681734+0100 caller_uid=0, caller_gid=0{0,}) v4
...
2024-03-15T22:27:24.125+0100 7f062bcf7700  7 mds.0.locker    sending MClientCaps to client.372200149 seq 103681948 new pending pAsLsXsxFsxcrwb was pAsxLsXsxFsxcrwb
(edited)

From the above logs we can see that the client.372200149 just being issued the Ax, but later another client.372777796 lookup request came and it may cause the Ax cap's revocation, and then just in a dead loop for some reasons.

Please provide more mds debug logs with the debug_mds = 25 and debug_ms = 1 enabled.
And if possible please also provide the kclient debug logs.

Thanks
- Xiubo

Actions #6

Updated by Xiubo Li about 1 month ago

  • Project changed from Ceph to CephFS
  • Assignee set to Xiubo Li
Actions #7

Updated by Xiubo Li about 1 month ago

There have two cases will send the lookup request and in the dentry revalidation one it may retry it infinitely, I am still going through the code, but with the logs it will be much easier to found the root cause.

Actions #8

Updated by Xiubo Li about 1 month ago

  • Status changed from New to Need More Info
Actions #9

Updated by Abhishek Lekshmanan 25 days ago ยท Edited

We see this pattern during the incidents that both cap revocation and grants spike and then we see the node consuming memory before getting oom-killed. (see attached image)

On the client side there are usually 2 clients involved - one restic/ backup which is mounted ro, and usually an active client for eg a build job. We see in the debug logs a pattern like the following

2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.server dispatch_client_request client_request(client.379194623:32785 lookup #0x1003f0c6738/cgisess_024c841a02411969569b80c0672798c9 2024-04-04T17:49:38.16504
4+0200 caller_uid=0, caller_gid=0{0,}) v4
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.server rdlock_path_pin_ref request(client.379194623:32785 nref=3 cr=0x563c4779bb80) #0x1003f0c6738/cgisess_024c841a02411969569b80c0672798c9
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker acquire_locks request(client.379194623:32785 nref=3 cr=0x563c4779bb80)
2024-04-04T17:50:56.825+0200 7f16679d3700 20 mds.0.locker  must rdlock (iauth excl) [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a024
11969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFs
xcrwb/pAsxXsxFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 20 mds.0.locker  must rdlock (ixattr excl) [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02
411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxF
sxcrwb/pAsxXsxFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker  must authpin [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c
0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXs
xFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker  already auth_pinned [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969
569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb
/pAsxXsxFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker rdlock_start  on (iauth excl) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c
841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsx
LsXsxFsxcrwb/pAsxXsxFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker simple_sync on (iauth excl) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c84
1a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLs
XsxFsxcrwb/pAsxXsxFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker issue_caps loner client.372831873 allowed=pAsLsXsxFsxcrwb, xlocker allowed=pAsLsXsxFsxcrwb, others allowed=pLs on [inode 0x1012cb88597 [2,head] /volum
es/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (if
ile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289985},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 auth
pin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 20 mds.0.locker  client.372831873 pending pAsxLsXsxFsxcrwb allowed pAsLsXsxFsxcrwb wanted pAsxXsxFsxcrwb
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker    sending MClientCaps to client.372831873 seq 289986 new pending pAsLsXsxFsxcrwb was pAsxLsXsxFsxcrwb
2024-04-04T17:50:56.825+0200 7f16679d3700 20 mds.0.cache.ino(0x1012cb88597) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 ctime 2024-04-03T15:10:05.039419+0200
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.27169 send_message_client_counted client.372831873 seq 738657 client_caps(revoke ino 0x1012cb88597 1 seq 289986 caps=pAsLsXsxFsxcrwb dirty=- wanted=pAsxXsxFs
xcrwb follows 0 size 177/4194304 ts 1/18446744073709551615 mtime 2024-04-03T15:10:05.039419+0200 tws 1) v11

Next this continues as a grant sequence

2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.cache.ino(0x1012cb88597) auth_pin by 0x563ac163c390 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/t
mp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1}
 caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100] now 2
2024-04-04T17:50:56.825+0200 7f16679d3700 15 mds.0.cache.dir(0x1003f0c6738) adjust_nested_auth_pins 1 on [dir 0x1003f0c6738 /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/ [2,head]
 auth v=4446576 cv=4446576/4446576 ap=0+3 state=1074003969|complete f(v0 m2024-04-04T17:23:55.068160+0200 4310=4310+0) n(v42 rc2024-04-04T17:47:12.708372+0200 b1115214 4310=4310+0) hs=4310+0,ss=0+0 | child=1
dirty=0 waiter=0 authpin=0 0x563645696000] by 0x563ac163c100 count now 0/3
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker rdlock_start waiting on (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.cache.ino(0x1012cb88597) add_waiter tag 2000000000040000 0x56367bc33160 !ambig 1 !frozen 1 !freezing 1
2024-04-04T17:50:56.825+0200 7f16679d3700 15 mds.0.cache.ino(0x1012cb88597) taking waiter here
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker nudge_log (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker _do_cap_release for client.372831873 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker  issue_seq 145038 != 1
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker eval_gather (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024
=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker _do_cap_release for client.372831873 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgise
ss_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={3
72831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker  issue_seq 145038 != 1
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker eval_gather (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024
c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={3728318
73=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker  next state is sync issued/allows loner s/s xlocker /s other /s
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker eval_gather finished gather on (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/work
ing/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-41943
04@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.cache.ino(0x1012cb88597) auth_unpin by 0x563ac163c390 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working
/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=
pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100] now 1
2024-04-04T17:50:56.825+0200 7f16679d3700 15 mds.0.cache.dir(0x1003f0c6738) adjust_nested_auth_pins -1 on [dir 0x1003f0c6738 /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/ [2,head
] auth v=4446576 cv=4446576/4446576 ap=0+2 state=1074003969|complete f(v0 m2024-04-04T17:23:55.068160+0200 4310=4310+0) n(v42 rc2024-04-04T17:47:12.708372+0200 b1115214 4310=4310+0) hs=4310+0,ss=0+0 | child=1
 dirty=0 waiter=0 authpin=0 0x563645696000] by 0x563ac163c390 count now 0/2
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.locker simple_eval (iauth sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a0
2411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXs
xFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker simple_eval stable, going to excl (iauth sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working
/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=
pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker simple_excl on (iauth sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c84
1a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAs
xXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker issue_caps loner client.372831873 allowed=pAsxLsXsxFsxcrwb, xlocker allowed=pAsxLsXsxFsxcrwb, others allowed=pLs on [inode 0x1012cb88597 [2,head] /vol
umes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile
excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289986},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1
 0x563ac163c100]
2024-04-04T17:50:56.825+0200 7f16679d3700 20 mds.0.locker  client.372831873 pending pAsLsXsxFsxcrwb allowed pAsxLsXsxFsxcrwb wanted pAsxXsxFsxcrwb
2024-04-04T17:50:56.825+0200 7f16679d3700  7 mds.0.locker    sending MClientCaps to client.372831873 seq 289987 new pending pAsxLsXsxFsxcrwb was pAsLsXsxFsxcrwb
2024-04-04T17:50:56.825+0200 7f16679d3700 20 mds.0.cache.ino(0x1012cb88597) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 ctime 2024-04-03T15:10:05.039419+0200
2024-04-04T17:50:56.825+0200 7f16679d3700 10 mds.0.27169 send_message_client_counted client.372831873 seq 738658 client_caps(grant ino 0x1012cb88597 1 seq 289987 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFs
xcrwb follows 0 size 177/4194304 ts 1/18446744073709551615 mtime 2024-04-03T15:10:05.039419+0200 tws 1) v11
2024-04-04T17:50:56.825+0200 7f16679d3700 10 MDSContext::complete: 18C_MDS_RetryRequest

The client in rw mode is running 6.5.5 kernel, which has https://github.com/ceph/ceph-client/commit/ce72d4e0f179340cece90d5b826eb63bbf9fefc0 tracker https://tracker.ceph.com/issues/61332
However the server doesn't have https://github.com/ceph/ceph/pull/51500 and also this tracker with mismatched sequence is not patched in the client https://tracker.ceph.com/issues/61782

Actions #10

Updated by Abhishek Lekshmanan 24 days ago

Posted more logs at fed9e44e-a0ec-4692-ae23-6a1047fe9247

Actions #11

Updated by Xiubo Li 23 days ago

The client.379194623:32785 lookup request was spinning infinitely in MDS:

2024-04-04T17:50:56.835+0200 7f16679d3700  7 mds.0.server dispatch_client_request client_request(client.379194623:32785 lookup #0x1003f0c6738/cgisess_024c841a02411969569b80c0672798c9 2024-04-04T17:49:38.165044+0200 caller_uid=0, caller_gid=0{0,}) v4                                                                                                                                                                                                                     
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.server rdlock_path_pin_ref request(client.379194623:32785 nref=3 cr=0x563c4779bb80) #0x1003f0c6738/cgisess_024c841a02411969569b80c0672798c9
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.locker acquire_locks request(client.379194623:32785 nref=3 cr=0x563c4779bb80)
2024-04-04T17:50:56.835+0200 7f16679d3700 20 mds.0.locker  must rdlock (iauth excl) [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700 20 mds.0.locker  must rdlock (ixattr excl) [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.locker  must authpin [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.locker  already auth_pinned [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700  7 mds.0.locker rdlock_start  on (iauth excl) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700  7 mds.0.locker simple_sync on (iauth excl) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700  7 mds.0.locker issue_caps loner client.372831873 allowed=pAsLsXsxFsxcrwb, xlocker allowed=pAsLsXsxFsxcrwb, others allowed=pLs on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289995},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700 20 mds.0.locker  client.372831873 pending pAsxLsXsxFsxcrwb allowed pAsLsXsxFsxcrwb wanted pAsxXsxFsxcrwb
2024-04-04T17:50:56.835+0200 7f16679d3700  7 mds.0.locker    sending MClientCaps to client.372831873 seq 289996 new pending pAsLsXsxFsxcrwb was pAsxLsXsxFsxcrwb
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.27169 send_message_client_counted client.372831873 seq 738667 client_caps(revoke ino 0x1012cb88597 1 seq 289996 caps=pAsLsXsxFsxcrwb dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 177/4194304 ts 1/18446744073709551615 mtime 2024-04-03T15:10:05.039419+0200 tws 1) v11
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.cache.ino(0x1012cb88597) auth_pin by 0x563ac163c390 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100] now 2
2024-04-04T17:50:56.835+0200 7f16679d3700 15 mds.0.cache.dir(0x1003f0c6738) adjust_nested_auth_pins 1 on [dir 0x1003f0c6738 /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/ [2,head] auth v=4446576 cv=4446576/4446576 ap=0+3 state=1074003969|complete f(v0 m2024-04-04T17:23:55.068160+0200 4310=4310+0) n(v42 rc2024-04-04T17:47:12.708372+0200 b1115214 4310=4310+0) hs=4310+0,ss=0+0 | child=1 dirty=0 waiter=0 authpin=0 0x563645696000] by 0x563ac163c100 count now 0/3
2024-04-04T17:50:56.835+0200 7f16679d3700  7 mds.0.locker rdlock_start waiting on (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.835+0200 7f16679d3700 10 mds.0.locker nudge_log (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]

When trying to acquire the rdlock for authlock, it will issue a Ax caps revoke request to another client: client.372831873.

But later it seems received a cap release request from client, instead of receiving a revoke ack:

2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.locker _do_cap_release for client.372831873 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.842+0200 7f16679d3700 10 mds.0.locker eval_gather (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.locker eval_gather finished gather on (iauth excl->sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=2 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl->sync) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100]                                                                                                              
2024-04-04T17:50:56.842+0200 7f16679d3700 10 mds.0.cache.ino(0x1012cb88597) auth_unpin by 0x563ac163c390 on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=1 authpin=1 0x563ac163c100] now 1
2024-04-04T17:50:56.842+0200 7f16679d3700 15 mds.0.cache.dir(0x1003f0c6738) adjust_nested_auth_pins -1 on [dir 0x1003f0c6738 /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/ [2,head] auth v=4446576 cv=4446576/4446576 ap=0+2 state=1074003969|complete f(v0 m2024-04-04T17:23:55.068160+0200 4310=4310+0) n(v42 rc2024-04-04T17:47:12.708372+0200 b1115214 4310=4310+0) hs=4310+0,ss=0+0 | child=1 dirty=0 waiter=0 authpin=0 0x563645696000] by 0x563ac163c390 count now 0/2
2024-04-04T17:50:56.842+0200 7f16679d3700 10 mds.0.locker simple_eval (iauth sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.locker simple_eval stable, going to excl (iauth sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.locker simple_excl on (iauth sync) on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.locker issue_caps loner client.372831873 allowed=pAsxLsXsxFsxcrwb, xlocker allowed=pAsxLsXsxFsxcrwb, others allowed=pLs on [inode 0x1012cb88597 [2,head] /volumes/_nogroup/d7e4525a-d914-4061-8fd7-be14d533d013/foswiki/working/tmp/cgisess_024c841a02411969569b80c0672798c9 auth v4441767 ap=1 s=177 n(v0 rc2024-04-03T15:10:05.039419+0200 b177 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={372831873=0-4194304@1} caps={372831873=pAsLsXsxFsxcrwb/pAsxXsxFsxcrwb@289996},l=372831873 | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=1 0x563ac163c100]
2024-04-04T17:50:56.842+0200 7f16679d3700 20 mds.0.locker  client.372831873 pending pAsLsXsxFsxcrwb allowed pAsxLsXsxFsxcrwb wanted pAsxXsxFsxcrwb
2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.locker    sending MClientCaps to client.372831873 seq 289997 new pending pAsxLsXsxFsxcrwb was pAsLsXsxFsxcrwb
2024-04-04T17:50:56.842+0200 7f16679d3700 10 mds.0.27169 send_message_client_counted client.372831873 seq 738668 client_caps(grant ino 0x1012cb88597 1 seq 289997 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 177/4194304 ts 1/18446744073709551615 mtime 2024-04-03T15:10:05.039419+0200 tws 1) v11
2024-04-04T17:50:56.842+0200 7f16679d3700  7 mds.0.server dispatch_client_request client_request(client.379194623:32785 lookup #0x1003f0c6738/cgisess_024c841a02411969569b80c0672798c9 2024-04-04T17:49:38.165044+0200 caller_uid=0, caller_gid=0{0,}) v4

This will happen when the corresponding cap has been removed by client. But in this case the MDS should remove the client cap and skipped issue new caps to the client.372831873 again, but not sure why the MDS did this. And this will issue the Ax back to the client.372831873. And then later when the client.379194623:32785 lookup request was woke up it will need to revoke the Ax caps again, then it was stuck in the dead loop infinitely.

The logs are not complete and a lot of the important logs are missed, so please upload the complete logs.

Thanks

Actions #12

Updated by Xiubo Li 22 days ago

  • Status changed from Need More Info to Fix Under Review
  • Pull request ID set to 56828
Actions #13

Updated by Abhishek Lekshmanan 15 days ago

We've uploaded a new set of logs with debug_ms 1 at 20d8ba67-8bb0-4cfc-a986-b72ec250728d

Actions

Also available in: Atom PDF