Project

General

Profile

Bug #46905

client: cluster [WRN] evicting unresponsive client smithi122:0 (34373), after 304.762 seconds

Added by Xiubo Li 2 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
libcephfs
Labels (FS):
Pull request ID:
Crash signature:

Description

From https://pulpito.ceph.com/pdonnell-2020-08-08_02\:21\:01-multimds-wip-pdonnell-testing-20200808.001303-distro-basic-smithi/5319314/teuthology.log

"2020-08-12T01:33:01.526130+0000 mds.a (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi122:0 (34373), after 304.762 seconds" in cluster log

From the code in _unmount() the inode ref will be increased by

anchor.emplace_back(in);
_flush(in, new C_Client_FlushComplete(this, in));

But from https://pulpito.ceph.com/pdonnell-2020-08-08_02\:21\:01-multimds-wip-pdonnell-testing-20200808.001303-distro-basic-smithi/5319314/smithi157/log/ceph-client.0.29855.log.gz, we can see that the ref only decreased once before the client was blacklisted by MDS, and during this time the related osd op reply keeps going on:

2020-08-10T15:22:28.081+0000 7f54882511c0 15 inode.get on 0x7f545c062670 0x10000000352.head now 4
2020-08-10T15:22:28.081+0000 7f54882511c0 20 client.4593 _release 0x10000000352.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 1347584] 0x7f545c062670)
2020-08-10T15:22:28.081+0000 7f54882511c0 15 inode.get on 0x7f545c062670 0x10000000352.head now 5
2020-08-10T15:22:28.081+0000 7f54882511c0 10 client.4593 _flush 0x10000000352.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 1347584] 0x7f545c062670)
...
096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 1347584] 0x7f545c062670)
2020-08-10T15:22:28.087+0000 7f54882511c0 15 inode.put on 0x7f545c062670 0x10000000352.head now 4

...

2020-08-10T15:26:08.112+0000 7f54882511c0  1 client.4593 dump_cache
2020-08-10T15:26:08.112+0000 7f54882511c0  1 client.4593 dump_inode: DISCONNECTED inode 0x10000000352 #0x10000000352 ref 40x10000000352.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=-(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 24576] 0x7f545c062670)

History

#1 Updated by Xiubo Li 2 months ago

  • Status changed from New to In Progress

#2 Updated by Xiubo Li 2 months ago

  • Pull request ID set to 36574

#3 Updated by Xiubo Li 2 months ago

  • Status changed from In Progress to Fix Under Review

Let's just defer cancelling the event timer.

#4 Updated by Patrick Donnelly 2 months ago

  • Target version set to v16.0.0
  • Source set to Q/A

#5 Updated by Xiubo Li 2 months ago

  • Priority changed from Normal to High

#6 Updated by Patrick Donnelly about 2 months ago

  • Status changed from Fix Under Review to Resolved
  • Pull request ID changed from 36574 to 36580

Also available in: Atom PDF