Bug #22428
mds: don't report slow request for blocked filelock request
Status:
Resolved
Priority:
High
Assignee:
Category:
Administration/Usability
Target version:
% Done:
0%
Source:
other
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Related issues
History
#1 Updated by Patrick Donnelly over 6 years ago
- Assignee set to Zheng Yan
Perhaps reclassify the slow requests blocked by locks as "clients not releasing file locks" or similar to be different from "slow requestS".
#2 Updated by John Spray about 6 years ago
Here's a recent example from someone in #ceph:
2018-01-16 17:47:05.910681 mon.0 [INF] mon.5 192.168.193.73:6789/0 2018-01-16 17:49:14.095764 mon.0 [WRN] MDS health message (mds.1): 1 slow requests are blocked > 30 sec 2018-01-16 17:49:14.095812 mon.0 [INF] daemon mds.manager02 is now active in filesystem cephfs as rank 1 2018-01-16 17:49:14.276275 mon.0 [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST) 2018-01-16 17:49:10.248891 mds.1 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31.623102 secs 2018-01-16 17:49:10.248906 mds.1 [WRN] slow request 31.623102 seconds old, received at 2018-01-16 17:48:38.625681: client_request(client.434194:1157564 setfilelockrule 2, type 2, owner 10101539253148314382, pid 14917, start 0, length 0, wait 1 #0x20000001809 2018-01-16 17:48:38.513624 caller_uid=101, caller_gid=102{}) currently acquired locks 2018-01-16 17:49:40.249814 mds.1 [WRN] 1 slow requests, 1 included below; oldest blocked for > 61.624025 secs 2018-01-16 17:49:40.249825 mds.1 [WRN] slow request 61.624025 seconds old, received at 2018-01-16 17:48:38.625681: client_request(client.434194:1157564 setfilelockrule 2, type 2, owner 10101539253148314382, pid 14917, start 0, length 0, wait 1 #0x20000001809 2018-01-16 17:48:38.513624 caller_uid=101, caller_gid=102{}) currently acquired locks 2018-01-16 17:49:58.098323 mon.0 [INF] MDS health message cleared (mds.1): 1 slow requests are blocked > 30 sec 2018-01-16 17:49:58.098366 mon.0 [INF] daemon mds.manager02 is now active in filesystem cephfs as rank 1 2018-01-16 17:49:58.336035 mon.0 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 2018-01-16 17:49:58.336158 mon.0 [INF] Cluster is now healthy
#3 Updated by Dan van der Ster almost 6 years ago
In case you need more examples, we're seeing this recently on 12.2.4:
2018-04-25 10:23:29.964959 mds.cephdwightmds1 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.144366 secs 2018-04-25 10:23:29.964964 mds.cephdwightmds1 [WRN] slow request 30.144366 seconds old, received at 2018-04-25 10:22:59.820554: client_request(client.575739618:75133749 setfilelockrule 1, type 2, owner 17575622701090626199, pid 27476, start 0, length 0, wait 1 #0x2000309a73f 2018-04-25 10:22:59.819251 caller_uid=0, caller_gid=0{}) currently acquired locks 2018-04-25 10:23:30.965128 mds.cephdwightmds1 [WRN] 3 slow requests, 2 included below; oldest blocked for > 31.144517 secs 2018-04-25 10:23:30.965133 mds.cephdwightmds1 [WRN] slow request 30.891629 seconds old, received at 2018-04-25 10:23:00.073442: client_request(client.575739618:75134492 setfilelockrule 1, type 2, owner 11650441161165249951, pid 9990, start 0, length 0, wait 1 #0x2000309a73f 2018-04-25 10:23:00.072054 caller_uid=0, caller_gid=0{}) currently acquired locks 2018-04-25 10:23:30.965136 mds.cephdwightmds1 [WRN] slow request 30.829593 seconds old, received at 2018-04-25 10:23:00.135478: client_request(client.575739618:75134657 setfilelockrule 1, type 2, owner 11876786896114809271, pid 39172, start 0, length 0, wait 1 #0x2000309a73f 2018-04-25 10:23:00.134047 caller_uid=0, caller_gid=0{}) currently acquired locks 2018-04-25 10:23:33.396905 mon.cephdwight-mon-1633994557 [WRN] MDS health message (mds.1): 3 slow requests are blocked > 30 sec 2018-04-25 10:23:33.752354 mon.cephdwight-mon-1633994557 [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST) 2018-04-25 10:23:37.396863 mon.cephdwight-mon-1633994557 [INF] MDS health message cleared (mds.1): 3 slow requests are blocked > 30 sec 2018-04-25 10:23:37.805466 mon.cephdwight-mon-1633994557 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 2018-04-25 10:23:37.805522 mon.cephdwight-mon-1633994557 [INF] Cluster is now healthy
#4 Updated by Patrick Donnelly almost 6 years ago
- Category set to Administration/Usability
- Priority changed from Normal to High
- Target version set to v13.0.0
- Source set to other
- Backport set to luminous
- Component(FS) MDS, MDSMonitor added
#5 Updated by Zheng Yan almost 6 years ago
- Status changed from New to Fix Under Review
#6 Updated by Patrick Donnelly almost 6 years ago
- Status changed from Fix Under Review to Pending Backport
#7 Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #23989: luminous: mds: don't report slow request for blocked filelock request added
#8 Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to Resolved