Project

General

Profile

Bug #22428

mds: don't report slow request for blocked filelock request

Added by Zheng Yan over 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Administration/Usability
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Related issues

Copied to CephFS - Backport #23989: luminous: mds: don't report slow request for blocked filelock request Resolved

History

#1 Updated by Patrick Donnelly over 6 years ago

  • Assignee set to Zheng Yan

Perhaps reclassify the slow requests blocked by locks as "clients not releasing file locks" or similar to be different from "slow requestS".

#2 Updated by John Spray about 6 years ago

Here's a recent example from someone in #ceph:

2018-01-16 17:47:05.910681 mon.0 [INF] mon.5 192.168.193.73:6789/0
2018-01-16 17:49:14.095764 mon.0 [WRN] MDS health message (mds.1): 1 slow requests are blocked > 30 sec
2018-01-16 17:49:14.095812 mon.0 [INF] daemon mds.manager02 is now active in filesystem cephfs as rank 1
2018-01-16 17:49:14.276275 mon.0 [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2018-01-16 17:49:10.248891 mds.1 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31.623102 secs
2018-01-16 17:49:10.248906 mds.1 [WRN] slow request 31.623102 seconds old, received at 2018-01-16 17:48:38.625681: client_request(client.434194:1157564 setfilelockrule 2, type 2, owner 10101539253148314382, pid 14917, start 0, length 0, wait 1 #0x20000001809 2018-01-16 17:48:38.513624 caller_uid=101, caller_gid=102{}) currently acquired locks
2018-01-16 17:49:40.249814 mds.1 [WRN] 1 slow requests, 1 included below; oldest blocked for > 61.624025 secs
2018-01-16 17:49:40.249825 mds.1 [WRN] slow request 61.624025 seconds old, received at 2018-01-16 17:48:38.625681: client_request(client.434194:1157564 setfilelockrule 2, type 2, owner 10101539253148314382, pid 14917, start 0, length 0, wait 1 #0x20000001809 2018-01-16 17:48:38.513624 caller_uid=101, caller_gid=102{}) currently acquired locks
2018-01-16 17:49:58.098323 mon.0 [INF] MDS health message cleared (mds.1): 1 slow requests are blocked > 30 sec
2018-01-16 17:49:58.098366 mon.0 [INF] daemon mds.manager02 is now active in filesystem cephfs as rank 1
2018-01-16 17:49:58.336035 mon.0 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2018-01-16 17:49:58.336158 mon.0 [INF] Cluster is now healthy

#3 Updated by Dan van der Ster almost 6 years ago

In case you need more examples, we're seeing this recently on 12.2.4:

2018-04-25 10:23:29.964959 mds.cephdwightmds1 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.144366 secs
2018-04-25 10:23:29.964964 mds.cephdwightmds1 [WRN] slow request 30.144366 seconds old, received at 2018-04-25 10:22:59.820554: client_request(client.575739618:75133749 setfilelockrule 1, type 2, owner 17575622701090626199, pid 27476, start 0, length 0, wait 1 #0x2000309a73f 2018-04-25 10:22:59.819251 caller_uid=0, caller_gid=0{}) currently acquired locks
2018-04-25 10:23:30.965128 mds.cephdwightmds1 [WRN] 3 slow requests, 2 included below; oldest blocked for > 31.144517 secs
2018-04-25 10:23:30.965133 mds.cephdwightmds1 [WRN] slow request 30.891629 seconds old, received at 2018-04-25 10:23:00.073442: client_request(client.575739618:75134492 setfilelockrule 1, type 2, owner 11650441161165249951, pid 9990, start 0, length 0, wait 1 #0x2000309a73f 2018-04-25 10:23:00.072054 caller_uid=0, caller_gid=0{}) currently acquired locks
2018-04-25 10:23:30.965136 mds.cephdwightmds1 [WRN] slow request 30.829593 seconds old, received at 2018-04-25 10:23:00.135478: client_request(client.575739618:75134657 setfilelockrule 1, type 2, owner 11876786896114809271, pid 39172, start 0, length 0, wait 1 #0x2000309a73f 2018-04-25 10:23:00.134047 caller_uid=0, caller_gid=0{}) currently acquired locks
2018-04-25 10:23:33.396905 mon.cephdwight-mon-1633994557 [WRN] MDS health message (mds.1): 3 slow requests are blocked > 30 sec
2018-04-25 10:23:33.752354 mon.cephdwight-mon-1633994557 [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2018-04-25 10:23:37.396863 mon.cephdwight-mon-1633994557 [INF] MDS health message cleared (mds.1): 3 slow requests are blocked > 30 sec
2018-04-25 10:23:37.805466 mon.cephdwight-mon-1633994557 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2018-04-25 10:23:37.805522 mon.cephdwight-mon-1633994557 [INF] Cluster is now healthy

#4 Updated by Patrick Donnelly almost 6 years ago

  • Category set to Administration/Usability
  • Priority changed from Normal to High
  • Target version set to v13.0.0
  • Source set to other
  • Backport set to luminous
  • Component(FS) MDS, MDSMonitor added

#5 Updated by Zheng Yan almost 6 years ago

  • Status changed from New to Fix Under Review

#6 Updated by Patrick Donnelly almost 6 years ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #23989: luminous: mds: don't report slow request for blocked filelock request added

#8 Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF