
Bug #22925

mds: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file

Added by Xuehan Xu about 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Zheng Yan
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
luminous,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently, in our production Luminous CephFS clusters, we have occasionally seen large numbers of slow requests:

2018-01-16 15:16:53.233925 7fa59f7fe700 0 log_channel(cluster) log [WRN] : 49 slow requests, 5 included below; oldest blocked for > 89.995722 secs
2018-01-16 15:16:53.233934 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 39.966171 seconds old, received at 2018-01-16 15:16:13.267686: client_request(client.4925109:3759 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.176343 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:53.233938 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 39.949899 seconds old, received at 2018-01-16 15:16:13.283958: client_request(client.4925109:3772 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.193343 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:53.233942 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 39.899975 seconds old, received at 2018-01-16 15:16:13.333882: client_request(client.4925109:3785 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.242344 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:53.233946 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 39.965588 seconds old, received at 2018-01-16 15:16:13.268269: client_request(client.4925109:3760 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.177343 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:53.233949 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 39.964976 seconds old, received at 2018-01-16 15:16:13.268881: client_request(client.4925109:3761 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.178343 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:58.233997 7fa59f7fe700 0 log_channel(cluster) log [WRN] : 49 slow requests, 5 included below; oldest blocked for > 94.995785 secs
2018-01-16 15:16:58.234006 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 60.080351 seconds old, received at 2018-01-16 15:15:58.153569: client_request(client.4925109:3742 getattr pAsLsXsFs #0x10000003e7d 2018-01-16 15:15:58.062900 caller_uid=99, caller_gid=99{}) currently failed to rdlock, waiting
2018-01-16 15:16:58.234010 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 44.929603 seconds old, received at 2018-01-16 15:16:13.304317: client_request(client.4925109:3778 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.213344 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:58.234014 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 44.964894 seconds old, received at 2018-01-16 15:16:13.269026: client_request(client.4925109:3762 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.178343 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:58.234017 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 44.948258 seconds old, received at 2018-01-16 15:16:13.285662: client_request(client.4925109:3773 getattr pAsLsXsFs #0x10000000fcf 2018-01-16 15:16:13.194343 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting
2018-01-16 15:16:58.234020 7fa59f7fe700 0 log_channel(cluster) log [WRN] : slow request 44.817911 seconds old, received at 2018-01-16 15:16:13.416009: client_request(client.4925109:3796 getattr pAsLsXsFs #0x10000000fe0 2018-01-16 15:16:13.325347 caller_uid=33, caller_gid=33{}) currently failed to rdlock, waiting

Our use case is as follows:
One client issues writes to a file while tens of other clients issue reads and stats against the same file.
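
For context, the workload can be sketched roughly as below. This is only an illustrative reproduction sketch, not taken from the report: the mount point, file name, process count and sleep intervals are assumptions, and in the real setup every reader and the writer are separate CephFS client mounts rather than local processes.

    # A rough sketch of the workload described above, for illustration only.
    # Assumptions: the CephFS mount at /mnt/cephfs and the file name are made up;
    # in the real setup each reader/writer is a separate CephFS client (its own
    # mount), whereas here they are plain local processes, so this only
    # approximates the cap traffic seen by the MDS.
    import os
    import time
    from multiprocessing import Process

    TARGET = "/mnt/cephfs/shared_file"   # assumed path on a CephFS mount
    NUM_READERS = 30                     # "tens of other clients"

    def writer():
        # The single writer keeps appending, so it keeps wanting CEPH_CAP_GWR.
        with open(TARGET, "ab") as f:
            while True:
                f.write(b"x" * 4096)
                f.flush()
                time.sleep(0.01)

    def reader():
        # Each reader repeatedly stats the same file; every stat is a "getattr"
        # request to the MDS.
        while True:
            os.stat(TARGET)
            time.sleep(0.01)

    if __name__ == "__main__":
        open(TARGET, "ab").close()       # make sure the file exists before readers start
        procs = [Process(target=writer)] + [Process(target=reader) for _ in range(NUM_READERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()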

After a series of tests, we found that these slow requests are all "getattr" operations, and that they are all blocked on a CInode::filelock in the LOCK_SYNC_MIX state, in which rdlocks cannot be acquired. Further analysis showed that, in our use case, because the writing client continuously "wants" CEPH_CAP_GWR, the finish of nearly every "getattr" operation pushes the inode's filelock into the LOCK_SYNC_MIX state. In that state the MDS must wait for all clients holding caps other than the allowed "rwl" to release them, and it blocks every request trying to acquire an rdlock on the inode's filelock. Because of the large number of reading clients, this wait can last about 950 ms, which means that roughly only one "getattr" operation can finish every 950 ms.
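
To make the arithmetic explicit: if every such round trip costs roughly 950 ms and essentially one "getattr" completes per round trip, a queue of the size shown in the warnings above drains very slowly. A back-of-the-envelope model (the queue depth of 49 is taken from the log excerpt; everything else is illustrative):

    # Back-of-the-envelope model of the observed behaviour, not a measurement.
    # Assumption: essentially one "getattr" completes per
    # LOCK_SYNC -> LOCK_SYNC_MIX -> LOCK_MIX -> LOCK_SYNC cycle.
    ROUND_TRIP_S = 0.95    # cap-release wait per cycle, ~950 ms as described above

    throughput = 1.0 / ROUND_TRIP_S
    print(f"effective getattr throughput: {throughput:.2f} ops/s")   # ~1.05 ops/s

    # With 49 getattrs already queued (the "49 slow requests" in the warnings
    # above), the newest request has to wait for the whole queue to drain:
    queued = 49
    print(f"latency of the last queued getattr: {queued * ROUND_TRIP_S:.1f} s")  # ~46.6 s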


Related issues

Related to CephFS - Bug #62052: mds: deadlock when getattr changes inode lockset Pending Backport
Copied to CephFS - Backport #23013: luminous: mds: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file Resolved
Copied to CephFS - Backport #23014: jewel: mds: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file Rejected

History

#1 Updated by Kefu Chai about 6 years ago

  • Project changed from Ceph to CephFS

#2 Updated by Patrick Donnelly about 6 years ago

  • Subject changed from LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file to mds: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file
  • Category set to Correctness/Safety
  • Priority changed from Normal to High
  • Source set to Community (user)
  • Component(FS) Client, MDS added

#3 Updated by Xuehan Xu about 6 years ago

Hi Patrick, as discussed yesterday, in our case the processing of a single "getattr" op goes through the following steps:

  1. The "finish" phase of the previous "getattr" op turns the target inode's filelock state into LOCK_SYNC_MIX if its current state is LOCK_SYNC;
  2. The MDS blocks all other "getattr" operations while the target inode's filelock state is LOCK_SYNC_MIX, and waits for all clients to release the caps that are not allowed in LOCK_MIX;
  3. When all clients have released those caps, the inode's filelock state is turned into LOCK_MIX;
  4. A new "getattr" op is dispatched and processed, during which the inode's filelock state is turned back to LOCK_SYNC and the "getattr" op completes successfully.

If the next op to be dispatched is, again, a "getattr", the whole procedure repeats. In our case, the MDS's wait for all clients to release caps that are not allowed in LOCK_MIX can take several hundred milliseconds, which means only a few "getattr" requests can be processed per second. Because "getattr" is by far the most frequently issued request, this cycle repeats so many times that a single "getattr" request can take 40+ seconds to complete, since it has to wait for all of the preceding "getattr" ops to be processed first; meanwhile, the "setattr" operations issued by the writing client can hardly get a chance to be processed.
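
The cycle in steps 1-4 can be modelled very roughly as below. This is not the real MDS Locker state machine; the state names merely mirror the ones above, and the 950 ms constant comes from the description.

    # A highly simplified sketch of the filelock cycle described in steps 1-4.
    # NOT the real MDS Locker code: it only illustrates why each getattr pays the
    # full cap-recall wait while a writer keeps wanting CEPH_CAP_GWR.

    CAP_RECALL_WAIT_S = 0.95   # wait for readers to drop caps not allowed in LOCK_MIX

    def one_getattr(state):
        """Model one getattr against the inode's filelock; return (wait, trace, new state)."""
        wait, trace = 0.0, [state]
        if state == "LOCK_SYNC_MIX":
            wait += CAP_RECALL_WAIT_S    # steps 2-3: block until all caps are released
            state = "LOCK_MIX"
            trace.append(state)
        if state != "LOCK_SYNC":
            state = "LOCK_SYNC"          # step 4: the getattr's rdlock moves the lock back
            trace.append(state)
        state = "LOCK_SYNC_MIX"          # step 1 of the next round: writer still wants WR
        trace.append(state)
        return wait, trace, state

    state, total = "LOCK_SYNC", 0.0
    for i in range(3):
        wait, trace, state = one_getattr(state)
        total += wait
        print(f"getattr {i}: {' -> '.join(trace)}  (+{wait:.2f} s)")
    print(f"total wait for 3 getattrs: {total:.2f} s")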

#5 Updated by Xuehan Xu about 6 years ago

Zheng Yan wrote:

please try
https://github.com/ukernel/ceph/commit/7db1563416b5559310dbbc834795b83a4ccdaab4

This patch works! Thank you:-)

#6 Updated by Xuehan Xu about 6 years ago

Zheng Yan wrote:

please try
https://github.com/ukernel/ceph/commit/7db1563416b5559310dbbc834795b83a4ccdaab4

Could you make a PR of this commit?
We really want to use it, but without the community's review and approval we don't feel confident enough to deploy it to our production clusters.
Thank you :-)

#7 Updated by Patrick Donnelly about 6 years ago

  • Backport set to luminous,jewel

#8 Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to Pending Backport
  • Assignee set to Zheng Yan
  • Component(FS) deleted (Client)

#10 Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23013: luminous: mds: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file added

#11 Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23014: jewel: mds: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when there are lots of clients issue writes or reads to the same file added

#12 Updated by Patrick Donnelly over 5 years ago

  • Status changed from Pending Backport to Resolved

#13 Updated by Patrick Donnelly 8 months ago

  • Related to Bug #62052: mds: deadlock when getattr changes inode lockset added
