Bug #47307

mds: throttle workloads which acquire caps faster than the client can release

Added by Patrick Donnelly 5 months ago. Updated 1 day ago.

Status:
Pending Backport
Priority:
High
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

A trivial "find" command on a large directory hierarchy will cause the client to acquire caps significantly faster than it releases them. The MDS will try to have the client reduce its caps below the mds_max_caps_per_client limit, but the recall throttles prevent it from catching up to the pace of acquisition.

I think the strategy for fixing this is to increase the rate of recall for the particular session in response to the workload and to throttle the readdir RPCs from the client.

Edit: I think we should just start by throttling readdir RPCs if the client is over the maximum number of caps.
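The proposed throttle amounts to tracking, per session, how quickly caps are being acquired and deferring readdir replies once the client is both over the cap limit and still acquiring quickly. A minimal sketch of that idea, using a decay counter (class and parameter names here are illustrative, not the actual C++ MDS implementation):

```python
import time

class CapAcquisitionThrottle:
    """Illustrative per-session throttle: defer readdirs when a client
    holds too many caps and is still acquiring them quickly."""

    def __init__(self, max_caps=1_000_000, acquisition_threshold=500_000,
                 half_life=10.0):
        self.max_caps = max_caps                  # cf. mds_max_caps_per_client
        self.acquisition_threshold = acquisition_threshold
        self.half_life = half_life                # decay half-life, seconds
        self.rate = 0.0                           # decayed acquisition counter
        self.last = time.monotonic()

    def _decay(self):
        now = time.monotonic()
        self.rate *= 0.5 ** ((now - self.last) / self.half_life)
        self.last = now

    def record_readdir(self, caps_issued):
        """Account for caps handed out by a readdir reply."""
        self._decay()
        self.rate += caps_issued

    def should_throttle(self, caps_held):
        """True if the next readdir from this session should be deferred."""
        self._decay()
        return (caps_held > self.max_caps
                and self.rate > self.acquisition_threshold)
```

A client under the cap limit is never throttled; a client over it is throttled only while its recent acquisition rate stays above the threshold, so a session that merely holds many idle caps can still make progress.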


Related issues

Duplicated by CephFS - Bug #47682: MDS can't release caps faster than clients taking caps Rejected
Copied to CephFS - Backport #48191: octopus: mds: throttle workloads which acquire caps faster than the client can release Resolved
Copied to CephFS - Backport #48192: nautilus: mds: throttle workloads which acquire caps faster than the client can release In Progress

History

#1 Updated by Patrick Donnelly 5 months ago

  • Description updated (diff)
  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
  • Backport set to octopus,nautilus

#2 Updated by Patrick Donnelly 4 months ago

  • Duplicated by Bug #47682: MDS can't release caps faster than clients taking caps added

#3 Updated by Patrick Donnelly 4 months ago

  • Status changed from Triaged to In Progress

#4 Updated by Dan van der Ster 4 months ago

Are you sure that the defaults for recalling aren't overly conservative?

Today, while debugging a situation with two heavy clients, I came up with this approach: scale the following options linearly to drive more recall:

# First argument is the scale factor
X=$1

echo "Scaling MDS Recall by ${X}x"
ceph tell mds.* injectargs -- \
    --mds_recall_max_decay_threshold $((X*16*1024)) \
    --mds_recall_max_caps $((X*5000)) \
    --mds_recall_global_max_decay_threshold $((X*64*1024)) \
    --mds_recall_warning_threshold $((X*32*1024)) \
    --mds_cache_trim_threshold $((X*64*1024))

The above scales several related options by X from their defaults. With this, scaling to 8x, I could keep my MDS under 4 GB, or even 1 GB of RAM if I needed to, even though the clients were find'ing inodes at more than 30 kHz.
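For reference, the concrete values the script injects at a given scale factor follow directly from the defaults it encodes (16 Ki, 5000, 64 Ki, 32 Ki, and 64 Ki respectively); a quick sketch of that arithmetic:

```python
def scaled_recall_opts(x):
    """Compute the option values the injectargs command above would set,
    scaling each by x from the defaults it encodes."""
    return {
        "mds_recall_max_decay_threshold": x * 16 * 1024,
        "mds_recall_max_caps": x * 5000,
        "mds_recall_global_max_decay_threshold": x * 64 * 1024,
        "mds_recall_warning_threshold": x * 32 * 1024,
        "mds_cache_trim_threshold": x * 64 * 1024,
    }
```

At the 8x scaling mentioned above this gives, for example, mds_recall_max_caps = 40000 and mds_recall_max_decay_threshold = 131072.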

Do we really want to throttle clients when the MDS is itself capable of higher performance?

#5 Updated by Patrick Donnelly 4 months ago

Dan van der Ster wrote:

Are you sure that the defaults for recalling aren't overly conservative?

Yes, they probably are.

Today, while debugging a situation with two heavy clients, I came up with this approach: scale the following options linearly to drive more recall:

[...]

The above scales several related options by X from their defaults. With this, scaling to 8x, I could keep my MDS under 4 GB, or even 1 GB of RAM if I needed to, even though the clients were find'ing inodes at more than 30 kHz.

Do we really want to throttle clients when the MDS is itself capable of higher performance?

I think it still makes sense to add a throttle. If a client is too slow releasing caps even with aggressive recall, the MDS should slow down handing out large numbers of caps so the client can keep up.
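The effect described above can be seen in a toy model: if a client acquires caps at a higher rate than it can release them, its cap count grows without bound unless acquisition is gated once the client crosses the limit. An illustrative simulation (rates and limits are made-up numbers, not MDS defaults):

```python
def simulate(acquire_per_tick, release_per_tick, max_caps, ticks, throttle):
    """Toy model of a session's cap count over time.

    Each tick the client acquires caps via readdir (unless throttled
    while over the limit) and releases some in response to recall.
    """
    caps = 0
    for _ in range(ticks):
        if not (throttle and caps > max_caps):
            caps += acquire_per_tick            # readdir hands out caps
        caps = max(0, caps - release_per_tick)  # recall frees some
    return caps

unthrottled = simulate(1000, 200, 5000, 100, throttle=False)
throttled = simulate(1000, 200, 5000, 100, throttle=True)
```

Without the throttle the count grows by the net 800 caps per tick indefinitely; with it, the session hovers just above the limit while recall catches up.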

#6 Updated by Kotresh Hiremath Ravishankar 3 months ago

  • Pull request ID set to 37618

#7 Updated by Patrick Donnelly 3 months ago

  • Status changed from In Progress to Pending Backport

#8 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #48191: octopus: mds: throttle workloads which acquire caps faster than the client can release added

#9 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #48192: nautilus: mds: throttle workloads which acquire caps faster than the client can release added

#10 Updated by Mykola Dvornik 1 day ago

Is it related to MDS cache overconsumption?
