Bug #47307
mds: throttle workloads which acquire caps faster than the client can release
Description
A trivial "find" command on a large directory hierarchy will cause the client to receive caps significantly faster than it will release them. The MDS will try to have the client reduce its caps below the mds_max_caps_per_client limit, but the recall throttles prevent it from catching up to the pace of acquisition.
I think the strategy for fixing this is to increase the rate of recall for the particular session in response to the workload and to throttle the readdir RPCs from the client.
Edit: I think we should just start by throttling readdir RPCs if the client is over the maximum number of caps.
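For reference, the limit mentioned above is the mds_max_caps_per_client MDS option (it defaults to roughly one million caps). A minimal sketch of how to inspect and, if needed, lower it; the value shown is purely illustrative, not a recommendation:

ceph config get mds mds_max_caps_per_client          # show the current per-client cap limit
ceph config set mds mds_max_caps_per_client 500000   # illustrative value only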
Related issues
- Duplicated by Bug #47682: MDS can't release caps faster than clients taking caps
- Copied to Backport #48191: octopus: mds: throttle workloads which acquire caps faster than the client can release
- Copied to Backport #48192: nautilus: mds: throttle workloads which acquire caps faster than the client can release
History
#1 Updated by Patrick Donnelly about 3 years ago
- Description updated (diff)
- Status changed from New to Triaged
- Assignee set to Kotresh Hiremath Ravishankar
- Backport set to octopus,nautilus
#2 Updated by Patrick Donnelly almost 3 years ago
- Duplicated by Bug #47682: MDS can't release caps faster than clients taking caps added
#3 Updated by Patrick Donnelly almost 3 years ago
- Status changed from Triaged to In Progress
#4 Updated by Dan van der Ster almost 3 years ago
Are you sure that the defaults for recalling aren't overly conservative?
Today, while debugging a situation with 2 heavy clients, I came up with this approach: I scale the following options linearly to drive more recall:
X=$1
echo Scaling MDS Recall by ${X}x
ceph tell 'mds.*' injectargs -- \
    --mds_recall_max_decay_threshold $((X*16*1024)) \
    --mds_recall_max_caps $((X*5000)) \
    --mds_recall_global_max_decay_threshold $((X*64*1024)) \
    --mds_recall_warning_threshold $((X*32*1024)) \
    --mds_cache_trim_threshold $((X*64*1024))
The above scales several related options, all by X from their defaults. With this, scaling to 8x, I could keep my MDS under 4GB (or even 1GB) of RAM if I needed to, even though the clients were find'ing inodes at more than 30kHz.
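For illustration, if the snippet above is saved as a script (the filename scale-recall.sh is hypothetical), the 8x run described here would be invoked as:

bash scale-recall.sh 8   # scale all five recall/trim thresholds to 8x their defaults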
Do we really want to throttle clients when the MDS is itself capable of higher performance?
#5 Updated by Patrick Donnelly almost 3 years ago
Dan van der Ster wrote:
Are you sure that the defaults for recalling aren't overly conservative?
Yes, they probably are.
Today, while debugging a situation with 2 heavy clients, I came up with this approach: I scale the following options linearly to drive more recall:
[...]
The above scales several related options, all by X from their defaults. With this, scaling to 8x, I could keep my MDS under 4GB (or even 1GB) of RAM if I needed to, even though the clients were find'ing inodes at more than 30kHz.
Do we really want to throttle clients when the MDS is itself capable of higher performance?
I think it still makes sense to add a throttle. If a client is too slow releasing caps even with aggressive recall, the MDS should slow down handing out large numbers of caps so the client can keep up.
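As a rough sketch of how such a throttle could be exposed to operators, assuming it ends up as MDS configuration (the option names and values below are illustrative assumptions, not confirmed settings from the eventual pull request):

# hypothetical knob: per-session cap-acquisition level above which readdir requests are throttled
ceph config set mds mds_session_cap_acquisition_throttle 500000
# hypothetical knob: how long a throttled readdir request is deferred before retrying
ceph config set mds mds_cap_acquisition_throttle_retry_request_timeout 0.5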
#6 Updated by Kotresh Hiremath Ravishankar almost 3 years ago
- Pull request ID set to 37618
#7 Updated by Patrick Donnelly almost 3 years ago
- Status changed from In Progress to Pending Backport
#8 Updated by Nathan Cutler almost 3 years ago
- Copied to Backport #48191: octopus: mds: throttle workloads which acquire caps faster than the client can release added
#9 Updated by Nathan Cutler almost 3 years ago
- Copied to Backport #48192: nautilus: mds: throttle workloads which acquire caps faster than the client can release added
#10 Updated by Mykola Dvornik over 2 years ago
Is it related to MDS cache overconsumption?
#11 Updated by Nathan Cutler over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
#12 Updated by Patrick Donnelly over 2 years ago
Mykola Dvornik wrote:
Is it related to MDS cache overconsumption?
Yes.