Feature #61908

closed

mds: provide configuration for trim rate of the journal

Added by Patrick Donnelly 10 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS): task(intern), task(medium)
Pull request ID:

Description

Sometimes journal trimming is not fast enough. Provide configuration options to tune it without having to change the MDS tick interval.

In particular, remove the magic number time limits:

https://github.com/ceph/ceph/blob/58df86160858be5c8073ab39040c274c3f6fe312/src/mds/MDLog.cc#L636-L638


Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #64729: mon.a (mon.0) 1281 : cluster 3 [WRN] MDS_SLOW_METADATA_IO: 3 MDSs report slow metadata IOs" in cluster log (Triaged, Patrick Donnelly)

Actions #1

Updated by Venky Shankar 10 months ago

  • Assignee set to Venky Shankar
Actions #2

Updated by Venky Shankar 10 months ago

OK, this is what I have in mind:

Introduce an MDS config key that controls the rate of trimming, i.e., the number of log segments trimmed per second. The MDLog trim code will try to maintain this configured rate: if the MDS can't keep up with it, MDLog would spend a bit of extra time trimming, bounded by something like "do not spend more than X seconds in total on trimming". If the MDS overshoots the trimming rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, an "MDS behind trimming" warning would be generated, which hints to the operator that the configured trim rate is too low for the MDS and needs to be increased.
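As a rough illustration of the proposal above (not code from the Ceph tree), the following Python sketch models one tick of trimming with a per-tick target, a usual one-second budget, and a hard upper bound. The names `trim_tick`, `soft_limit`, and `hard_limit` are hypothetical, and the stop conditions are only one possible reading of the comment: keep trimming past the usual budget only while the target has not been met, and never past the hard bound.

```python
from collections import deque


def trim_tick(pending, target, trim_one, clock, soft_limit=1.0, hard_limit=3.0):
    """Model one tick() worth of journal trimming (illustrative only).

    pending    -- deque of segments, oldest first
    target     -- segments this tick should trim to hold the configured rate
    trim_one   -- callable that trims a single segment
    clock      -- callable returning monotonic seconds
    soft_limit -- usual time budget per tick (the "one second" in the comment)
    hard_limit -- upper bound when the target is not yet met (the "X seconds")

    Returns (segments_trimmed, behind).
    """
    start = clock()
    trimmed = 0
    while pending:
        elapsed = clock() - start
        if elapsed >= hard_limit:
            break  # never exceed the hard bound, even when behind
        if elapsed >= soft_limit and trimmed >= target:
            break  # target met and the usual budget is used up
        trim_one(pending.popleft())
        trimmed += 1
    # "behind" models the condition that would lead to a trimming warning.
    behind = trimmed < target and len(pending) > 0
    return trimmed, behind
```

Passing the clock in as a parameter keeps the loop deterministic under test; a real caller would pass something like `time.monotonic`.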

Actions #3

Updated by Patrick Donnelly 10 months ago

Venky Shankar wrote:

OK, this is what I have in mind:

Introduce an MDS config key that controls the rate of trimming, i.e., the number of log segments trimmed per second.

For time-based limits, I recommend just using DecayCounters. They are a great fit for this IMO. It's easy to tune with two configs (rate + threshold). Plus it allows for "bursty" or steady-state ("max / second") behavior:

https://docs.ceph.com/en/quincy/cephfs/cache-configuration/#mds-cache-trimming

(grep "steady state" in that doc)

Finally, users are already familiar (or becoming so) with these counter configs.

The MDLog trim code will try to maintain this configured rate: if the MDS can't keep up with it, MDLog would spend a bit of extra time trimming, bounded by something like "do not spend more than X seconds in total on trimming". If the MDS overshoots the trimming rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, an "MDS behind trimming" warning would be generated, which hints to the operator that the configured trim rate is too low for the MDS and needs to be increased.

The real danger to avoid is missed heartbeats, and it doesn't look like trimming guards against that yet. I would add a heartbeat reset for every segment trimmed.
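To make the decay-counter suggestion concrete, here is a minimal Python model of the idea: a counter that halves every `half_life_s` seconds gates trimming against a threshold, and a heartbeat callback runs after every trimmed segment. This is a sketch under stated assumptions, not Ceph's `DecayCounter` class; `trim_with_throttle`, `half_life_s`, `threshold`, and `heartbeat` are hypothetical names chosen for illustration.

```python
from collections import deque


class DecayCounter:
    """Toy exponentially decaying counter (not the Ceph implementation)."""

    def __init__(self, half_life_s, clock):
        self.half_life_s = half_life_s
        self.clock = clock
        self.value = 0.0
        self.last = clock()

    def _decay(self):
        now = self.clock()
        dt = now - self.last
        if dt > 0:
            # The value halves once per half-life of elapsed time.
            self.value *= 0.5 ** (dt / self.half_life_s)
            self.last = now

    def get(self):
        self._decay()
        return self.value

    def hit(self, n=1.0):
        self._decay()
        self.value += n
        return self.value


def trim_with_throttle(pending, counter, threshold, trim_one, heartbeat):
    """Trim segments from a deque until the decayed count reaches threshold."""
    trimmed = 0
    while pending and counter.get() < threshold:
        trim_one(pending.popleft())
        counter.hit()   # account for the segment just trimmed
        heartbeat()     # hypothetical hook: signal liveness per segment
        trimmed += 1
    return trimmed
```

In this model an idle counter permits a burst of up to `threshold` segments, after which further trimming is paced by how quickly the counter decays. That gives the two tuning knobs mentioned above (rate + threshold).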

Actions #4

Updated by Venky Shankar 10 months ago

Patrick Donnelly wrote:

Venky Shankar wrote:

OK, this is what I have in mind:

Introduce an MDS config key that controls the rate of trimming, i.e., the number of log segments trimmed per second.

For time-based limits, I recommend just using DecayCounters. They are a great fit for this IMO. It's easy to tune with two configs (rate + threshold). Plus it allows for "bursty" or steady-state ("max / second") behavior:

https://docs.ceph.com/en/quincy/cephfs/cache-configuration/#mds-cache-trimming

(grep "steady state" in that doc)

Finally, users are already familiar (or becoming so) with these counter configs.

The MDLog trim code will try to maintain this configured rate: if the MDS can't keep up with it, MDLog would spend a bit of extra time trimming, bounded by something like "do not spend more than X seconds in total on trimming". If the MDS overshoots the trimming rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, an "MDS behind trimming" warning would be generated, which hints to the operator that the configured trim rate is too low for the MDS and needs to be increased.

The real danger to avoid is missed heartbeats, and it doesn't look like trimming guards against that yet. I would add a heartbeat reset for every segment trimmed.

Fair enough. FWIW, MDLog::trim() is tied to the tick interval (it is called on each tick), and it has always been this way. I think there is benefit in having it driven by a separate thread (to drive the trim faster or slower), like the MDCache::upkeep thread, although the mdcache upkeep thread does more than just trimming (it recalls client state since it's trimming its cache, etc.).
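The idea of decoupling trimming from the tick can be sketched as a small worker that calls a trim function on its own interval. This is only a Python model of the pattern, assuming a hypothetical `TrimUpkeep` class and `interval_s` parameter. It does not reflect how the MDS upkeep thread is actually written.

```python
import threading


class TrimUpkeep:
    """Run trim_fn periodically on a dedicated thread (illustrative only)."""

    def __init__(self, trim_fn, interval_s):
        self._trim_fn = trim_fn
        self._interval_s = interval_s  # tunable independently of any tick
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is
        # called, so the loop body runs roughly once per interval.
        while not self._stop.wait(self._interval_s):
            self._trim_fn()
```

Waiting on an event rather than sleeping lets `stop()` wake the thread immediately instead of waiting out the rest of the interval.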

Actions #5

Updated by Venky Shankar 10 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 52652
Actions #6

Updated by Venky Shankar 3 months ago

  • Status changed from Fix Under Review to Resolved
  • Backport deleted (reef,quincy,pacific)

Deliberately not backporting this until it has baked in main for some more time.

Actions #7

Updated by Venky Shankar about 2 months ago

  • Related to Bug #64729: mon.a (mon.0) 1281 : cluster 3 [WRN] MDS_SLOW_METADATA_IO: 3 MDSs report slow metadata IOs" in cluster log added