Feature #61908
closed
mds: provide configuration for trim rate of the journal
0%
Description
Sometimes journal trimming is not fast enough. Provide configuration options to tune it without having to change the MDS tick interval.
In particular, remove the magic number time limits:
Updated by Venky Shankar 10 months ago
OK, this is what I have in mind:
Introduce an MDS config key that controls the rate of trimming - number of log segments trimmed per second. The MDLog trim code will try to maintain this configured trimming rate. If the MDS can't keep up with the rate, MDLog would spend a bit of extra time trimming, but this would be bounded by something like "do not spend more than X seconds in total on trimming". If the MDS overshoots the trimming rate, MDLog would not spend a great deal of time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, this would result in the "MDS behind trimming" warning being generated, which is a hint to the operator that the configured trim rate is too low for the MDS and needs to be increased.
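To make the proposal above concrete, here is a rough sketch of a per-tick trim loop with a target rate, a soft one-second budget, and a hard upper bound. It is only an illustration under assumptions: the names trim_tick, trim_rate, soft_limit and hard_limit are hypothetical and are not Ceph config keys or MDLog APIs, and the real code is C++.

```python
import time
from collections import deque


def trim_tick(segments, trim_rate, soft_limit=1.0, hard_limit=3.0,
              trim_one=lambda seg: None, clock=time.monotonic):
    """Trim log segments for one tick (hypothetical sketch, not MDLog::trim()).

    trim_rate  -- desired segments trimmed per second (assumed config key)
    soft_limit -- normal time budget per tick, in seconds
    hard_limit -- the "X seconds" upper bound when falling behind
    Returns (segments trimmed, whether the target rate was missed).
    """
    target = int(trim_rate * soft_limit)
    start = clock()
    trimmed = 0
    while segments:
        elapsed = clock() - start
        if elapsed >= hard_limit:
            break  # never exceed the hard bound, even if behind
        if elapsed >= soft_limit and trimmed >= target:
            break  # rate met or overshot: stop once the soft budget is used
        trim_one(segments.popleft())
        trimmed += 1
    behind = trimmed < target and bool(segments)
    return trimmed, behind


# Example call: ten pending segments, target of four per second.
pending = deque(range(10))
result = trim_tick(pending, trim_rate=4)
```

The behind flag is where an operator-facing hint could be raised; how the actual "MDS behind trimming" warning is triggered is not covered by this sketch.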
Updated by Patrick Donnelly 10 months ago
Venky Shankar wrote:
OK, this is what I have in mind:
Introduce an MDS config key that controls the rate of trimming - number of log segments trimmed per second.
For time-based limits, I recommend just using DecayCounters. They are a great fit for this IMO. It's easy to tune with two configs (rate + threshold). Plus it allows for "bursty" or steady-state ("max / second") behavior:
https://docs.ceph.com/en/quincy/cephfs/cache-configuration/#mds-cache-trimming
(grep "steady state" in that doc)
Finally, users are already familiar (or becoming so) with these counter configs.
The MDLog trim code will try to maintain this configured trimming rate. If the MDS can't keep up with the rate, MDLog would spend a bit of extra time trimming, but this would be bounded by something like "do not spend more than X seconds in total on trimming". If the MDS overshoots the trimming rate, MDLog would not spend a great deal of time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, this would result in the "MDS behind trimming" warning being generated, which is a hint to the operator that the configured trim rate is too low for the MDS and needs to be increased.
The real danger to avoid is missed heartbeats; it doesn't look like trimming resets the heartbeat yet. I would add a heartbeat reset for every segment trimmed.
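As an illustration of the DecayCounter suggestion (rate + threshold) combined with a heartbeat reset per trimmed segment, a minimal sketch could look like the following. The DecayCounter class here is a simplified stand-in written for this example, not Ceph's C++ DecayCounter API; half_life, threshold, trim_one and reset_heartbeat are hypothetical names.

```python
import time
from collections import deque


class DecayCounter:
    """Simplified exponentially decaying counter (stand-in, not Ceph's class)."""

    def __init__(self, half_life, clock=time.monotonic):
        self.half_life = half_life
        self.clock = clock
        self.value = 0.0
        self.last = clock()

    def _decay(self):
        now = self.clock()
        # The value halves every half_life seconds.
        self.value *= 0.5 ** ((now - self.last) / self.half_life)
        self.last = now

    def hit(self, amount=1.0):
        self._decay()
        self.value += amount

    def get(self):
        self._decay()
        return self.value


def trim(segments, counter, threshold,
         trim_one=lambda seg: None, reset_heartbeat=lambda: None):
    """Trim until the decayed count of recent trims reaches the threshold."""
    trimmed = 0
    while segments and counter.get() < threshold:
        trim_one(segments.popleft())
        counter.hit()
        reset_heartbeat()  # once per segment, so long trims don't miss heartbeats
        trimmed += 1
    return trimmed


# Example call: a burst of roughly `threshold` segments is allowed at once,
# after which the decay rate governs steady-state trimming.
log = deque(range(100))
trim(log, DecayCounter(half_life=1.0), threshold=8)
```

With this shape, the threshold sets the burst size and the half-life sets how quickly the budget refills, which is the "bursty" versus steady-state behavior described in the linked cache-trimming documentation.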
Updated by Venky Shankar 10 months ago
Patrick Donnelly wrote:
The real danger to avoid is missed heartbeats; it doesn't look like trimming resets the heartbeat yet. I would add a heartbeat reset for every segment trimmed.
Fair enough. FWIW, MDLog::trim() is tied to the tick interval (it is called on each tick) and always has been. I think there is benefit in having this driven by a separate thread (to drive the trim faster/slower), like the MDCache::upkeep thread, although the mdcache upkeep thread does more than just trimming (it recalls client state since it's trimming its cache, etc.).
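A rough sketch of the separate-thread idea, decoupling the trim cadence from the tick interval, might look like this. The TrimUpkeep class and its interval parameter are hypothetical names for illustration; the actual MDS code is C++ and this does not reflect how MDCache::upkeep is implemented.

```python
import threading


class TrimUpkeep:
    """Runs a trim callback on its own interval, independent of tick()."""

    def __init__(self, trim, interval):
        self._trim = trim          # callable that performs one trim pass
        self._interval = interval  # seconds between passes (assumed tunable)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self._trim()
```

Lowering the interval drives trimming faster and raising it drives trimming slower, without touching the tick interval.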
Updated by Venky Shankar 10 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 52652
Updated by Venky Shankar 3 months ago
- Status changed from Fix Under Review to Resolved
- Backport deleted (reef,quincy,pacific)
Deliberately not backporting this until it has baked in main for some more time.
Updated by Venky Shankar about 2 months ago
- Related to Bug #64729: mon.a (mon.0) 1281 : cluster 3 [WRN] MDS_SLOW_METADATA_IO: 3 MDSs report slow metadata IOs" in cluster log added