Feature #61908
closed
mds: provide configuration for trim rate of the journal
Added by Patrick Donnelly 11 months ago.
Updated 4 months ago.
Category: Administration/Usability
Labels (FS): task(intern), task(medium)
Related issues: 1 (1 open, 0 closed)
- Assignee set to Venky Shankar
OK, this is what I have in mind:
Introduce an MDS config key that controls the rate of trimming, i.e., the number of log segments trimmed per second. The MDLog trim code will try to maintain this configured rate. If the MDS can't keep up with it, MDLog would spend some extra time trimming, but this would be bounded, e.g., by never spending more than X seconds in total on trimming. If the MDS overshoots the trimming rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, the "MDS behind on trimming" warning would be raised, hinting to the operator that the configured trim rate is too low for the MDS and needs to be increased.
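The following is a minimal C++ sketch of one possible reading of this proposal. Each tick() adds "rate x elapsed time" segments of trim work. The trimmer then removes the oldest segments until that work is done or a per-tick time budget runs out, and it carries any shortfall into the next tick. `TrimPolicy`, `segments_per_sec`, `max_trim_time` and `JournalTrimmer` are hypothetical names, not Ceph config keys or classes. A `std::deque<int>` stands in for the journal's expired segments.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical policy knobs; not real Ceph config options.
struct TrimPolicy {
  double segments_per_sec;                  // target trim rate
  std::chrono::milliseconds max_trim_time;  // upper bound on time spent per tick
};

// Stand-in for the MDLog trim path: `segments` holds expired log segments,
// oldest first. Sketch only; not Ceph's MDLog.
class JournalTrimmer {
 public:
  explicit JournalTrimmer(TrimPolicy policy) : policy_(policy) {
    assert(policy_.segments_per_sec > 0);
  }

  // Called once per tick() with the time elapsed since the previous call.
  // Returns the number of segments trimmed during this call.
  std::size_t on_tick(std::deque<int>& segments,
                      std::chrono::milliseconds since_last_tick) {
    using Clock = std::chrono::steady_clock;
    // Work owed: rate x elapsed time, plus anything left over from earlier ticks.
    owed_ += policy_.segments_per_sec *
             (static_cast<double>(since_last_tick.count()) / 1000.0);
    const auto deadline = Clock::now() + policy_.max_trim_time;
    std::size_t trimmed = 0;
    while (owed_ >= 1.0 && !segments.empty() && Clock::now() < deadline) {
      segments.pop_front();  // "trim" the oldest segment
      owed_ -= 1.0;
      ++trimmed;
    }
    // Nothing left to trim: don't bank credit for a later burst.
    if (segments.empty()) owed_ = 0.0;
    return trimmed;
  }

  // Outstanding trim work carried into the next tick.
  double owed() const { return owed_; }

 private:
  TrimPolicy policy_;
  double owed_ = 0.0;
};
```

The two backlogs point to different problems:
- If `owed()` keeps growing, the time budget is the bottleneck.
- If the segment queue keeps growing while `owed()` stays near zero, the configured rate is too low. The proposal expects that case to surface as a "behind on trimming" warning.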
Venky Shankar wrote:
> OK, this is what I have in mind:
> Introduce an MDS config key that controls the rate of trimming, i.e., the number of log segments trimmed per second.
For time-based limits, I recommend just using DecayCounters. They are a great fit for this IMO. It's easy to tune with two configs (rate + threshold). Plus it allows for "bursty" or steady-state ("max / second") behavior:
https://docs.ceph.com/en/quincy/cephfs/cache-configuration/#mds-cache-trimming
(grep "steady state" in that doc)
Finally, users are already familiar (or becoming so) with these counter configs.
> The MDLog trim code will try to maintain this configured rate. If the MDS can't keep up with it, MDLog would spend some extra time trimming, but this would be bounded, e.g., by never spending more than X seconds in total on trimming. If the MDS overshoots the trimming rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, the "MDS behind on trimming" warning would be raised, hinting to the operator that the configured trim rate is too low for the MDS and needs to be increased.
The real danger to be avoided is missed heartbeats, and it doesn't look like trimming resets the heartbeat yet. I would add a heartbeat reset for every segment trimmed.
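Both suggestions can be combined in one minimal C++ sketch:
- A counter that decays exponentially with a configurable half-life throttles trimming. Trimming continues while the decayed count of recent trims is below a threshold.
- A callback runs after every trimmed segment. This is where an MDS would reset its heartbeat. In Ceph that would presumably be something like `MDSRank::heartbeat_reset()`, but treat that name as an assumption.

`DecayingCounter` and `throttled_trim` are hypothetical stand-ins, not Ceph's `DecayCounter` API. Time is advanced explicitly to keep the sketch deterministic.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Minimal exponentially decaying counter, in the spirit of Ceph's DecayCounter
// but not its API: the value halves every `half_life_sec` seconds.
class DecayingCounter {
 public:
  explicit DecayingCounter(double half_life_sec)
      : k_(std::log(0.5) / half_life_sec) {
    assert(half_life_sec > 0);
  }
  // Advance time by `dt_sec` seconds, applying exponential decay.
  void decay(double dt_sec) { value_ *= std::exp(k_ * dt_sec); }
  void hit(double n = 1.0) { value_ += n; }
  double get() const { return value_; }

 private:
  double k_;            // negative decay constant: ln(1/2) / half-life
  double value_ = 0.0;
};

// Hypothetical throttled trim: trims up to `available` segments, stopping once
// the decayed count of recent trims reaches `threshold`. `on_segment_trimmed`
// runs after every segment; this is where the heartbeat would be reset.
template <typename Callback>
std::size_t throttled_trim(DecayingCounter& recent_trims, double threshold,
                           std::size_t available, Callback on_segment_trimmed) {
  std::size_t trimmed = 0;
  while (trimmed < available && recent_trims.get() < threshold) {
    // ...expire and trim one log segment here...
    recent_trims.hit();
    on_segment_trimmed();
    ++trimmed;
  }
  return trimmed;
}
```

In this sketch, a counter with half-life h loses value at rate (ln 2 / h) x value. Holding it at the threshold T therefore sustains about T x ln 2 / h trims per second, while an idle counter permits a burst of up to T at once. That gives the "bursty" and steady-state behavior described above from two knobs.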
Patrick Donnelly wrote:
> For time-based limits, I recommend just using DecayCounters.
> The real danger to be avoided is missed heartbeats, and it doesn't look like trimming resets the heartbeat yet. I would add a heartbeat reset for every segment trimmed.
Fair enough. FWIW, MDLog::trim() is tied to the tick interval (it is called on each tick), and this has been the case ever since. I think there is benefit in driving trimming from a separate thread (so it can be driven faster or slower), like the MDCache upkeep thread. That thread does more than just trimming, though (e.g., recalling client state since it's trimming its cache).
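A dedicated trimmer thread, as floated above, could look roughly like this minimal C++ sketch. The thread wakes on its own interval, independent of tick(), and runs one trim pass each time. Its interval can be changed at runtime to drive trimming faster or slower. `TrimThread`, `set_interval` and the `trim_once` callback are hypothetical. The sketch does not reflect how the MDCache upkeep thread is actually implemented.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical dedicated trimmer thread: runs `trim_once` immediately, then
// once per `interval` until stopped. Not Ceph code.
class TrimThread {
 public:
  TrimThread(std::chrono::milliseconds interval, std::function<void()> trim_once)
      : interval_(interval),
        trim_once_(std::move(trim_once)),
        worker_([this] { run(); }) {}

  ~TrimThread() { stop(); }

  // Re-pace trimming at runtime (e.g. on a config change); the new interval
  // takes effect from the next wait.
  void set_interval(std::chrono::milliseconds interval) {
    std::lock_guard<std::mutex> lock(mu_);
    interval_ = interval;
  }

  void stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mu_);
    while (!stopping_) {
      lock.unlock();
      trim_once_();  // one trim pass, without holding the lock
      lock.lock();
      const auto interval = interval_;  // copy: set_interval() may change it
      cv_.wait_for(lock, interval, [this] { return stopping_; });
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool stopping_ = false;
  std::chrono::milliseconds interval_;
  std::function<void()> trim_once_;
  std::thread worker_;  // declared last so it starts after the other members exist
};
```

Keeping the trim pass outside the lock lets `set_interval()` and `stop()` proceed while trimming runs. The condition variable lets `stop()` interrupt a long wait instead of waiting out the full interval.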
- Status changed from New to Fix Under Review
- Pull request ID set to 52652
- Status changed from Fix Under Review to Resolved
- Backport deleted (reef,quincy,pacific)
Deliberately not backporting this until it has baked in main for some more time.
- Related to Bug #64729: "mon.a (mon.0) 1281 : cluster 3 [WRN] MDS_SLOW_METADATA_IO: 3 MDSs report slow metadata IOs" in cluster log added