Feature #61908: mds: provide configuration for trim rate of the journal - CephFS - Ceph

Feature #61908

closed

mds: provide configuration for trim rate of the journal

Added by Patrick Donnelly 11 months ago. Updated 4 months ago.

Status: Resolved
Priority: High
Assignee: Venky Shankar
Category: Administration/Usability
Target version: Ceph - v19.0.0
% Done:
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS): task(intern), task(medium)
Pull request ID: 52652

Description

Journal trimming is sometimes not fast enough. Provide configuration options to tune it without having to change the MDS tick interval.

In particular, remove the magic number time limits:

https://github.com/ceph/ceph/blob/58df86160858be5c8073ab39040c274c3f6fe312/src/mds/MDLog.cc#L636-L638

Related issues 1 (1 open — 0 closed)

Related to CephFS - Bug #64729: mon.a (mon.0) 1281 : cluster 3 [WRN] MDS_SLOW_METADATA_IO: 3 MDSs report slow metadata IOs" in cluster log

Triaged

Patrick Donnelly


History

Updated by Venky Shankar 10 months ago

Assignee set to Venky Shankar


Updated by Venky Shankar 10 months ago

OK, this is what I have in mind:

Introduce an MDS config key that controls the trimming rate, expressed as the number of log segments trimmed per second. The MDLog trim code will try to maintain the configured rate. If the MDS can't keep up with it, MDLog would spend some extra time trimming, but that time would be bounded, e.g., spend no more than X seconds in total on trimming. If the MDS overshoots the rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, the "MDS behind trimming" warning would be generated, which is a hint to the operator that the configured trim rate is too low for the MDS and needs to be increased.
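To make the proposal above concrete, here is a minimal Python sketch of a rate-driven trim pass with a hard time bound. It is only a model under stated assumptions, not the MDLog code: `trim_tick`, `rate_per_sec`, `max_seconds`, and `trim_cost` are hypothetical names, and time is simulated rather than measured.

```python
from collections import deque


def trim_tick(segments, rate_per_sec, max_seconds, trim_cost, tick_seconds=1.0):
    """Trim segments during one tick; return (trimmed, time_spent).

    segments     -- deque of segment ids awaiting trimming (oldest first)
    rate_per_sec -- hypothetical config: target segments trimmed per second
    max_seconds  -- hypothetical config: hard cap on time spent per tick
    trim_cost    -- simulated seconds needed to trim one segment
    """
    # Number of segments we would like to trim in this tick.
    target = int(rate_per_sec * tick_seconds)
    trimmed, spent = 0, 0.0
    while segments and trimmed < target:
        if spent + trim_cost > max_seconds:
            break  # slow MDS: stop once the time budget would be exceeded
        segments.popleft()
        trimmed += 1
        spent += trim_cost
    # A fast MDS reaches the target early and leaves the loop well under budget.
    return trimmed, spent
```

With a cheap `trim_cost` the loop stops as soon as the target count is reached; with an expensive one it stops at the time cap and the backlog grows, which is the situation the "behind on trimming" warning is meant to surface.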


Updated by Patrick Donnelly 10 months ago

Venky Shankar wrote:

OK, this is what I have in mind:

Introduce an MDS config key that controls the rate of trimming - number of log segments trimmed per second.

For time-based limits, I recommend just using DecayCounters. They are a great fit for this IMO. It's easy to tune with two configs (rate + threshold). Plus it allows for "bursty" or steady-state ("max / second") behavior:

https://docs.ceph.com/en/quincy/cephfs/cache-configuration/#mds-cache-trimming

(grep "steady state" in that doc)

Finally, users are already familiar (or becoming so) with these counter configs.
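As a rough illustration of the rate + threshold idea, the sketch below models an exponentially decaying counter in Python. It is a simplified stand-in, not Ceph's DecayCounter class: the `half_life` parameter, the `trim_with_counter` helper, and the explicit `now` arguments are assumptions made so the example is deterministic.

```python
class DecayCounter:
    """Exponentially decaying counter (simplified model, not Ceph's class)."""

    def __init__(self, half_life, now=0.0):
        self.half_life = half_life  # seconds for the value to halve
        self.value = 0.0
        self.last = now

    def _decay(self, now):
        elapsed = now - self.last
        if elapsed > 0:
            self.value *= 0.5 ** (elapsed / self.half_life)
            self.last = now

    def hit(self, now, amount=1.0):
        self._decay(now)
        self.value += amount

    def get(self, now):
        self._decay(now)
        return self.value


def trim_with_counter(pending, counter, threshold, now):
    """Trim until the counter reaches the threshold; return segments trimmed."""
    trimmed = 0
    while trimmed < pending and counter.get(now) < threshold:
        counter.hit(now)  # account for one trimmed segment
        trimmed += 1
    return trimmed
```

Starting from an idle counter, a call can trim a burst of up to `threshold` segments; after that, further trimming is only possible as the counter decays, which gives the steady-state behavior.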

The MDLog trim code will try to maintain the configured rate. If the MDS can't keep up with it, MDLog would spend some extra time trimming, but that time would be bounded, e.g., spend no more than X seconds in total on trimming. If the MDS overshoots the rate, MDLog would not spend much time trimming, i.e., it would generally break out of trim after one second (in each tick()). If the configured rate is too low, the "MDS behind trimming" warning would be generated, which is a hint to the operator that the configured trim rate is too low for the MDS and needs to be increased.

The real danger to avoid is missed heartbeats, and it doesn't look like the trimming code guards against that yet. I would add a heartbeat reset for every segment trimmed.
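The heartbeat concern can be sketched with a toy watchdog in Python. This is an illustration only: the `Heartbeat` class, its `grace` parameter, and `trim_segments` are hypothetical and do not mirror the MDS heartbeat implementation; time is simulated.

```python
class Heartbeat:
    """Toy watchdog: flags a miss if resets are more than `grace` apart."""

    def __init__(self, grace, now=0.0):
        self.grace = grace
        self.last_reset = now
        self.missed = False

    def check(self, now):
        if now - self.last_reset > self.grace:
            self.missed = True

    def reset(self, now):
        self.check(now)  # record a miss if we are already too late
        self.last_reset = now


def trim_segments(count, cost, heartbeat, reset_each_segment, now=0.0):
    """Trim `count` segments, each taking `cost` simulated seconds."""
    for _ in range(count):
        now += cost  # simulated work for one segment
        if reset_each_segment:
            heartbeat.reset(now)
    heartbeat.check(now)
    return now
```

In this model a long trim pass with no resets exceeds the grace period, while the same pass with one reset per segment does not, as long as a single segment takes less than the grace period.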


Updated by Venky Shankar 10 months ago


Fair enough. FWIW, MDLog::trim() is tied to the tick interval (it is called on every tick) and always has been. I think there is benefit in having this driven by a separate thread (to drive the trim faster or slower), like the MDCache::upkeep thread, although the mdcache upkeep thread does more than just trimming (it recalls client state since it's trimming its cache, etc.).
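A dedicated trim thread of the kind suggested above could look roughly like the following Python sketch. It only shows the scheduling pattern; `TrimUpkeep`, `trim_fn`, and `interval` are hypothetical names and nothing here reflects how the MDCache upkeep thread is actually written.

```python
import threading


class TrimUpkeep:
    """Run a trim callback on its own interval, independent of tick()."""

    def __init__(self, trim_fn, interval):
        self.trim_fn = trim_fn
        self.interval = interval  # seconds between trim passes
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            self.trim_fn()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Because the interval belongs to the thread rather than to the tick, trimming could be driven faster or slower without touching the tick interval, which is the point made in the comment.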
