Feature #7288

Deep-scrub throttle

Added by Brian Andrus about 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Support

Description

Currently, it appears that many PGs are allowed to enter a deep-scrubbing state at once. Due to the more intensive nature of a deep scrub, multiple overlapping deep scrubs tend to cause resource contention in some cluster configurations. A parameter limiting the number of deep-scrub processes running at any given time would give fine-grained control over this I/O-intensive process.

History

#1 Updated by David Zafman about 10 years ago

As it is, we already have the config value osd_max_scrubs, which defaults to 1. This should cause each OSD to scrub only a single PG at a time. It might be nice to abort a scrub if certain load thresholds are exceeded, but the deep-scrub issue is probably more about disk contention than CPU load.
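
For reference, a minimal sketch of setting that limit, assuming a standard ceph.conf deployment (the value shown is the default mentioned above):

    [osd]
    osd max scrubs = 1    # one concurrent scrub reservation per OSD

It can also be changed at runtime without a restart, e.g. ceph tell osd.* injectargs '--osd_max_scrubs 1'.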

#2 Updated by Stefan Priebe about 10 years ago

Yes, but if you have one scrub per OSD and a replication factor of 3, it can easily happen that one OSD participates in 3 or even 4 scrubs, since the scrub limit applies only to the primary OSD. So an OSD could be scrubbing one PG as primary and X more as a non-primary.

#3 Updated by Samuel Just about 10 years ago

Actually, there's a reservation system which should prevent that. The primary must reserve a slot on itself and on each replica before the scrub starts. I wonder if there is simply a bug in that mechanism.
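
For illustration, a rough sketch of that reservation idea (the names and structure here are invented, not Ceph's actual implementation): the primary claims a local slot, then asks each replica for one, and rolls everything back if any request is refused.

    #include <vector>

    // Illustrative only: Ceph's real logic lives in the OSD/PG scrub code.
    struct OsdScrubSlots {
      int max_scrubs = 1;        // analogous to osd_max_scrubs
      int in_use = 0;
      bool try_reserve() {       // claim one slot if one is free
        if (in_use >= max_scrubs) return false;
        ++in_use;
        return true;
      }
      void release() { if (in_use > 0) --in_use; }
    };

    // The primary reserves locally, then on each replica; on any refusal
    // it releases every grant, so no OSD exceeds its slot count.
    bool reserve_scrub(OsdScrubSlots& primary,
                       std::vector<OsdScrubSlots*>& replicas) {
      if (!primary.try_reserve()) return false;
      std::vector<OsdScrubSlots*> granted;
      for (auto* r : replicas) {
        if (!r->try_reserve()) {
          for (auto* g : granted) g->release();
          primary.release();
          return false;          // the scrub is retried later
        }
        granted.push_back(r);
      }
      return true;               // all slots held; the scrub may start
    }

If this works as intended, the pile-up described in #2 should not be possible, which is why a bug in the mechanism is the likely suspect.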

#4 Updated by Dan van der Ster almost 10 years ago

Would it be possible to add an nscrubs limit that works across the cluster, for example at the pool level: osd_pool_max_scrubs = 2? This would help spread out the deep scrubs so they don't overlap.
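
To make the proposal concrete, a hypothetical sketch; osd_pool_max_scrubs does not currently exist, and the syntax below is invented purely for illustration:

    # hypothetical: cap concurrent scrubs across all PGs of one pool
    ceph osd pool set rbd max_scrubs 2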

#5 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved

Everything but the idea that the scrub timing could be randomized has been implemented. The prioritization will get better over time as the OSD queuing is unified.
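
For operators wanting to spread deep scrubs out in the meantime, the existing interval options are the main knobs; a minimal sketch, with what I believe are the stock defaults (in seconds):

    [osd]
    osd scrub min interval  = 86400     # 1 day: earliest a PG is rescrubbed
    osd scrub max interval  = 604800    # 7 days: scrub forced even under load
    osd deep scrub interval = 604800    # 7 days between deep scrubs of a PG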
