Feature #7288
closed
Added by Brian Andrus over 10 years ago.
Updated almost 10 years ago.
Description
Currently, it appears that many PGs are allowed to enter the deep-scrubbing state at the same time. Because a deep-scrub is more intensive, overlapping or concurrent deep-scrubs tend to cause resource contention in some cluster configurations. A parameter that limits the number of deep-scrub processes running at any given time would be useful, giving fine-grained control over this I/O-intensive process.
As it is, we have the config value osd_max_scrubs, which defaults to 1. This should cause each OSD to scrub only a single PG at a time. It might be nice to abort a scrub if certain load thresholds are exceeded, but the deep-scrub issue is probably more about disk contention than CPU load.
Yes, but if you have one scrub per OSD and a replication size of 3, it could easily happen that one OSD is involved in 3 or even 4 scrubs, because the scrub limit applies only per primary OSD. So an OSD could be scrubbing one PG as primary and X more as a non-primary.
Actually, there's a reservation system which should prevent that. The primary must reserve a slot on itself and on each replica before the scrub starts. I wonder if there is simply a bug in that mechanism.
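A minimal Python sketch of the reservation idea described above, assuming a simplified, synchronous model: each OSD has `max_scrubs` slots (standing in for osd_max_scrubs), and a scrub starts only if the primary and every replica grant a slot. The names `Osd` and `try_start_scrub` are hypothetical, and this is not Ceph's actual implementation, which exchanges reservation messages between OSDs asynchronously. If the mechanism works as described, an OSD can never hold more than osd_max_scrubs reservations, whether it participates as primary or as replica.

```python
from dataclasses import dataclass, field


@dataclass
class Osd:
    """One OSD with a fixed number of scrub slots (hypothetical model of osd_max_scrubs)."""
    osd_id: int
    max_scrubs: int = 1
    # PGs currently holding a slot on this OSD, whether as primary or as replica.
    reserved: set[str] = field(default_factory=set)

    def try_reserve(self, pgid: str) -> bool:
        """Grant a slot to pgid unless all slots are already taken."""
        if len(self.reserved) >= self.max_scrubs:
            return False
        self.reserved.add(pgid)
        return True

    def release(self, pgid: str) -> None:
        self.reserved.discard(pgid)


def try_start_scrub(pgid: str, acting: list[Osd]) -> bool:
    """Reserve a slot on the primary (acting[0]) and then on every replica.

    If any OSD refuses, the slots already granted are rolled back, so a
    partially reserved PG never keeps resources tied up.
    """
    granted: list[Osd] = []
    for osd in acting:
        if not osd.try_reserve(pgid):
            for held in granted:
                held.release(pgid)
            return False
        granted.append(osd)
    return True


if __name__ == "__main__":
    osds = [Osd(i) for i in range(4)]
    # PG 1.0 uses OSDs 0, 1, 2; PG 1.1 uses OSDs 3, 1, 2 (shared replicas).
    print(try_start_scrub("1.0", [osds[0], osds[1], osds[2]]))  # True
    print(try_start_scrub("1.1", [osds[3], osds[1], osds[2]]))  # False: OSD 1 is full
    print(osds[3].reserved)  # set(): OSD 3's tentative slot was rolled back
```

The rollback step is the important design choice in this sketch: without it, a refused attempt would leak a slot on the OSDs that had already agreed, which is one way a bug in such a mechanism could show up.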
Would it be possible to add an nscrubs limit that works across the cluster, for example at the pool level (osd_pool_max_scrubs = 2)? This would help spread out the deep-scrubs so they don't overlap.
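A rough Python sketch of how the proposed pool-level cap might behave. Note that osd_pool_max_scrubs is only the name suggested in the comment above; `PoolScrubLimiter`, `schedule`, and the pool name `rbd` in the demo are hypothetical. The sketch assumes the per-pool counters live somewhere every OSD can consult (in a real cluster that would need coordination, for example through the monitors); here they are just an in-memory dict.

```python
from collections import defaultdict


class PoolScrubLimiter:
    """Hypothetical cluster-wide cap on concurrent deep-scrubs per pool.

    max_per_pool plays the role of the proposed osd_pool_max_scrubs;
    it is an illustration, not an existing Ceph option.
    """

    def __init__(self, max_per_pool: int) -> None:
        self.max_per_pool = max_per_pool
        # pool name -> PG ids currently deep-scrubbing in that pool
        self.active: dict[str, set[str]] = defaultdict(set)

    def try_acquire(self, pool: str, pgid: str) -> bool:
        running = self.active[pool]
        if pgid in running:
            return True  # already scrubbing; acquiring again is a no-op
        if len(running) >= self.max_per_pool:
            return False
        running.add(pgid)
        return True

    def release(self, pool: str, pgid: str) -> None:
        self.active[pool].discard(pgid)


def schedule(limiter: PoolScrubLimiter, pool: str, candidates: list[str]):
    """Start as many queued deep-scrubs as the pool cap allows.

    Returns (started, deferred); deferred PGs would be retried later.
    """
    started: list[str] = []
    deferred: list[str] = []
    for pgid in candidates:
        if limiter.try_acquire(pool, pgid):
            started.append(pgid)
        else:
            deferred.append(pgid)
    return started, deferred


if __name__ == "__main__":
    limiter = PoolScrubLimiter(max_per_pool=2)
    print(schedule(limiter, "rbd", ["1.0", "1.1", "1.2", "1.3"]))
    # (['1.0', '1.1'], ['1.2', '1.3'])
    limiter.release("rbd", "1.0")
    print(schedule(limiter, "rbd", ["1.2", "1.3"]))
    # (['1.2'], ['1.3'])
```

In this model a pool-level cap would sit alongside the per-OSD osd_max_scrubs limit rather than replace it: a deep-scrub would need both a pool slot and a slot on every OSD in the acting set.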
- Status changed from New to Resolved
Everything but the idea that the scrub timing could be randomized has been implemented. The prioritization will get better over time when the OSD queuing is unified.