Warn if queue of scrubs ready to run exceeds some threshold
The sched_scrub_pg set could be scanned on each new insert, counting the scrubs that are ready to run and comparing that count to a threshold. It would be nice if exceeding the threshold triggered a monitor health warning.
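To make the idea concrete, here is a rough sketch of the insert-time check in Python rather than the OSD's C++. It assumes the set is kept ordered by scheduled time; `ScrubJob`, `count_ready`, `insert_and_check`, and the `threshold` parameter are all hypothetical names for illustration, not existing Ceph code or options.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class ScrubJob:
    # Hypothetical stand-in for an entry in sched_scrub_pg.
    sched_time: float                 # when the scrub becomes eligible to run
    pgid: str = field(compare=False)  # not used for ordering


def count_ready(jobs, now):
    """Count jobs whose scheduled time has passed (jobs sorted by sched_time)."""
    ready = 0
    for job in jobs:
        if job.sched_time > now:
            break  # everything after this is scheduled in the future
        ready += 1
    return ready


def insert_and_check(jobs, job, now, threshold):
    """Insert a job, then report (ready_count, threshold_exceeded)."""
    insort(jobs, job)  # keeps the list ordered by sched_time
    ready = count_ready(jobs, now)
    return ready, ready > threshold
```

In this sketch the boolean result would be the trigger for reporting a health warning; how that gets reported to the monitor is outside what the sketch covers.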
#3 Updated by David Turner 9 months ago
I talked with Sage, who believes there is already a warning status if you have scrubs that haven't run for more than 2x your interval. My experience in the related ticket was with a 30-day deep scrub interval, and the repair ran 3 weeks after it was issued. That indicates that I was within the existing 2x warning threshold but definitely beyond a healthy state.
Another idea that would help is to prioritize user-submitted operations above automatically scheduled ones that are due only because their intervals were exceeded.
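One possible way to express that ordering, sketched in Python with hypothetical names (`ScrubRequest`, `user_requested`, `scrub_order` are illustrative, not existing Ceph structures): sort first on whether the request came from a user, then on scheduled time, so that an overdue automatic scrub can no longer jump ahead of a user request.

```python
from dataclasses import dataclass


@dataclass
class ScrubRequest:
    # Hypothetical request record for illustration only.
    pgid: str
    sched_time: float            # earlier means more overdue
    user_requested: bool = False


def scrub_order(requests):
    """Return pgids in run order: user requests first, then by sched_time."""
    ordered = sorted(
        requests,
        # False sorts before True, so negate the flag to put user requests first.
        key=lambda r: (not r.user_requested, r.sched_time),
    )
    return [r.pgid for r in ordered]
```

With this key, a user request scheduled at time 500 still runs before automatic scrubs scheduled at times 50 and 100.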
#4 Updated by David Zafman 9 months ago
I want to fix 3 things here. First, user-submitted scrubs are queued as due to occur immediately, but overdue scrubs are still prioritized before them. I want user-submitted scrubs to run before all others. Second, I'd like to get a warning when too many scrubs are overdue. This could occur because too many user-submitted scrubs are requested all at once, or because the system as configured cannot keep up with the scrub demands. This warning could be disabled by default. Finally, the code to warn about overdue scrubs in the monitor is broken. It confuses the monitor's own scrubbing interval with pg scrubbing. It shouldn't use mon_scrub_interval but rather osd_scrub_min_interval/osd_deep_scrub_interval when trying to assess how overdue scrubbing has gotten. Also, what about osd_scrub_max_interval? Also, should mon_warn_not_scrubbed and mon_warn_not_deep_scrubbed be renamed to mon_warn_pg_not_scrubbed and mon_warn_pg_not_deep_scrubbed respectively?
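As a rough illustration of the third point, the overdue check could compare each pg's last scrub and deep scrub stamps against the OSD intervals rather than the monitor's own interval. This is only a sketch in Python under stated assumptions: `overdue_pgs` and the `factor` multiplier are hypothetical, the interval values are passed in rather than read from any real config, and the open question about osd_scrub_max_interval is deliberately left out.

```python
DAY = 86400  # seconds


def overdue_pgs(pgs, now, scrub_interval, deep_interval, factor=1):
    """Return (scrub_overdue, deep_overdue) lists of pgids.

    pgs maps pgid -> (last_scrub_stamp, last_deep_scrub_stamp) in seconds.
    scrub_interval and deep_interval stand in for osd_scrub_min_interval and
    osd_deep_scrub_interval; factor is a hypothetical warning multiplier.
    """
    scrub_overdue = []
    deep_overdue = []
    for pgid, (last_scrub, last_deep) in sorted(pgs.items()):
        if now - last_scrub > factor * scrub_interval:
            scrub_overdue.append(pgid)
        if now - last_deep > factor * deep_interval:
            deep_overdue.append(pgid)
    return scrub_overdue, deep_overdue
```

For example, with a 30-day deep scrub interval, a pg last deep scrubbed 51 days ago is flagged at `factor=1` but not at `factor=2`, which mirrors the case of being within a 2x threshold while already well past the interval.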