Bug #27988
closed
Warn if queue of scrubs ready to run exceeds some threshold
Added by David Zafman over 5 years ago.
Updated over 3 years ago.
Description
The sched_scrub_pg set could be scanned on each new insert to count the scrubs that are ready to run and compare that count against a threshold. Ideally, exceeding the threshold would trigger a monitor health warning.
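A minimal Python sketch of the idea, assuming the schedule is kept sorted by the time each scrub becomes eligible to run. Under that assumption the ready entries form a prefix of the set, so they can be counted with a binary search instead of a full scan. ScrubJob, ScrubSchedule, and ready_threshold are hypothetical names, not Ceph's actual classes or options; raising the health warning itself is left to the caller.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class ScrubJob:
    # Hypothetical stand-in for an entry in sched_scrub_pg.
    # Only sched_time takes part in ordering and equality.
    sched_time: float
    pgid: str = field(compare=False)


class ScrubSchedule:
    """Sorted scrub schedule that counts ready-to-run jobs on every insert."""

    def __init__(self, ready_threshold):
        self.jobs = []  # kept sorted by sched_time
        self.ready_threshold = ready_threshold  # hypothetical config value

    def insert(self, job, now):
        """Insert a job; return True if the ready count now exceeds the threshold."""
        bisect.insort(self.jobs, job)
        return self.count_ready(now) > self.ready_threshold

    def count_ready(self, now):
        # Every job with sched_time <= now sits before this index.
        return bisect.bisect_right(self.jobs, ScrubJob(now, ""))
```

A True return from insert() is the point at which a caller would raise the (hypothetical) health warning.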
- Related to Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair added
- Subject changed from Warn if queue of scrubs exceeds some threshold to Warn if queue of scrubs ready to run exceeds some threshold
I talked with Sage, and he believes there is already a warning status if scrubs haven't run for more than 2x the configured interval. My experience in the related ticket was with a 30-day deep scrub interval, and the repair happened 3 weeks after it was issued. That was within the existing 2x warning threshold, but definitely beyond a healthy state.
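To make the arithmetic concrete, here is a minimal Python sketch of a "2x interval" staleness check as described. The function name and the factor parameter are hypothetical; this is not the monitor's actual code. With a 30-day deep scrub interval the limit is 60 days, so a 21-day delay stays well under it.

```python
DAY = 24 * 60 * 60.0  # seconds


def exceeds_not_scrubbed_warning(last_scrub, now, interval, factor=2.0):
    """Return True if the last scrub is older than factor * interval.

    The default factor of 2 follows the rule described in this ticket;
    it is an assumption, not taken from the monitor's implementation.
    """
    return now - last_scrub > factor * interval
```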
Another idea that would help is to prioritize user-submitted operations above those scheduled automatically because their intervals were exceeded.
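One way to express that ordering, as a hypothetical Python sketch: give each queued scrub an explicit rank (0 for user-requested, 1 for automatic) and sort on the rank before the scheduled time, so an overdue automatic scrub can never jump ahead of a user request. QueuedScrub, make_entry, and drain are illustrative names, not Ceph code.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedScrub:
    # Sort key: rank first (0 = user-requested, 1 = automatic),
    # then scheduled time. pgid does not affect ordering.
    rank: int
    sched_time: float
    pgid: str = field(compare=False)


def make_entry(pgid, sched_time, user_requested):
    return QueuedScrub(0 if user_requested else 1, sched_time, pgid)


def drain(entries):
    """Pop entries in priority order and return their PG ids."""
    heap = list(entries)
    heapq.heapify(heap)
    return [heapq.heappop(heap).pgid for _ in range(len(heap))]
```

Keying on the rank, rather than backdating a user request's scheduled time, leaves the original timestamps intact for reporting.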
I want to fix three things here. First, user-submitted scrubs are queued as due to run immediately, but overdue scrubs are still prioritized ahead of them. I want user-submitted scrubs to run before all others.
Second, I'd like a warning when too many scrubs are overdue. This could happen because too many user-submitted scrubs are requested all at once, or because the system as configured cannot keep up with the scrub demand. This warning could be disabled by default.
Finally, the monitor's code that warns about overdue scrubs is broken: it confuses the monitor's own scrubbing interval with PG scrubbing. It shouldn't use mon_scrub_interval, but rather osd_scrub_min_interval/osd_deep_scrub_interval, when assessing how overdue scrubbing has become. Also, what about osd_scrub_max_interval? And should mon_warn_not_scrubbed and mon_warn_not_deep_scrubbed be renamed to mon_warn_pg_not_scrubbed and mon_warn_pg_not_deep_scrubbed, respectively?
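A hedged Python sketch of the interval selection proposed here: shallow scrubs are judged against osd_scrub_min_interval and deep scrubs against osd_deep_scrub_interval, never mon_scrub_interval. The option names come from this ticket; the function names, the (last_scrub_stamp, last_deep_scrub_stamp) input shape, and the default ratio of 2 are assumptions for illustration. Whether osd_scrub_max_interval belongs in the shallow check is left open, as in the ticket.

```python
DAY = 24 * 60 * 60.0  # seconds


def pg_scrub_interval(conf, deep):
    """Pick the OSD-side interval used to judge whether a PG scrub is overdue.

    mon_scrub_interval is deliberately ignored: it governs the monitor's
    own scrubbing, not PG scrubbing.
    """
    key = "osd_deep_scrub_interval" if deep else "osd_scrub_min_interval"
    return float(conf[key])


def overdue_counts(pg_stamps, conf, now, ratio=2.0):
    """Count PGs whose shallow/deep scrub is older than ratio * interval.

    pg_stamps is an iterable of (last_scrub_stamp, last_deep_scrub_stamp)
    pairs, in seconds. Returns (shallow_overdue, deep_overdue).
    """
    stamps = list(pg_stamps)
    shallow_limit = ratio * pg_scrub_interval(conf, deep=False)
    deep_limit = ratio * pg_scrub_interval(conf, deep=True)
    shallow = sum(1 for s, _ in stamps if now - s > shallow_limit)
    deep = sum(1 for _, d in stamps if now - d > deep_limit)
    return shallow, deep
```

With illustrative values of mon_scrub_interval = 1 day, osd_scrub_min_interval = 2 days, and PGs last scrubbed 3, 5, and 1 days ago, this flags one PG as overdue for shallow scrubbing; judging the same PGs against mon_scrub_interval would flag two.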
- Status changed from New to In Progress
- Related to Bug #37269: Prioritize user specified scrubs added
- Related to Bug #37264: scrub warning check incorrectly uses mon scrub interval added
- Status changed from In Progress to Need More Info
This has been put on the back burner until we decide what to do next.
- Pull request ID set to 23848
- Status changed from Need More Info to Rejected
- Pull request ID deleted (23848)