Bug #27988

Warn if queue of scrubs ready to run exceeds some threshold

Added by David Zafman 9 months ago. Updated 3 months ago.

Status: Need More Info
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/27/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

The sched_scrub_pg set could be scanned during a new insert and the number of scrubs that are ready to be run could be counted and compared to some threshold. It would be nice if this triggered a monitor health warning.
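A minimal sketch of the idea above, in Python rather than the OSD's own C++. It assumes the set is kept ordered by scheduled time, so the scrubs that are ready to run form a prefix of it. The names `ScrubJob`, `ScrubQueue`, and `warn_threshold` are hypothetical and are not taken from the Ceph source; how the result would be reported to the monitor is left out.

```python
import bisect
from dataclasses import dataclass


@dataclass(order=True)
class ScrubJob:
    # Hypothetical stand-in for an entry in sched_scrub_pg.
    sched_time: float  # time at which the scrub becomes ready to run
    pgid: str


class ScrubQueue:
    def __init__(self, warn_threshold):
        # warn_threshold is an assumed tunable, not an existing option.
        self.warn_threshold = warn_threshold
        self.jobs = []  # kept sorted by (sched_time, pgid)

    def count_ready(self, now):
        # Jobs are sorted, so stop at the first one scheduled in the future.
        ready = 0
        for job in self.jobs:
            if job.sched_time > now:
                break
            ready += 1
        return ready

    def insert(self, job, now):
        # Insert in order, then scan and compare against the threshold.
        # Returns True when a warning should be raised.
        bisect.insort(self.jobs, job)
        return self.count_ready(now) > self.warn_threshold
```

Because the scan stops at the first job that is not yet due, its cost per insert is proportional to the number of ready scrubs, not to the size of the whole set.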


Related issues

Related to RADOS - Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair (Can't reproduce, 04/06/2018)
Related to RADOS - Bug #37269: Prioritize user specified scrubs (Resolved, 11/14/2018)
Related to RADOS - Bug #37264: scrub warning check incorrectly uses mon scrub interval (Resolved, 11/14/2018)

History

#1 Updated by David Zafman 9 months ago

  • Related to Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair added

#2 Updated by David Zafman 9 months ago

  • Subject changed from Warn if queue of scrubs exceeds some threshold to Warn if queue of scrubs ready to run exceeds some threshold

#3 Updated by David Turner 9 months ago

I talked with Sage, and he believes there is already a warning status if you have scrubs that haven't run for more than 2x your interval. My experience in the related ticket was with a 30 day deep scrub interval, and the repair happened 3 weeks after it was issued. That indicates that I was within the existing 2x warning threshold but definitely beyond a healthy state.

Another idea that would help is to prioritize user submitted operations above automatically scheduled ones that are queued only because their interval was exceeded.

#4 Updated by David Zafman 9 months ago

I want to fix 3 things here. First, user submitted scrubs are queued as due to occur immediately, but overdue scrubs are still prioritized before them. I want user submitted scrubs to run before all others. Second, I'd like to get a warning when too many scrubs are overdue. This could occur because too many user submitted scrubs are requested all at once, or because the system as configured cannot keep up with the scrub demands. This could be disabled by default. Finally, the code to warn about overdue scrubs in the monitor is broken. It confuses the monitor's own scrubbing interval with pg scrubbing. It shouldn't use mon_scrub_interval but rather osd_scrub_min_interval/osd_deep_scrub_interval when trying to assess how overdue scrubbing has gotten. Also, what about osd_scrub_max_interval? Also, should mon_warn_not_scrubbed and mon_warn_not_deep_scrubbed be renamed to mon_warn_pg_not_scrubbed and mon_warn_pg_not_deep_scrubbed respectively?
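As a rough illustration of the first and third points, here is a sketch in Python (not the actual OSD or monitor code). `PendingScrub`, `run_order`, `overdue_pgs`, and the `grace` parameter are hypothetical names introduced only for this example. The caller is assumed to pass an OSD-side PG scrub interval such as osd_deep_scrub_interval, not mon_scrub_interval.

```python
from dataclasses import dataclass


@dataclass
class PendingScrub:
    pgid: str
    sched_time: float     # seconds; when the scrub became due
    user_requested: bool  # True for scrubs submitted by a user


def run_order(pending):
    # User submitted scrubs sort first (False < True on the negated flag),
    # then the oldest due time wins within each group.
    return sorted(pending, key=lambda s: (not s.user_requested, s.sched_time))


def overdue_pgs(last_scrub_stamps, now, pg_scrub_interval, grace):
    # A PG counts as overdue once its last scrub is older than the PG scrub
    # interval plus an assumed grace period. pg_scrub_interval must be a
    # PG-level (OSD) interval, not the monitor's own scrub interval.
    limit = pg_scrub_interval + grace
    return sorted(pgid for pgid, stamp in last_scrub_stamps.items()
                  if now - stamp > limit)
```

With this ordering, a user submitted scrub queued at a later time still runs ahead of an automatically scheduled scrub that has been overdue for longer.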

#5 Updated by David Zafman 9 months ago

  • Status changed from New to In Progress

#6 Updated by David Zafman 6 months ago

  • Related to Bug #37269: Prioritize user specified scrubs added

#7 Updated by David Zafman 6 months ago

  • Related to Bug #37264: scrub warning check incorrectly uses mon scrub interval added

#8 Updated by David Zafman 3 months ago

  • Status changed from In Progress to Need More Info

This is put on the back burner until we decide what to do next.

#9 Updated by David Zafman 3 months ago

  • Pull request ID set to 23848
