
Feature #10973

randomize scrub times

Added by Samuel Just almost 5 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

40%

Source:
other
Tags:
Backport:
hammer
Reviewed:
Affected Versions:
Pull request ID:

Description

Currently, pgs tend to scrub in a big wave when they get to their hard scrub interval. We'd prefer that the individual pg scrub times be well distributed within the scrub interval. I think the simplest way to get to that point would be to add a uniform random offset to the scrub schedule time.
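The idea above can be sketched in a few lines of C++. This is purely illustrative (not Ceph's actual scheduling code, and all names are hypothetical): without an offset every PG lands on `now + interval` at once (the wave); adding a per-PG uniform random offset spreads the scheduled times across the interval.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: schedule each PG's next scrub with a uniform
// random offset, so scrub times spread over [now + interval, now + 2*interval)
// instead of all PGs scrubbing at now + interval in one wave.
std::vector<double> schedule_scrubs(std::size_t num_pgs, double now,
                                    double interval, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> offset(0.0, interval);
    std::vector<double> times;
    times.reserve(num_pgs);
    for (std::size_t i = 0; i < num_pgs; ++i)
        times.push_back(now + interval + offset(rng));
    return times;
}
```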


Related issues

Related to RADOS - Feature #10931: osd: better scrub scheduling New 02/23/2015
Copied to Ceph - Backport #13409: randomize scrub times Resolved

Associated revisions

Revision 5e44040e (diff)
Added by Kefu Chai over 4 years ago

osd: randomize scrub times to avoid scrub wave

- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time uniformly distributed
over [now + osd_scrub_min_interval,
now + osd_scrub_min_interval * (1 + osd_scrub_time_limit)]. before
this change this sort of scrub was performed once the hard interval
ended or the system load dropped below the threshold; with this change,
the jobs are performed as long as the load is low or the interval of
the scheduled scrubs is longer than conf.osd_scrub_max_interval. all
automatic jobs should be performed within the configured time period,
otherwise they are postponed
- the requested scrub job is now scheduled right away; before this change
it was queued with the timestamp of `now` and postponed until after
osd_scrub_min_interval

Fixes: #10973
Signed-off-by: Kefu Chai <>
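The scheduling window described in the commit above can be written as a small sketch. This is illustrative only (not the actual OSD.cc code); `ratio` stands in for the randomize option, which a later commit renames to osd_scrub_interval_randomize_ratio:

```cpp
#include <random>

// Illustrative only: draw the next automatic scrub time uniformly from
// [now + min_interval, now + min_interval * (1 + ratio)], as described
// in the commit message. Function and parameter names are hypothetical.
double next_scrub_time(double now, double min_interval, double ratio,
                       std::mt19937 &rng) {
    std::uniform_real_distribution<double> extra(0.0, ratio);
    return now + min_interval * (1.0 + extra(rng));
}
```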

Revision 6344fc83 (diff)
Added by Kefu Chai over 4 years ago

osd: use another name for randomize scrub option

s/osd_scrub_interval_limit/osd_scrub_interval_randomize_ratio/

Fixes: #10973
Signed-off-by: Kefu Chai <>

Revision fad33861 (diff)
Added by Kefu Chai about 4 years ago

osd: randomize scrub times to avoid scrub wave

- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time uniformly distributed
over [now + osd_scrub_min_interval,
now + osd_scrub_min_interval * (1 + osd_scrub_time_limit)]. before
this change this sort of scrub was performed once the hard interval
ended or the system load dropped below the threshold; with this change,
the jobs are performed as long as the load is low or the interval of
the scheduled scrubs is longer than conf.osd_scrub_max_interval. all
automatic jobs should be performed within the configured time period,
otherwise they are postponed
- the requested scrub job is now scheduled right away; before this change
it was queued with the timestamp of `now` and postponed until after
osd_scrub_min_interval

Fixes: #10973
Signed-off-by: Kefu Chai <>
(cherry picked from commit 5e44040e8528bff06cc0a5a3f3293ab146e0e4e1)

Conflicts:
src/osd/OSD.cc

Revision 0742177c (diff)
Added by Kefu Chai about 4 years ago

osd: use another name for randomize scrub option

s/osd_scrub_interval_limit/osd_scrub_interval_randomize_ratio/

Fixes: #10973
Signed-off-by: Kefu Chai <>

History

#1 Updated by Samuel Just almost 5 years ago

  • Target version deleted (v0.94)

#2 Updated by Guang Yang almost 5 years ago

Another optimization we might want to do: we probably don't need such frequent (3-second) scheduling if there is a large amount of data...

#3 Updated by Kefu Chai almost 5 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 40

#4 Updated by Kefu Chai almost 5 years ago

  • Status changed from In Progress to New
  • % Done changed from 40 to 0

Guang Yang wrote:

Another optimization we might want to do: we probably don't need such frequent (3-second) scheduling if there is a large amount of data...

the default osd_heartbeat_interval is 6, and we schedule OSD::tick() every conf->osd_heartbeat_interval seconds. OSD::tick() is where we try to schedule a scrub for a PG registered for scrubbing.

it now controls the interval at which the scheduled jobs are checked. see https://github.com/ceph/ceph/pull/3905 which is pending review.

#5 Updated by Kefu Chai almost 5 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 40

#6 Updated by Guang Yang almost 5 years ago

Kefu Chai wrote:

Guang Yang wrote:

Another optimization we might want to do: we probably don't need such frequent (3-second) scheduling if there is a large amount of data...

the default osd_heartbeat_interval is 6, and we schedule OSD::tick() every conf->osd_heartbeat_interval seconds. OSD::tick() is where we try to schedule a scrub for a PG registered for scrubbing.

it now controls the interval at which the scheduled jobs are checked. see https://github.com/ceph/ceph/pull/3905 which is pending review.

OSD::tick will run every 1 second (https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L3945), and given the current schedule mechanism, it will attempt to schedule some scrubbing once every 3 seconds.. that might be too frequent for a large deployment..

#7 Updated by Kefu Chai almost 5 years ago

  • Status changed from In Progress to Fix Under Review

Guang Yang wrote:

OSD::tick will run every 1 second (https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L3945), and given the current schedule mechanism, it will attempt to schedule some scrubbing once every 3 seconds.. that might be too frequent for a large deployment..

yes, you are right. the tick() re-schedules itself every 1 sec.

and the PR is posted at https://github.com/ceph/ceph/pull/3946, and is pending review.

#8 Updated by Sage Weil over 4 years ago

  • Target version set to v9.0.2

#9 Updated by Kefu Chai over 4 years ago

  • Status changed from Fix Under Review to Resolved

#10 Updated by Loic Dachary about 4 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to hammer

#11 Updated by Loic Dachary almost 4 years ago

  • Status changed from Pending Backport to Resolved
