Feature #10973
randomize scrub times
Description
Currently, pgs tend to scrub in a big wave when they get to their hard scrub interval. We'd prefer that the individual pg scrub times be well distributed within the scrub interval. I think the simplest way to get to that point would be to add a uniform random offset to the scrub schedule time.
Related issues
Associated revisions
osd: randomize scrub times to avoid scrub wave
- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized.
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time which is uniformly distributed
over [now+osd_scrub_min_interval,
now+osd_scrub_min_interval*(1+osd_scrub_time_limit)]. before
this change, this sort of scrub was performed once the hard interval
had elapsed or the system load was below the threshold; with this change,
the jobs are performed as long as the load is low or the interval since
the scheduled scrub exceeds conf.osd_scrub_max_interval. all
automatic jobs should be performed within the configured time period,
otherwise they are postponed.
- the requested scrub job will be scheduled right away; before this change
it was queued with the timestamp of `now` and postponed for
osd_scrub_min_interval.
Fixes: #10973
Signed-off-by: Kefu Chai <kchai@redhat.com>
osd: use another name for randomize scrub option
s/osd_scrub_interval_limit/osd_scrub_interval_randomize_ratio/
Fixes: #10973
Signed-off-by: Kefu Chai <kchai@redhat.com>
osd: randomize scrub times to avoid scrub wave
- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized.
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time which is uniformly distributed
over [now+osd_scrub_min_interval,
now+osd_scrub_min_interval*(1+osd_scrub_time_limit)]. before
this change, this sort of scrub was performed once the hard interval
had elapsed or the system load was below the threshold; with this change,
the jobs are performed as long as the load is low or the interval since
the scheduled scrub exceeds conf.osd_scrub_max_interval. all
automatic jobs should be performed within the configured time period,
otherwise they are postponed.
- the requested scrub job will be scheduled right away; before this change
it was queued with the timestamp of `now` and postponed for
osd_scrub_min_interval.
Fixes: #10973
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 5e44040e8528bff06cc0a5a3f3293ab146e0e4e1)
Conflicts:
src/osd/OSD.cc
osd: use another name for randomize scrub option
s/osd_scrub_interval_limit/osd_scrub_interval_randomize_ratio/
Fixes: #10973
Signed-off-by: Kefu Chai <kchai@redhat.com>
History
#1 Updated by Samuel Just about 9 years ago
- Target version deleted (v0.94)
#2 Updated by Guang Yang about 9 years ago
Another optimization we might want to do is that we probably don't need that frequent (3 seconds) scheduling if there is a large amount of data...
#3 Updated by Kefu Chai about 9 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 40
#4 Updated by Kefu Chai about 9 years ago
- Status changed from In Progress to New
- % Done changed from 40 to 0
Guang Yang wrote:
Another optimization we might want to do is that we probably don't need that frequent (3 seconds) scheduling if there is a large amount of data...
the default osd_heartbeat_interval is 6, and we schedule OSD::tick()
every conf->osd_heartbeat_interval
seconds. OSD::tick()
is where we try to schedule a scrub for a PG registered for scrubbing,
and it now controls the interval at which the scheduled jobs are checked. see https://github.com/ceph/ceph/pull/3905 which is pending review.
#5 Updated by Kefu Chai about 9 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 40
#6 Updated by Guang Yang about 9 years ago
Kefu Chai wrote:
Guang Yang wrote:
Another optimization we might want to do is that we probably don't need that frequent (3 seconds) scheduling if there is a large amount of data...
the default osd_heartbeat_interval is 6, and we schedule OSD::tick()
every conf->osd_heartbeat_interval
seconds. OSD::tick()
is where we try to schedule a scrub for a PG registered for scrubbing, and it now controls the interval at which the scheduled jobs are checked. see https://github.com/ceph/ceph/pull/3905 which is pending review.
The OSD::tick will run every 1 second (https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L3945), and given the current schedule mechanism, it will schedule some scrubbing in 1 out of every 3 seconds... that might be too frequent for large deployments...
#7 Updated by Kefu Chai about 9 years ago
- Status changed from In Progress to Fix Under Review
Guang Yang wrote:
The OSD::tick will run every 1 second (https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L3945), and given the current schedule mechanism, it will schedule some scrubbing in 1 out of every 3 seconds... that might be too frequent for large deployments...
yes, you are right. tick() re-schedules itself every 1 sec.
the PR is posted at https://github.com/ceph/ceph/pull/3946, and is pending review.
#8 Updated by Sage Weil almost 9 years ago
- Target version set to v9.0.2
#9 Updated by Kefu Chai almost 9 years ago
- Status changed from Fix Under Review to Resolved
#10 Updated by Loïc Dachary over 8 years ago
- Status changed from Resolved to Pending Backport
- Backport set to hammer
#11 Updated by Loïc Dachary about 8 years ago
- Status changed from Pending Backport to Resolved