Feature #55764

Adaptive mon_warn_pg_not_deep_scrubbed_ratio according to actual scrub throughput

Added by Dan van der Ster 6 months ago. Updated 6 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
Monitor
Pull request ID:

Description

This request comes from the Science Users Working Group: https://pad.ceph.com/p/Ceph_Science_User_Group_20220524

On clusters whose OSDs are very large, heavily filled, and busy with client IO, the default thresholds behind the PG_NOT_SCRUBBED and PG_NOT_DEEP_SCRUBBED warnings can be too aggressive.
That is, it is not always possible to scrub all PGs daily and to deep scrub all PGs weekly.
Such clusters raise warnings that PGs are not scrubbed in time, which confuses operators.

Factors which impact the rate at which a cluster can scrub PGs might include:
  • osd_max_scrubs (defaults to 1 per OSD).
  • the amount of data to be scrubbed per OSD (which keeps growing and can now exceed 15 TB).
  • the rate at which an OSD can satisfy scrub reads (can be in the low tens of MB/s for large HDDs busy with client IO).
  • the size of a PG, i.e. the number of OSDs it spans: e.g. a replica=3 PG locks three OSDs for a scrub, whereas an EC 4+2 PG locks six OSDs.
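
To see how these factors add up, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (15 TB per OSD, scrub reads in the low tens of MB/s); the 20 MB/s value and the function name are illustrative assumptions, not measurements or Ceph code. It also ignores osd_max_scrubs and the fact that a scrub reserves every OSD in the PG, so the real time can only be longer.

```python
SECONDS_PER_DAY = 86400


def min_full_deep_scrub_days(data_tb: float, scrub_read_mb_s: float) -> float:
    """Lower bound on the days one OSD needs to read all of its data once.

    Hypothetical helper: decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes),
    scrubbing assumed to run continuously at a constant read rate.
    """
    data_bytes = data_tb * 1e12
    rate_bytes_per_s = scrub_read_mb_s * 1e6
    return data_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# 15 TB per OSD, with an assumed 20 MB/s available for scrub reads.
days = min_full_deep_scrub_days(15, 20)
print(f"at least {days:.1f} days per full deep scrub pass")
```

Under these assumptions a single pass takes more than eight days, so a weekly deep scrub target cannot be met even before contention between OSDs is taken into account.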

Would it be possible for the MON to take an adaptive approach to issuing scrub timeout warnings? E.g. the mon could scale mon_warn_pg_not_deep_scrubbed_ratio (and the related scrub warning settings) according to the above parameters, or perhaps by monitoring the actual time taken to complete scrubs.
Note that the wallclock time to scrub a given PG should be roughly uniform within a pool, but can vary widely from pool to pool (e.g. empty pools can be scrubbed quickly).
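
One possible shape for the "monitor the actual time" variant is sketched below in Python. This is not how the MON works today and not a proposed patch: PoolScrubStats, concurrent_scrubs, headroom and both function names are hypothetical. It also assumes that the warning fires once a PG's last deep scrub is older than deep_scrub_interval * (1 + ratio), and that 0.75 is the default ratio; both should be checked against the Ceph release in use.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PoolScrubStats:
    """Hypothetical per-pool input: PG count and observed time per deep scrub."""
    pg_count: int
    mean_deep_scrub_seconds: float


def estimated_cycle_seconds(pools: Iterable[PoolScrubStats],
                            concurrent_scrubs: int) -> float:
    """Estimate the wallclock time to deep scrub every PG once.

    Per-PG time is tracked per pool because it is roughly uniform within a
    pool but differs widely between pools.
    """
    total = sum(p.pg_count * p.mean_deep_scrub_seconds for p in pools)
    return total / max(1, concurrent_scrubs)


def adaptive_warn_ratio(pools: Iterable[PoolScrubStats],
                        concurrent_scrubs: int,
                        deep_scrub_interval: float,
                        base_ratio: float = 0.75,  # assumed default
                        headroom: float = 0.25) -> float:
    """Return a ratio that never drops below base_ratio, but grows when the
    observed scrub cycle cannot fit inside the configured interval."""
    cycle = estimated_cycle_seconds(pools, concurrent_scrubs)
    needed = cycle / deep_scrub_interval - 1.0 + headroom
    return max(base_ratio, needed)


WEEK = 7 * 86400
pools = [PoolScrubStats(1000, 3600.0),  # large, full pool
         PoolScrubStats(100, 1.0)]      # nearly empty pool
print(adaptive_warn_ratio(pools, concurrent_scrubs=4, deep_scrub_interval=WEEK))
print(adaptive_warn_ratio(pools, concurrent_scrubs=2, deep_scrub_interval=WEEK))
```

With four scrubs effectively running in parallel, the estimated cycle fits within the existing threshold and the ratio stays at its base value. With only two, the ratio is raised so that warnings appear only for PGs that lag behind what the cluster can actually achieve.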

History

#1 Updated by Dan van der Ster 6 months ago

  • Subject changed from Adaptive to Adaptive mon_warn_pg_not_deep_scrubbed_ratio according to actual scrub throughput
