Feature #55764
Adaptive mon_warn_pg_not_deep_scrubbed_ratio according to actual scrub throughput
Description
This request comes from the Science Users Working Group https://pad.ceph.com/p/Ceph_Science_User_Group_20220524
For clusters with very large OSDs with high space usage and intensive client IO, the defaults related to PG_NOT_SCRUBBED and PG_NOT_DEEP_SCRUBBED warnings can be too aggressive.
That is, it is not always possible to scrub all PGs daily and to deep-scrub all PGs weekly.
Such clusters raise warnings that PGs are not scrubbed in time, leading to operator confusion. The achievable scrub throughput depends on several factors:
- osd_max_scrubs (defaults to 1 per OSD)
- the amount of data to be scrubbed per OSD (which keeps growing and can exceed 15 TB nowadays).
- the rate at which an OSD can satisfy scrub reads (can be in the low tens of MB/s for large HDDs busy with client IO).
- the size and layout of a PG: e.g. a replica=3 PG locks three OSDs for a scrub, whereas an EC 4+2 PG locks six OSDs.
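As a rough illustration of how these factors combine, the minimum wall-clock time needed to deep-scrub all data on one OSD can be estimated with a back-of-envelope model (a hypothetical helper for illustration, not Ceph code):

```python
def min_deep_scrub_days(osd_data_bytes, scrub_rate_bytes_per_s, osd_max_scrubs=1):
    """Estimate the minimum days needed to deep-scrub all data on one OSD.

    Simplified model: every byte on the OSD must be read once per
    deep-scrub cycle, at the sustained scrub read rate, with at most
    osd_max_scrubs concurrent scrubs per OSD.
    """
    seconds = osd_data_bytes / (scrub_rate_bytes_per_s * osd_max_scrubs)
    return seconds / 86400  # seconds per day

# A 15 TB OSD scrubbing at 20 MB/s with the default osd_max_scrubs=1
# needs roughly 8.7 days per full deep-scrub cycle, so a weekly
# deep-scrub deadline is physically unreachable.
days = min_deep_scrub_days(15e12, 20e6)
```

Under these assumed numbers the default weekly expectation cannot be met even with the OSD scrubbing continuously, which is exactly the situation that triggers the spurious warnings.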
Would it be possible for the MON to use an adaptive approach to issuing scrub timeout warnings? For example, the mon could scale the mon_warn_pg_not_deep_scrubbed_ratio setting according to the above parameters, or perhaps by monitoring the actual time taken to complete scrubs.
Note that the wall-clock time to scrub a given PG should be roughly uniform within a pool, but can vary widely from pool to pool (e.g. empty pools can be scrubbed quickly).
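One possible adaptive scheme, sketched below under assumed names (this is not an existing Ceph interface), would stretch the per-pool warning deadline when observed deep-scrub cycle times show the configured interval is unachievable:

```python
def adaptive_deadline_days(observed_cycle_days, base_interval_days=7.0,
                           warn_ratio=0.75, headroom=1.5):
    """Pick a deep-scrub warning deadline for a pool.

    observed_cycle_days: recent wall-clock time for the pool to complete
    one full deep-scrub cycle of its PGs.

    The configured deadline follows the interval-times-(1 + ratio)
    shape of mon_warn_pg_not_deep_scrubbed_ratio; when the pool
    demonstrably cannot finish a cycle within that window, the deadline
    is stretched to the observed cycle time plus some headroom instead
    of warning immediately.
    """
    configured = base_interval_days * (1 + warn_ratio)
    achievable = observed_cycle_days * headroom
    return max(configured, achievable)

# With the assumed defaults, the configured deadline is 7 * 1.75 = 12.25
# days. A fast (e.g. near-empty) pool keeps that deadline, while a busy
# pool observed to take 10 days per cycle gets 15 days before warning.
```

The headroom factor is an assumption; its purpose is to tolerate normal variation in client load without oscillating between warning and healthy states.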
History
#1 Updated by Dan van der Ster almost 2 years ago
- Subject changed from Adaptive to Adaptive mon_warn_pg_not_deep_scrubbed_ratio according to actual scrub throughput