Bug #50140
closed
test/thrash - scrub: "not all pgs scrubbed" due to short rescrubbing period
Status:
Duplicate
Priority:
Normal
Assignee:
David Zafman
Category:
Tests
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Thrashing test:
The observed error is "Exiting scrub checking -- not all pgs scrubbed".
See below for analysis.
Updated by Ronen Friedman about 3 years ago
- Assignee set to David Zafman
Caused by a combination of:
- the re-scrub period ("osd scrub min interval") is set in rados/thrash* to only 60s.
- a large set of PGs to scrub.
- a PG that failed to reserve replica resources.
The failure flag is only cleared once the queue of PGs to scrub is empty, but under the
first two conditions that never happens.
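To illustrate why the queue never drains, here is a toy model in Python. It is not Ceph's scheduler: it assumes a single OSD that scrubs one PG at a time, a fixed per-PG scrub duration, and that every PG becomes due again exactly one min interval after its last scrub. The function name and all parameters are hypothetical.

```python
def time_flag_cleared(num_pgs, scrub_secs, min_interval, horizon):
    """Return the first time the scrub queue is empty, or None if that
    does not happen before `horizon` seconds (toy model, not Ceph code)."""
    # Assumption: every PG is due for a scrub at t=0.
    next_due = [0] * num_pgs
    now = 0
    while now < horizon:
        due = [i for i in range(num_pgs) if next_due[i] <= now]
        if not due:
            # Queue is empty: this is the only point at which the
            # reservation-failure flag would be cleared.
            return now
        # Scrub the PG that has been due the longest.
        pg = min(due, key=lambda i: next_due[i])
        now += scrub_secs
        next_due[pg] = now + min_interval
    return None


# Few PGs: the queue drains after one pass.
print(time_flag_cleared(num_pgs=5, scrub_secs=5, min_interval=60, horizon=3600))
# Many PGs: the first PG is due again before the pass finishes.
print(time_flag_cleared(num_pgs=20, scrub_secs=5, min_interval=60, horizon=3600))
```

In this model the queue can only empty when one full pass over all PGs (num_pgs * scrub_secs) takes less than the min interval; otherwise some PG is always due and the flag stays set.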
Updated by Ronen Friedman about 3 years ago
Possible fixes to consider:
- a simple fix: extend the tests' minimum scrub interval;
- possibly better: modify the handling of the "failed once in achieving replicas' resources"
flag so that it is cleared periodically.
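One possible reading of the second option is a flag that expires by age instead of waiting for an empty queue. The sketch below is only an illustration of that idea; the class, its method names, and the max_age_secs parameter are hypothetical and do not correspond to existing Ceph code.

```python
class ReservationFailureFlag:
    """Hypothetical flag that is cleared either when the scrub queue
    empties or after a maximum age, whichever comes first."""

    def __init__(self, max_age_secs):
        self.max_age_secs = max_age_secs
        self.set_at = None  # time the failure was recorded, or None

    def mark_failed(self, now):
        # Record a failed replica-resource reservation.
        self.set_at = now

    def on_queue_empty(self):
        # Existing behaviour described above: clear on an empty queue.
        self.set_at = None

    def is_set(self, now):
        # Proposed addition: expire the flag once it is old enough,
        # even if the queue has never been empty.
        if self.set_at is not None and now - self.set_at >= self.max_age_secs:
            self.set_at = None
        return self.set_at is not None


flag = ReservationFailureFlag(max_age_secs=120)
flag.mark_failed(now=0)
print(flag.is_set(now=60))   # still within max_age_secs
print(flag.is_set(now=120))  # expired by age
```

With an age-based expiry, a busy queue can no longer keep the flag set indefinitely, which is the situation described in the analysis above.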
Updated by Ronen Friedman about 3 years ago
A duplicate of https://tracker.ceph.com/issues/49868
Updated by Neha Ojha about 3 years ago
- Is duplicate of Bug #49868: RuntimeError: Exiting scrub checking -- not all pgs scrubbed added