Bug #50140

test/thrash - scrub: "not all pgs scrubbed" due to short rescrubbing period

Added by Ronen Friedman almost 3 years ago. Updated almost 3 years ago.

Status: Duplicate
Priority: Normal
Assignee: David Zafman
Category: Tests
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thrashing test: the run fails with "Exiting scrub checking -- not all pgs scrubbed".

See below for analysis.


Related issues

Duplicates RADOS - Bug #49868: RuntimeError: Exiting scrub checking -- not all pgs scrubbed (status: New)

History

#1 Updated by Ronen Friedman almost 3 years ago

  • Assignee set to David Zafman

The failure is caused by a combination of:
- the re-scrub period ("osd scrub min interval") being set to only 60s in rados/thrash*;
- a large set of PGs to scrub;
- a PG that failed to reserve replica resources.

The failure flag is only cleared once the queue of PGs to scrub is empty. Under the first two
conditions that never happens, so the flag is never cleared.
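The interaction can be reproduced with a small discrete-time simulation. This is a toy model under stated assumptions, not the OSD scheduler: one tick stands for one second, at most one scrub starts per tick, a scrubbed PG becomes due again `min_interval` ticks later, and PG 5 fails its first replica reservation. `ToyScrubQueue`, `run`, and the PG counts are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScrubQueue:
    """Hypothetical toy model of the stall, not Ceph code.

    Each tick at most `capacity` scrubs start. A scrubbed PG is due again
    `min_interval` ticks later. A PG whose replica reservation fails is set
    aside ("penalized") and is forgiven only when no other PG is due.
    """
    num_pgs: int
    min_interval: int
    capacity: int = 1
    fail_once: set = field(default_factory=set)  # PGs whose first reservation fails
    next_due: dict = field(init=False, default_factory=dict)
    scrub_count: dict = field(init=False, default_factory=dict)
    penalized: set = field(init=False, default_factory=set)

    def __post_init__(self):
        # Every PG starts out due at tick 0 and has never been scrubbed.
        self.next_due = {pg: 0 for pg in range(self.num_pgs)}
        self.scrub_count = {pg: 0 for pg in range(self.num_pgs)}

    def tick(self, now):
        # Due, non-penalized PGs, oldest deadline first.
        due = sorted(
            (pg for pg, t in self.next_due.items()
             if t <= now and pg not in self.penalized),
            key=lambda pg: self.next_due[pg],
        )
        if not due:
            # Modelled behaviour: the failure mark is cleared only when idle.
            self.penalized.clear()
            return
        for pg in due[: self.capacity]:
            if pg in self.fail_once:
                # Simulated one-off failure to reserve replica resources.
                self.fail_once.discard(pg)
                self.penalized.add(pg)
                continue
            self.scrub_count[pg] += 1
            self.next_due[pg] = now + self.min_interval

    def unscrubbed(self):
        return [pg for pg, n in self.scrub_count.items() if n == 0]


def run(num_pgs, min_interval, ticks=1000, bad_pg=5):
    q = ToyScrubQueue(num_pgs, min_interval, fail_once={bad_pg})
    for now in range(ticks):
        q.tick(now)
    return q.unscrubbed()


print(run(num_pgs=120, min_interval=60))   # [5]: queue never drains
print(run(num_pgs=120, min_interval=600))  # []: queue drains, PG 5 is retried
```

With 120 PGs and a 60-tick interval, PGs become due faster than one scrub per tick can serve them, so the queue never empties and PG 5 stays penalized for the whole run. With the same PG count and a 600-tick interval, the queue drains after the first pass, the mark is cleared, and PG 5 is scrubbed on the next tick.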

#2 Updated by Ronen Friedman almost 3 years ago

Possible fixes to consider:

- a simple fix: extend the tests' minimum re-scrub interval ("osd scrub min interval");
- possibly better: change the handling of the "failed once to reserve replica resources" flag
so that it is cleared periodically.
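The second option could look roughly like the following minimal sketch, assuming the scheduler keeps a per-PG record of when the reservation failed. `PenaltyBox`, `restore_expired`, and the 30-second `timeout` are hypothetical names and values, not Ceph's API. The scheduler would call `restore_expired(now)` on every scheduling pass, in addition to forgiving everything when the queue is empty, so a penalized PG is retried after a bounded delay even while other PGs keep the queue busy.

```python
class PenaltyBox:
    """Hypothetical sketch of a periodically cleared failure flag (not Ceph code).

    A PG whose replica reservation failed is held back from scrubbing, but
    only for `timeout` seconds; it does not have to wait for the whole scrub
    queue to drain before it is retried.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self._penalized_at = {}  # PG id -> time the reservation failed

    def penalize(self, pg, now):
        self._penalized_at[pg] = now

    def is_penalized(self, pg):
        return pg in self._penalized_at

    def restore_expired(self, now):
        """Forgive and return every PG penalized for at least `timeout` seconds."""
        expired = [pg for pg, t in self._penalized_at.items()
                   if now - t >= self.timeout]
        for pg in expired:
            del self._penalized_at[pg]
        return expired


box = PenaltyBox(timeout=30)
box.penalize(pg=5, now=100)
print(box.restore_expired(now=110))  # []: still inside the penalty window
print(box.restore_expired(now=130))  # [5]: forgiven even if the queue is busy
print(box.is_penalized(5))           # False
```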

#5 Updated by Neha Ojha almost 3 years ago

  • Status changed from New to Duplicate

#6 Updated by Neha Ojha almost 3 years ago

  • Duplicates Bug #49868: RuntimeError: Exiting scrub checking -- not all pgs scrubbed added
