Bug #50140

closed

test/thrash - scrub: "not all pgs scrubbed" due to short rescrubbing period

Added by Ronen Friedman about 3 years ago. Updated about 3 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
David Zafman
Category:
Tests
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thrashing test:
The observed error is "Exiting scrub checking -- not all pgs scrubbed".

See below for analysis.


Related issues 1 (1 open, 0 closed)

Is duplicate of RADOS - Bug #49868: RuntimeError: Exiting scrub checking -- not all pgs scrubbed (New, David Zafman)

Actions #1

Updated by Ronen Friedman about 3 years ago

  • Assignee set to David Zafman

Caused by a combination of:
- the re-scrub period ("osd scrub min interval") is set in rados/thrash* to only 60s;
- a large set of PGs to scrub;
- a PG that failed to reserve replica resources.

The failure flag is erased only once the queue of PGs to scrub is empty. Under the
first two conditions that never happens: PGs become due for re-scrubbing faster than the
queue drains, so the flagged PG is never scrubbed.
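
To illustrate the interaction described above, here is a toy model in Python. It is only a sketch, not the OSD's scheduling code: the function name, the one-PG-at-a-time scheduling, the fixed per-PG scrub time, and the "PG 0 carries the failure flag" setup are all assumptions made for the example.

```python
def flagged_pg_gets_scrubbed(num_pgs, min_interval, scrub_time, horizon):
    """Toy model (hypothetical): is the flagged PG ever scrubbed before `horizon`?

    Assumptions: PGs are scrubbed one at a time, each scrub takes
    `scrub_time` seconds, a PG becomes due again `min_interval` seconds
    after its scrub ends, and PG 0 starts with the "failed to reserve
    replicas" flag set. The flag is cleared only when no other PG is due.
    """
    next_due = [0] * num_pgs   # time at which each PG is due for scrubbing
    failed = {0}               # PGs currently carrying the failure flag
    now = 0
    while now < horizon:
        # PGs that are due and not held back by the failure flag
        ready = [pg for pg in range(num_pgs)
                 if next_due[pg] <= now and pg not in failed]
        if not ready:
            if failed:
                # Queue drained: this is the only point where the flag is cleared.
                failed.clear()
            else:
                now += 1       # nothing to do, idle for a second
            continue
        pg = min(ready, key=lambda p: next_due[p])  # longest-overdue PG first
        now += scrub_time
        next_due[pg] = now + min_interval
        if pg == 0:
            return True
    return False
```

In this model, `flagged_pg_gets_scrubbed(100, 60, 1, 10000)` never drains the queue (99 PGs at 1s each take longer than the 60s interval), so the flagged PG is starved, whereas a small PG count or a longer interval lets the queue empty and the flag clear.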

Actions #2

Updated by Ronen Friedman about 3 years ago

Possible fixes to consider:

- a simple fix: extend the tests' minimum scrub interval ("osd scrub min interval");
- possibly better: change the handling of the "failed once to reserve replicas' resources"
flag so that it is cleared periodically.
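
The second option can be sketched with a toy model in Python. This is only an illustration under assumed simplifications, not the actual OSD change: the function name, the `clear_period` parameter, the one-PG-at-a-time scheduling, and the choice of PG 0 as the flagged PG are all hypothetical.

```python
def scrub_time_with_periodic_clear(num_pgs, min_interval, scrub_time,
                                   clear_period, horizon):
    """Toy model (hypothetical): when is the flagged PG scrubbed?

    Assumptions: PG 0 starts with the failure flag set at time 0, and the
    flag is dropped once `clear_period` seconds have passed, regardless of
    whether the scrub queue is empty. Returns the time at which PG 0
    finishes scrubbing, or None if that does not happen before `horizon`.
    """
    next_due = [0] * num_pgs   # time at which each PG is due for scrubbing
    failed = {0}               # PGs currently carrying the failure flag
    flag_set_at = 0
    now = 0
    while now < horizon:
        # Periodic clearing: independent of the state of the queue.
        if failed and now - flag_set_at >= clear_period:
            failed.clear()
        ready = [pg for pg in range(num_pgs)
                 if next_due[pg] <= now and pg not in failed]
        if not ready:
            now += 1           # nothing eligible, idle for a second
            continue
        pg = min(ready, key=lambda p: next_due[p])  # longest-overdue PG first
        now += scrub_time
        next_due[pg] = now + min_interval
        if pg == 0:
            return now
    return None
```

With 100 PGs, a 60s interval and 1s scrubs, the queue in this model never empties, yet a finite `clear_period` (say 300) still lets the flagged PG through shortly after the flag expires. The first option, by contrast, only requires raising the interval in the test configuration.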

Actions #5

Updated by Neha Ojha about 3 years ago

  • Status changed from New to Duplicate
Actions #6

Updated by Neha Ojha about 3 years ago

  • Is duplicate of Bug #49868: RuntimeError: Exiting scrub checking -- not all pgs scrubbed added