Feature #38616
Improvements to auto repair (closed)
Description
We should allow auto repair for pools backed by BlueStore, since BlueStore has built-in checksums. Currently, auto repair is limited to erasure-coded pools.
To trigger an auto repair when a regular scrub detects errors, any error should immediately schedule a deep-scrub.
Add a new PG state flag, "failed_repair", to be set when a repair can't fix all errors. This may be tricky to implement because a PG repair ends as a recovery operation.
Set failed_repair if a primary repair triggered by a client read fails.
Add a count of the number of repaired objects to the PG stats and the OSD stats.
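The proposal in the description can be summarized as a small state model. The sketch below is a hypothetical Python model, not Ceph's implementation (which lives in the C++ OSD code): the names `Pool`, `PG`, `Backend`, `auto_repair_allowed`, `handle_scrub` and the `fixable` argument (an oracle saying which flagged objects a repair can actually fix) are assumptions made for illustration. It shows four of the proposed rules: auto repair permitted on erasure-coded or BlueStore-backed pools, any regular-scrub error scheduling a deep-scrub, a per-PG count of repaired objects, and "failed_repair" set when errors remain after a repair. OSD-level stats and the client-read repair path are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Backend(Enum):
    FILESTORE = "filestore"
    BLUESTORE = "bluestore"


@dataclass
class Pool:
    erasure_coded: bool
    backend: Backend


@dataclass
class PG:
    pool: Pool
    states: set[str] = field(default_factory=lambda: {"active", "clean"})
    objects_repaired: int = 0          # proposed per-PG "repaired objects" stat
    deep_scrub_scheduled: bool = False


def auto_repair_allowed(pool: Pool) -> bool:
    # Before the proposal: erasure-coded pools only.
    # Proposed: also BlueStore-backed pools, whose built-in checksums
    # can identify which copy of an object is bad.
    return pool.erasure_coded or pool.backend is Backend.BLUESTORE


def handle_scrub(pg: PG, deep: bool, bad: set[str], fixable: set[str],
                 auto_repair_enabled: bool = True) -> None:
    """Apply one scrub result to the model.

    `bad` holds the objects the scrub flagged; `fixable` is a hypothetical
    oracle for which of them a repair is able to fix.
    """
    if deep:
        pg.deep_scrub_scheduled = False  # this deep-scrub satisfies any pending request
    if not bad:
        return
    pg.states.add("inconsistent")
    if not deep:
        # Proposed: any error from a regular scrub immediately schedules a
        # deep-scrub, which is where auto repair gets a chance to run.
        pg.deep_scrub_scheduled = True
        return
    if not (auto_repair_enabled and auto_repair_allowed(pg.pool)):
        return
    repaired = bad & fixable
    pg.objects_repaired += len(repaired)
    if repaired == bad:
        pg.states -= {"inconsistent", "failed_repair"}
    else:
        # Proposed: flag PGs whose repair could not fix every error.
        pg.states.add("failed_repair")


if __name__ == "__main__":
    pg = PG(pool=Pool(erasure_coded=False, backend=Backend.BLUESTORE))
    handle_scrub(pg, deep=False, bad={"obj1"}, fixable=set())
    handle_scrub(pg, deep=True, bad={"obj1", "obj2"}, fixable={"obj1"})
    print(pg.objects_repaired, sorted(pg.states))
    # prints: 1 ['active', 'clean', 'failed_repair', 'inconsistent']
```

In this model a regular scrub never repairs by itself; it only schedules a deep-scrub, which reads object data and is the point where repair is attempted.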
Updated by David Zafman about 5 years ago
The OSD stats might have to be kept in the meta collection.
Updated by David Zafman about 5 years ago
- Status changed from New to In Progress
- Pull request ID set to 26942
Updated by David Zafman about 5 years ago
I don't think we need to set "failed_repair" if the primary can't recover its own copy after a read error. We already set the "recovery_unfound" PG state in that case.
For example, if a read on the primary gets an EIO but we are unable to read another replica, this is the resulting PG:
PG_STAT             1.0
OBJECTS             2
MISSING_ON_PRIMARY  1
DEGRADED            1
MISPLACED           0
UNFOUND             1
BYTES               3138
OMAP_BYTES*         0
OMAP_KEYS*          0
LOG                 2
DISK_LOG            2
STATE               active+recovery_unfound+degraded
STATE_STAMP         2019-03-14 09:22:36.159974
VERSION             11'2
REPORTED            16:27
UP                  [1,0]
UP_PRIMARY          1
ACTING              [1,0]
ACTING_PRIMARY      1
LAST_SCRUB          0'0
SCRUB_STAMP         2019-03-14 09:21:56.003113
LAST_DEEP_SCRUB     0'0
DEEP_SCRUB_STAMP    2019-03-14 09:21:56.003113
SNAPTRIMQ_LEN       0
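The PG status above can also be read programmatically. The sketch below is a hypothetical helper, `parse_pg_stat`, that maps the header line onto the data row as the command prints them, each on a single line. The one wrinkle is that the three *_STAMP columns hold a date and a time separated by a space, so a naive whitespace split yields 27 tokens for 24 columns; the helper assumes those are the only multi-token columns, which holds for this sample but may not for other Ceph versions. For scripts, the CLI's JSON output (`--format json`) avoids this kind of text parsing.

```python
from __future__ import annotations

HEADER = ("PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES "
          "OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED "
          "UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP "
          "LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN")
ROW = ("1.0 2 1 1 0 1 3138 0 0 2 2 active+recovery_unfound+degraded "
       "2019-03-14 09:22:36.159974 11'2 16:27 [1,0] 1 [1,0] 1 0'0 "
       "2019-03-14 09:21:56.003113 0'0 2019-03-14 09:21:56.003113 0")

# Assumption: only these columns span two whitespace-separated tokens.
STAMP_COLUMNS = {"STATE_STAMP", "SCRUB_STAMP", "DEEP_SCRUB_STAMP"}


def parse_pg_stat(header: str, row: str) -> dict[str, str]:
    """Pair each column name with its value, rejoining '<date> <time>' stamps."""
    names = header.split()
    tokens = row.split()
    fields: dict[str, str] = {}
    i = 0
    for name in names:
        if name in STAMP_COLUMNS:
            fields[name] = tokens[i] + " " + tokens[i + 1]
            i += 2
        else:
            fields[name] = tokens[i]
            i += 1
    if i != len(tokens):
        raise ValueError(f"consumed {i} of {len(tokens)} tokens; column layout mismatch")
    return fields


def pg_states(fields: dict[str, str]) -> set[str]:
    """The STATE column is a '+'-joined list of PG state flags."""
    return set(fields["STATE"].split("+"))


if __name__ == "__main__":
    fields = parse_pg_stat(HEADER, ROW)
    print(fields["UNFOUND"], sorted(pg_states(fields)))
    # prints: 1 ['active', 'degraded', 'recovery_unfound']
```

Splitting STATE on "+" makes it easy to check for a specific flag such as recovery_unfound, which is the state this comment points to as already covering the failed primary read.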
Updated by David Zafman about 5 years ago
- Status changed from In Progress to Pending Backport
Updated by David Zafman about 5 years ago
Commit 0fb951963ff9d03a592bad0d4442049603195e25 also needs to be backported with this.
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38983: nautilus: Improvements to auto repair added
Updated by David Zafman about 5 years ago
- Status changed from Pending Backport to Resolved