Feature #38616

closed

Improvements to auto repair

Added by David Zafman about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

We should allow auto repair for BlueStore pools, since BlueStore has built-in checksums. Currently, auto repair is limited to erasure-coded pools.

To trigger an auto repair when a regular scrub detects errors, any errors should immediately schedule a deep-scrub.
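The escalation described here can be sketched as a small decision function. This is only an illustration in Python, not Ceph's OSD code: the `PG` class, its fields, the `on_scrub_finished` function, and the returned step names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PG:
    """Toy placement group (hypothetical, not Ceph's PG class)."""
    pgid: str
    deep_scrub_requested: bool = False
    auto_repair_requested: bool = False


def on_scrub_finished(pg: PG, deep: bool, errors: int,
                      auto_repair_enabled: bool) -> str:
    """Return the name of the next step after a scrub completes."""
    if errors == 0:
        return "clean"
    if not deep:
        # A regular scrub found errors: escalate to a deep-scrub right away
        # and remember whether that deep-scrub is allowed to auto repair.
        pg.deep_scrub_requested = True
        pg.auto_repair_requested = auto_repair_enabled
        return "schedule_deep_scrub"
    if auto_repair_enabled:
        # The deep-scrub confirmed the errors, so repair can start.
        return "repair"
    return "report_inconsistent"
```

In this sketch a regular scrub never repairs on its own; it only requests the deep-scrub that may.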

Add a new PG state flag "failed_repair", set when a repair can't fix all errors. This may be tricky to implement because pg repair ends as a recovery operation.
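As a rough illustration of the flag's intended meaning, the following Python sketch models PG state as a bit set. The `PGState` members and `finish_repair` are hypothetical names, and the sketch assumes the error counts are already known; it does not address the difficulty that the real repair finishes as a recovery operation.

```python
from enum import Flag, auto


class PGState(Flag):
    """Hypothetical subset of PG state bits, for illustration only."""
    ACTIVE = auto()
    CLEAN = auto()
    INCONSISTENT = auto()
    REPAIR = auto()
    FAILED_REPAIR = auto()


def finish_repair(state: PGState, errors_found: int,
                  errors_fixed: int) -> PGState:
    """Clear REPAIR and set FAILED_REPAIR if any error remains unfixed."""
    state &= ~PGState.REPAIR
    if errors_fixed < errors_found:
        # Some errors survived the repair: flag it, keep INCONSISTENT.
        return state | PGState.FAILED_REPAIR
    # Everything was fixed: drop both error-related flags.
    return state & ~(PGState.FAILED_REPAIR | PGState.INCONSISTENT)
```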

Set failed_repair if primary repair triggered by a client read fails.

Add a count of the number of repaired objects to PG stats and OSD stats.
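One possible shape for these counters, sketched in Python. The `PGStats` and `OSDStats` classes, the `objects_repaired` field, and `record_repair` are hypothetical names chosen for this example; the sketch also assumes both counters are bumped together and says nothing about how they would be persisted.

```python
from dataclasses import dataclass


@dataclass
class PGStats:
    objects_repaired: int = 0  # hypothetical per-PG counter


@dataclass
class OSDStats:
    objects_repaired: int = 0  # hypothetical per-OSD counter


def record_repair(pg_stats: PGStats, osd_stats: OSDStats, n: int = 1) -> None:
    """Count n repaired objects at both the PG and the OSD level."""
    pg_stats.objects_repaired += n
    osd_stats.objects_repaired += n
```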


Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #38983: nautilus: Improvements to auto repair (Resolved, David Zafman)
Actions #1

Updated by David Zafman about 5 years ago

OSD stats might have to be in meta collection

Actions #2

Updated by David Zafman about 5 years ago

  • Status changed from New to In Progress
  • Pull request ID set to 26942
Actions #3

Updated by David Zafman about 5 years ago

I don't think we need to set "failed_repair" if the primary can't recover itself on a read error. We are already setting the "recovery_unfound" PG state.

If a primary read gets an EIO, for example, and we are unable to read another replica, this is the resulting PG:

PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
1.0 2 1 1 0 1 3138 0 0 2 2 active+recovery_unfound+degraded 2019-03-14 09:22:36.159974 11'2 16:27 [1,0] 1 [1,0] 1 0'0 2019-03-14 09:21:56.003113 0'0 2019-03-14 09:21:56.003113 0
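
Because the row above is wide, it can help to pair each value with its column. The following Python sketch does that for the header and row quoted above; `parse_pg_row` is a hypothetical helper, and it assumes that every column whose name ends in `_STAMP` holds a date and a time separated by a space.

```python
HEADER = (
    "PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES "
    "OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED "
    "UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP "
    "LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN"
)
ROW = (
    "1.0 2 1 1 0 1 3138 0 0 2 2 active+recovery_unfound+degraded "
    "2019-03-14 09:22:36.159974 11'2 16:27 [1,0] 1 [1,0] 1 0'0 "
    "2019-03-14 09:21:56.003113 0'0 2019-03-14 09:21:56.003113 0"
)


def parse_pg_row(header: str, row: str) -> dict:
    """Map column names to values; *_STAMP columns span two tokens."""
    tokens = row.split()
    parsed = {}
    i = 0
    for col in header.split():
        if col.endswith("_STAMP"):
            # Timestamps are printed as "date time", so join two tokens.
            parsed[col] = tokens[i] + " " + tokens[i + 1]
            i += 2
        else:
            parsed[col] = tokens[i]
            i += 1
    return parsed


pg = parse_pg_row(HEADER, ROW)
print(pg["STATE"], pg["MISSING_ON_PRIMARY"], pg["UNFOUND"])
```

Read this way, the row shows one object missing on the primary and one unfound object, with the PG in the active+recovery_unfound+degraded state.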

Actions #4

Updated by David Zafman about 5 years ago

  • Backport set to nautilus
Actions #5

Updated by David Zafman about 5 years ago

  • Status changed from In Progress to Pending Backport
Actions #6

Updated by David Zafman about 5 years ago

Also need to backport 0fb951963ff9d03a592bad0d4442049603195e25 with this.

Actions #7

Updated by Nathan Cutler about 5 years ago

Actions #8

Updated by David Zafman about 5 years ago

  • Status changed from Pending Backport to Resolved