Feature #38616

Improvements to auto repair

Added by David Zafman 12 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

We should allow auto repair for BlueStore pools, since BlueStore has built-in checksums. Currently, auto repair is limited to erasure-coded pools.

To trigger an auto repair when a regular scrub detects errors, any errors should immediately schedule a deep-scrub.
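The flow described above can be sketched roughly as follows. This is only an illustration of the proposed decision logic, not the actual OSD code: the names `ScrubResult`, `NextAction`, `next_action`, and the `auto_repair_enabled` switch are hypothetical, and it assumes that eligibility means "erasure-coded pool or BlueStore backend".

```python
from dataclasses import dataclass
from enum import Enum


class NextAction(Enum):
    NONE = "none"
    SCHEDULE_DEEP_SCRUB = "schedule_deep_scrub"
    AUTO_REPAIR = "auto_repair"


@dataclass
class ScrubResult:
    deep: bool           # True if this was a deep scrub
    errors: int          # number of inconsistencies detected
    erasure_coded: bool  # pool is erasure coded
    bluestore: bool      # backend has built-in checksums


def next_action(result: ScrubResult, auto_repair_enabled: bool = True) -> NextAction:
    """Hypothetical decision step run after a scrub finishes."""
    if result.errors == 0 or not auto_repair_enabled:
        return NextAction.NONE
    # Assumption: auto repair is only safe when data can be verified,
    # i.e. on erasure-coded pools or on BlueStore.
    if not (result.erasure_coded or result.bluestore):
        return NextAction.NONE
    if not result.deep:
        # A regular scrub found errors: escalate to a deep scrub right away.
        return NextAction.SCHEDULE_DEEP_SCRUB
    return NextAction.AUTO_REPAIR
```

In this sketch a regular scrub never repairs directly; it only escalates, and the repair decision is made once the deep scrub has confirmed the errors.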

Add a new PG state flag "failed_repair" for when a repair can't fix all errors. This may be tricky to implement because PG repair ends as a recovery operation.

Set failed_repair if primary repair triggered by a client read fails.

Add a count of the number of repaired objects to PG stats and OSD stats.
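As a rough sketch of how the "failed_repair" flag and the repaired-object count could fit together, assuming a simplified model in which a repair reports how many objects had errors and how many were fixed. The names `PGStats`, `finish_repair`, `osd_objects_repaired`, and `objects_repaired` are hypothetical and do not reflect the real stats structures.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class PGStats:
    state: Set[str] = field(default_factory=set)  # e.g. {"active", "clean"}
    objects_repaired: int = 0                      # hypothetical per-PG counter


def finish_repair(stats: PGStats, objects_with_errors: int, objects_fixed: int) -> None:
    """Record the outcome of one repair pass on a PG."""
    stats.objects_repaired += objects_fixed
    if objects_fixed < objects_with_errors:
        # Some errors remain: flag the PG so the failure is visible.
        stats.state.add("failed_repair")
    else:
        # A fully successful repair clears any earlier failure.
        stats.state.discard("failed_repair")


def osd_objects_repaired(pgs: Iterable[PGStats]) -> int:
    """Assumed OSD-level count: the sum over the PGs it reports on."""
    return sum(pg.objects_repaired for pg in pgs)
```

Summing per-PG counts is just one possible way to derive the OSD-level number; the ticket does not say how the OSD count is maintained.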


Related issues

Copied to RADOS - Backport #38983: nautilus: Improvements to auto repair Resolved

History

#1 Updated by David Zafman 12 months ago

OSD stats might have to be in meta collection

#2 Updated by David Zafman 11 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 26942

#3 Updated by David Zafman 11 months ago

I don't think we need to set "failed_repair" if primary can't recover itself on a read error. We are already setting "recovery_unfound" PG state.

If, for example, a primary read gets an EIO and we are unable to read from another replica, this is the resulting PG:

PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
1.0 2 1 1 0 1 3138 0 0 2 2 active+recovery_unfound+degraded 2019-03-14 09:22:36.159974 11'2 16:27 [1,0] 1 [1,0] 1 0'0 2019-03-14 09:21:56.003113 0'0 2019-03-14 09:21:56.003113 0
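To check the state flags in output like the above, the STATE column can be split on "+". The helper below is a minimal sketch with hypothetical names (`pg_fields`, `pg_state_flags`); it assumes whitespace-separated columns and only reads columns up to STATE, since the timestamp columns that follow contain spaces.

```python
HEADER = (
    "PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES "
    "OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED "
    "UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP "
    "LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN"
)
ROW = (
    "1.0 2 1 1 0 1 3138 0 0 2 2 active+recovery_unfound+degraded "
    "2019-03-14 09:22:36.159974 11'2 16:27 [1,0] 1 [1,0] 1 0'0 "
    "2019-03-14 09:21:56.003113 0'0 2019-03-14 09:21:56.003113 0"
)


def pg_fields(header: str, row: str) -> dict:
    """Map column names to values for the columns up to and including STATE."""
    names = header.split()
    values = row.split()
    last = names.index("STATE")
    # Columns after STATE include timestamps with embedded spaces,
    # so a plain split no longer lines up with the header there.
    return dict(zip(names[: last + 1], values[: last + 1]))


def pg_state_flags(header: str, row: str) -> list:
    """Return the individual PG state flags, e.g. 'recovery_unfound'."""
    return pg_fields(header, row)["STATE"].split("+")


flags = pg_state_flags(HEADER, ROW)
print(flags)  # ['active', 'recovery_unfound', 'degraded']
print(pg_fields(HEADER, ROW)["UNFOUND"])  # 1
```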

#4 Updated by David Zafman 11 months ago

  • Backport set to nautilus

#5 Updated by David Zafman 11 months ago

  • Status changed from In Progress to Pending Backport

#6 Updated by David Zafman 11 months ago

Also need to backport 0fb951963ff9d03a592bad0d4442049603195e25 with this.

#7 Updated by Nathan Cutler 11 months ago

#8 Updated by David Zafman 11 months ago

  • Status changed from Pending Backport to Resolved
