Bug #40620

Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery

Added by Jeegn Chen over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Because osd_scrub_during_recovery defaults to false, an OSD that has any recovering PG will not schedule new scrubs, including explicitly requested repairs. As a result, inconsistent data cannot be repaired promptly, which puts data safety at risk.

The proposal is to introduce a new config option, osd_repair_during_recovery, with a default value of false:
  • When osd_scrub_during_recovery is true, osd_repair_during_recovery is ignored (no behavior change).
  • When osd_scrub_during_recovery is false and osd_repair_during_recovery is false, there is no behavior change.
  • When osd_scrub_during_recovery is false and osd_repair_during_recovery is true, `OSD::sched_scrub()` is allowed to schedule explicitly requested repairs (scrubber.must_repair=true).
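The three cases above can be read as a single predicate. The following is only a minimal sketch of that decision table, not the actual Ceph code: the struct `ScrubGateConfig`, the function `may_schedule`, and the parameter `osd_is_recovering` are hypothetical names introduced for illustration. Only the two option names and the `must_repair` flag come from this report.

```cpp
// Hypothetical sketch of the proposed gating logic; NOT the real
// OSD::sched_scrub() implementation. All names except the two option
// names and must_repair are assumptions made for illustration.
struct ScrubGateConfig {
  bool osd_scrub_during_recovery;   // existing option (default: false)
  bool osd_repair_during_recovery;  // proposed option (default: false)
};

// Returns true if a scrub or repair may be scheduled for a PG.
//   osd_is_recovering: the OSD currently has at least one recovering PG
//   must_repair:       the PG has an explicitly requested repair pending
constexpr bool may_schedule(const ScrubGateConfig& conf,
                            bool osd_is_recovering,
                            bool must_repair) {
  // No recovery in progress: the recovery-related options do not apply.
  if (!osd_is_recovering) {
    return true;
  }
  // Scrubbing during recovery is allowed outright; the repair option
  // is ignored in this case (no behavior change).
  if (conf.osd_scrub_during_recovery) {
    return true;
  }
  // Scrubbing during recovery is disallowed: only explicitly requested
  // repairs pass, and only when the proposed option is enabled.
  return conf.osd_repair_during_recovery && must_repair;
}
```

With both options left at false, `may_schedule` rejects every scrub and repair while recovery is ongoing, which matches the behavior described in this report. Enabling only osd_repair_during_recovery lets explicitly requested repairs through while ordinary scrubs remain blocked.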

Related issues

Copied to RADOS - Backport #40840: nautilus: Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery Resolved

History

#2 Updated by Neha Ojha over 4 years ago

  • Status changed from New to Fix Under Review

#3 Updated by David Zafman over 4 years ago

  • Pull request ID set to 28839

#4 Updated by Sage Weil over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus

#5 Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #40840: nautilus: Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery added

#6 Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
