Bug #40620

closed

Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery

Added by Jeegn Chen almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Because osd_scrub_during_recovery defaults to false, an OSD that has any recovering PG will not schedule any new scrub, including an explicitly requested repair. As a result, inconsistent data cannot be repaired promptly, which is bad for data safety.

The proposal is to introduce a new config option, osd_repair_during_recovery, which defaults to false:
  • When osd_scrub_during_recovery is true, osd_repair_during_recovery is ignored (no behavior change).
  • When osd_scrub_during_recovery and osd_repair_during_recovery are both false, there is no behavior change.
  • When osd_scrub_during_recovery is false and osd_repair_during_recovery is true, `OSD::sched_scrub()` is allowed to schedule explicitly requested repairs (scrubber.must_repair=true) even while recovery is ongoing.
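The three cases above can be summarized as a small decision function. This is a minimal sketch of the proposed policy only, not Ceph's actual `OSD::sched_scrub()` code: the names `ScrubPolicy`, `may_schedule_scrub`, and `check_proposed_policy` are hypothetical, and the booleans merely mirror osd_scrub_during_recovery, the proposed osd_repair_during_recovery, and scrubber.must_repair.

```cpp
#include <cassert>

// Hypothetical mirror of the two config options; not a real Ceph type.
struct ScrubPolicy {
  bool scrub_during_recovery;   // osd_scrub_during_recovery
  bool repair_during_recovery;  // proposed osd_repair_during_recovery
};

// Hypothetical helper: may a scrub be scheduled on this OSD right now?
// recovery_active: the OSD currently has at least one recovering PG.
// must_repair: the scrub is an explicitly requested repair (scrubber.must_repair).
bool may_schedule_scrub(const ScrubPolicy& p, bool recovery_active,
                        bool must_repair) {
  if (!recovery_active) return true;         // recovery does not block anything
  if (p.scrub_during_recovery) return true;  // case 1: repair option ignored
  // Cases 2 and 3: only explicit repairs pass, and only if the new option is on.
  return p.repair_during_recovery && must_repair;
}

// Walks through the three bullet cases while recovery is active.
void check_proposed_policy() {
  // Case 1: scrubbing during recovery already allowed, repair flag irrelevant.
  assert(may_schedule_scrub(ScrubPolicy{true, false}, true, false));
  // Case 2: both options false (the defaults), so even a repair is blocked.
  assert(!may_schedule_scrub(ScrubPolicy{false, false}, true, true));
  // Case 3: only explicitly requested repairs get through.
  assert(may_schedule_scrub(ScrubPolicy{false, true}, true, true));
  assert(!may_schedule_scrub(ScrubPolicy{false, true}, true, false));
}
```

Calling `check_proposed_policy()` from any `main()` exercises the cases in the list. Note that case 3 deliberately leaves regular (non-repair) scrubs blocked during recovery, so the default protection for recovery performance is preserved.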

Related issues: 1 (0 open, 1 closed)

Copied to RADOS - Backport #40840: nautilus: Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery (Status: Resolved; Assignee: David Zafman)
#2

Updated by Neha Ojha almost 5 years ago

  • Status changed from New to Fix Under Review
#3

Updated by David Zafman almost 5 years ago

  • Pull request ID set to 28839
#4

Updated by Sage Weil almost 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus
#5

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40840: nautilus: Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery added
#6

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
