Bug #40620

closed

Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery

Added by Jeegn Chen almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Because osd_scrub_during_recovery=false is the default, an OSD that has any recovering PG will not schedule new scrubs, including explicitly requested repairs. As a result, inconsistent data cannot be repaired promptly, which puts data safety at risk.

The proposal is to introduce a new config option, osd_repair_during_recovery, with a default value of false:
  • When osd_scrub_during_recovery is true, osd_repair_during_recovery is ignored (no behavior change).
  • When osd_scrub_during_recovery is false and osd_repair_during_recovery is false, there is no behavior change.
  • When osd_scrub_during_recovery is false and osd_repair_during_recovery is true, `OSD::sched_scrub()` is allowed to schedule explicitly requested repairs (scrubber.must_repair=true) while recovery is ongoing.
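
The three cases above can be sketched as a single predicate. This is only an illustration of the proposed decision, not the actual Ceph code: the `ScrubPolicy` struct, the `may_schedule_scrub` function and the `recovery_active` parameter are hypothetical names introduced here, and the real check inside `OSD::sched_scrub()` may be structured differently.

```cpp
// Hypothetical sketch of the proposed scheduling rule. The type and
// function names are illustrative assumptions, not Ceph identifiers.
struct ScrubPolicy {
    bool osd_scrub_during_recovery;   // existing option (default: false)
    bool osd_repair_during_recovery;  // proposed option (default: false)
};

// Returns true if a scrub may be scheduled on the OSD.
//   recovery_active: the OSD currently has recovering PGs
//   must_repair:     the scrub is an explicitly requested repair
constexpr bool may_schedule_scrub(const ScrubPolicy& policy,
                                  bool recovery_active,
                                  bool must_repair) {
    return !recovery_active                      // no recovery: always allowed
        || policy.osd_scrub_during_recovery      // scrubs allowed during recovery
        || (policy.osd_repair_during_recovery    // only explicit repairs allowed
            && must_repair);
}

// Compile-time checks mirroring the three bullets, with recovery ongoing.
static_assert(may_schedule_scrub(ScrubPolicy{true, false}, true, false),
              "scrub_during_recovery=true: repair option is ignored");
static_assert(!may_schedule_scrub(ScrubPolicy{false, false}, true, true),
              "both false: nothing is scheduled during recovery");
static_assert(may_schedule_scrub(ScrubPolicy{false, true}, true, true),
              "repair_during_recovery=true: explicit repair is scheduled");
static_assert(!may_schedule_scrub(ScrubPolicy{false, true}, true, false),
              "repair_during_recovery=true: ordinary scrub still waits");
```

In this sketch, ordinary scrubs keep waiting for recovery to finish even when osd_repair_during_recovery is true; only scrubs with must_repair set are let through, which keeps the extra load on a recovering OSD limited to operator-requested repairs.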

Related issues: 1 (0 open, 1 closed)

Copied to RADOS - Backport #40840: nautilus: Explicitly requested repair of an inconsistent PG cannot be scheduled timely on a OSD with ongoing recovery (Status: Resolved, Assignee: David Zafman)