Bug #38900

EC pools don't self repair on client read error

Added by David Zafman about 5 years ago. Updated 11 months ago.

Status:
New
Priority:
Low
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When a replicated client read fails at the primary, the primary pulls the object from another OSD (see rep_repair_primary_object()). When an erasure-coded read fails, the client can still get a successful read because the other available shards are used (see send_all_remaining_reads()), but nothing triggers a recovery to repair the broken shards. Perhaps this could be triggered from send_all_remaining_reads().
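
To make the gap concrete, here is a small, self-contained C++ sketch (not Ceph code; Shard, ECReadSim, queue_for_recovery() and the 2+1 layout are made-up names for illustration). It models a degraded read that succeeds by falling back to the remaining shards, plus the extra step this ticket asks for: remembering the broken shard and queuing it for repair, analogous to what rep_repair_primary_object() achieves for replicated pools.

    // Toy model only -- not Ceph code. It mimics the read path described above:
    // a read that falls back to the surviving shards (cf. send_all_remaining_reads())
    // and, as the proposed fix, also records the broken shard so it can be repaired.
    #include <cstdio>
    #include <optional>
    #include <set>
    #include <string>
    #include <vector>

    struct Shard {
      int id;
      bool corrupted = false;   // simulates a local read error / bad checksum
      std::string data;         // stand-in for the shard payload
    };

    class ECReadSim {
    public:
      ECReadSim(std::vector<Shard> shards, int k) : shards_(std::move(shards)), k_(k) {}

      // Serve the client from any k healthy shards; note the damaged ones.
      std::optional<std::string> read_object() {
        std::vector<const Shard*> good;
        for (const auto& s : shards_) {
          if (s.corrupted)
            broken_.insert(s.id);      // the step the tracker says is missing today
          else
            good.push_back(&s);
        }
        if ((int)good.size() < k_)
          return std::nullopt;         // too few shards left: the read really fails
        std::string decoded;
        for (int i = 0; i < k_; ++i)   // "decode" = concatenate k shards (toy only)
          decoded += good[i]->data;
        return decoded;
      }

      // In Ceph this would ask the primary to recover the damaged shards;
      // here we just report them.
      void queue_for_recovery() const {
        for (int id : broken_)
          std::printf("shard %d queued for repair\n", id);
      }

    private:
      std::vector<Shard> shards_;
      int k_;
      std::set<int> broken_;
    };

    int main() {
      // k=2, one shard corrupted: the client read still succeeds...
      ECReadSim sim({{0, false, "AA"}, {1, true, "BB"}, {2, false, "CC"}}, 2);
      if (auto obj = sim.read_object())
        std::printf("client read succeeded: %s\n", obj->c_str());
      // ...but without this call the damaged shard would stay broken forever.
      sim.queue_for_recovery();
      return 0;
    }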

#1

Updated by Greg Farnum about 5 years ago

Just to be clear, this means the object remains degraded, but client IO continues to be served?

#2

Updated by David Zafman about 5 years ago

Yes, client IO is served. The PG is degraded, but the PG state won't necessarily reflect that.

#3

Updated by David Zafman about 5 years ago

  • Priority changed from Normal to Low

#4

Updated by linhuai deng 11 months ago

I also hit this problem on ceph-15.2.8. With an EC pool, a damaged shard could still be read and the object was returned to the client without error, but the shard was not repaired automatically.
