Bug #38900

EC pools don't self repair on client read error

Added by David Zafman about 5 years ago. Updated 11 months ago.

Status:
New
Priority:
Low
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When a replicated client read fails at the primary, the primary pulls the object from another OSD (see rep_repair_primary_object()). When an erasure-coded read fails, the client can still get a successful read because the other available shards are used (see send_all_remaining_reads()), but nothing triggers a recovery to repair the broken shards. Perhaps this could be triggered from send_all_remaining_reads().
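
To make the gap concrete, here is a small, self-contained C++ sketch (not Ceph code; Shard, ECReadSim, queue_for_recovery() and the 2+1 layout are made-up names for illustration). It models a degraded read that succeeds by falling back to the remaining shards, plus the extra step this ticket asks for: remembering the broken shard and queuing it for repair, analogous to what rep_repair_primary_object() achieves for replicated pools.

    // Toy model only -- not Ceph code. It mimics the read path described above:
    // a read that falls back to the surviving shards (cf. send_all_remaining_reads())
    // and, as the proposed fix, also records the broken shard so it can be repaired.
    #include <cstdio>
    #include <optional>
    #include <set>
    #include <string>
    #include <vector>

    struct Shard {
      int id;
      bool corrupted = false;   // simulates a local read error / bad checksum
      std::string data;         // stand-in for the shard payload
    };

    class ECReadSim {
    public:
      ECReadSim(std::vector<Shard> shards, int k) : shards_(std::move(shards)), k_(k) {}

      // Serve the client from any k healthy shards; note the damaged ones.
      std::optional<std::string> read_object() {
        std::vector<const Shard*> good;
        for (const auto& s : shards_) {
          if (s.corrupted)
            broken_.insert(s.id);      // the step the tracker says is missing today
          else
            good.push_back(&s);
        }
        if ((int)good.size() < k_)
          return std::nullopt;         // too few shards left: the read really fails
        std::string decoded;
        for (int i = 0; i < k_; ++i)   // "decode" = concatenate k shards (toy only)
          decoded += good[i]->data;
        return decoded;
      }

      // In Ceph this would ask the primary to recover the damaged shards;
      // here we just report them.
      void queue_for_recovery() const {
        for (int id : broken_)
          std::printf("shard %d queued for repair\n", id);
      }

    private:
      std::vector<Shard> shards_;
      int k_;
      std::set<int> broken_;
    };

    int main() {
      // k=2, one shard corrupted: the client read still succeeds...
      ECReadSim sim({{0, false, "AA"}, {1, true, "BB"}, {2, false, "CC"}}, 2);
      if (auto obj = sim.read_object())
        std::printf("client read succeeded: %s\n", obj->c_str());
      // ...but without this call the damaged shard would stay broken forever.
      sim.queue_for_recovery();
      return 0;
    }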

#1

Updated by Greg Farnum about 5 years ago

Just to be clear, this means the object remains degraded, but client IO continues to be served?

#2

Updated by David Zafman about 5 years ago

Yes, client IO is served. The PG is degraded, but the PG state won't necessarily reflect that.

#3

Updated by David Zafman about 5 years ago

  • Priority changed from Normal to Low

#4

Updated by linhuai deng 11 months ago

I also hit this problem on ceph-15.2.8. With an EC pool, a damaged shard could still be read and the object was returned to the client without error, but the shard was not repaired automatically.
