Bug #38900
open
EC pools don't self repair on client read error
0%
Description
When a replicated client read fails at the primary, the object is pulled from another OSD (see rep_repair_primary_object()). When an erasure-coded read fails, the client can still get a successful read because the other available shards are used (see send_all_remaining_reads()). However, nothing triggers a recovery to repair the broken shards. Maybe this could be triggered in send_all_remaining_reads().
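To illustrate the gap, here is a toy C++ model of the behaviour described above. It is only a sketch and not Ceph code: Shard, ReadResult, ec_read() and shards_to_repair() are hypothetical names invented for this example. It assumes an object that needs k readable shards to decode. The read falls back to the remaining shards when one fails, and the failed shard IDs are collected so that a repair could be queued afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model only: none of these types or functions exist in Ceph.
struct Shard {
    int id;
    bool readable;  // false simulates a read error on this shard
};

struct ReadResult {
    bool ok;               // true if enough shards were read to serve the client
    std::set<int> failed;  // shards that returned an error during this read
    std::set<int> used;    // shards whose data was used for decoding
};

// Read an object that needs `k` readable shards to decode.
// Shards are tried in order; when one fails, the next available shard is
// tried instead, loosely mirroring the fallback to the remaining shards.
ReadResult ec_read(const std::vector<Shard>& shards, std::size_t k) {
    assert(k > 0 && k <= shards.size());
    ReadResult r{false, {}, {}};
    for (const Shard& s : shards) {
        if (r.used.size() == k) {
            break;  // enough shards collected, stop reading
        }
        if (s.readable) {
            r.used.insert(s.id);
        } else {
            r.failed.insert(s.id);  // remember the broken shard
        }
    }
    r.ok = (r.used.size() == k);
    return r;
}

// Hypothetical hook for the missing step: after a successful read that hit
// errors, return the shards that should be queued for repair.
std::vector<int> shards_to_repair(const ReadResult& r) {
    if (!r.ok) {
        return {};  // the object could not be decoded, nothing to repair from
    }
    return std::vector<int>(r.failed.begin(), r.failed.end());
}
```

In this model, a read with k = 2 over four shards where shard 1 is unreadable still succeeds, and shards_to_repair() reports shard 1. The reported problem corresponds to the read path stopping after ec_read() without ever acting on the failed set.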
Updated by Greg Farnum about 5 years ago
Just to be clear, this means the object remains degraded, but client IO continues to be served?
Updated by David Zafman about 5 years ago
Yes, client IO is served. The PG is degraded, but the PG state won't necessarily reflect that.
Updated by linhuai deng 11 months ago
I also found this problem in ceph-15.2.8. With an EC pool, a shard was damaged; the object could still be read and returned to the client without error, but the damaged shard was not repaired automatically.