Bug #12615
Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clean state
Description
With a 2 + 1 erasure coded pool in which 2 chunks of a single object are corrupted, running a repair that can't succeed causes the pg to lose its clean state. The result of an unclean pg is that operations hang, and trying to repair again just causes scrub to requeue continuously. The EIO from rados get requires the wip-12000-12200 branch changes.
$ rados -p ecpool get foo dz.out3
error getting ecpool/foo: (5) Input/output error
$ ./ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6 1 0 0 0 0 1048576 1 1 active+clean 2015-08-04 16:14:41.607821 16'1 16:8 [0,1,2] 0 [0,1,2] 0 0'0 2015-08-04 16:14:40.526211 0'0 2015-08-04 16:14:40.526211
$ ceph pg repair 3.6
instructing pg 3.6 on osd.0 to repair
$ ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6 1 1 4 0 1 1048576 1 1 active 2015-08-04 16:15:39.659583 16'1 16:10 [0,1,2] 0 [0,1,2] 0 16'1 2015-08-04 16:15:39.659434 16'1 2015-08-04 16:15:39.659434
[~/ceph/src] (wip-12000-12200-new)
$ ./rados -p ecpool get foo dz.out3
^C
To get to active+clean, I removed the broken object from the filestore and restarted the osd.
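To check for the lost clean state in a script, something like the following sketch could be used. It assumes the state is the 10th whitespace-separated field of a plain-format pg dump row, as in the two rows pasted above; other Ceph versions may lay the columns out differently. The helper name pg_state_flags is made up for this example and is not part of any Ceph tool.

```python
def pg_state_flags(dump_row):
    """Return the set of state flags from one plain-format `pg dump pgs` row.

    Assumption: the state (e.g. "active+clean") is the 10th field, which
    matches the rows pasted in this report but is not guaranteed elsewhere.
    """
    fields = dump_row.split()
    return set(fields[9].split("+"))


# Rows copied from the report (before and after `ceph pg repair 3.6`).
before = ("3.6 1 0 0 0 0 1048576 1 1 active+clean 2015-08-04 16:14:41.607821 "
          "16'1 16:8 [0,1,2] 0 [0,1,2] 0 0'0 2015-08-04 16:14:40.526211 "
          "0'0 2015-08-04 16:14:40.526211")
after = ("3.6 1 1 4 0 1 1048576 1 1 active 2015-08-04 16:15:39.659583 "
         "16'1 16:10 [0,1,2] 0 [0,1,2] 0 16'1 2015-08-04 16:15:39.659434 "
         "16'1 2015-08-04 16:15:39.659434")

for label, row in (("before", before), ("after", after)):
    flags = pg_state_flags(row)
    print(label, sorted(flags), "clean" in flags)
```

The first row reports both active and clean; the second reports only active, which is the state the pg stays in after the failed repair.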
Updated by David Zafman over 8 years ago
- Status changed from New to 12
The cause of this is as follows:
PG::scrub_process_inconsistent() clears PG_STATE_CLEAN in repair because a later DoRecovery will be initiated to fix the bad replicas/shards. This code works perfectly for replicated pools because we only consider objects that have an authoritative copy. With erasure coded pools we also need to have enough good shards to repair. Another issue is that we count these objects as "fixed" even though we won't know until later whether the repair worked. I don't like this, because an OSD could go down and the repair would then be unable to proceed.
scrub_finish() outputs a message like "1.6 repair 2 errors, 2 fixed" even though nothing has been fixed yet. It initiates a DoRecovery() state change for the PG.
Normally the PG is marked CLEAN again at the end of the recovery process, which is when all the repairs actually happen.
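The "enough shards to repair" condition can be illustrated with a rough sketch. This is not Ceph code and not the fix for this bug. It assumes an MDS erasure code (such as Reed-Solomon), where any k of the k + m shards are enough to rebuild an object. The function names are invented for the example.

```python
def ec_object_repairable(k, m, bad_shards):
    """True if an object in a k+m erasure coded pool can still be rebuilt.

    Assumption: an MDS code, so any k intact shards out of k+m suffice.
    """
    good_shards = (k + m) - bad_shards
    return good_shards >= k


def classify(k, m, bad_shards):
    """Label an object the way a repair scrub could, instead of always 'fixed'."""
    if bad_shards == 0:
        return "ok"
    return "repairable" if ec_object_repairable(k, m, bad_shards) else "unrepairable"


# The 2 + 1 pool from this report: one bad shard can be rebuilt, two cannot.
print(classify(2, 1, 1))
print(classify(2, 1, 2))
```

With 2 + 1 and two corrupted chunks only one good shard is left, which is fewer than k = 2, so the object in this report falls into the "unrepairable" case even though scrub_finish() counts it as fixed.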
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category set to EC Pools
- Component(RADOS) OSD added
David, is this still an issue?
Updated by David Zafman almost 7 years ago
This will be fixed when we move repair out of the OSD. We shouldn't be using recovery to do repair anyway.
Updated by David Zafman over 5 years ago
- Related to Bug #25084: Attempt to read object that can't be repaired loops forever added
Updated by David Zafman over 5 years ago
In the replicated case in which all copies are bad, rep_repair_primary_object() can cause the PG to lose clean and end up in recovery_unfound instead.