Bug #12615
Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clean state
Status:
New
Priority:
Normal
Assignee:
David Zafman
Category:
EC Pools
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In a 2 + 1 erasure coded pool where 2 chunks of a single object are corrupted, running a repair that cannot succeed causes the pg to lose its clean state. Because the pg is no longer clean, operations hang, and trying to repair again just causes scrub to requeue continuously. The EIO from rados get requires the wip-12000-12200 branch changes.
$ rados -p ecpool get foo dz.out3
error getting ecpool/foo: (5) Input/output error
$ ./ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6 1 0 0 0 0 1048576 1 1 active+clean 2015-08-04 16:14:41.607821 16'1 16:8 [0,1,2] 0 [0,1,2] 0 0'0 2015-08-04 16:14:40.526211 0'0 2015-08-04 16:14:40.526211
$ ceph pg repair 3.6
instructing pg 3.6 on osd.0 to repair
$ ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6 1 1 4 0 1 1048576 1 1 active 2015-08-04 16:15:39.659583 16'1 16:10 [0,1,2] 0 [0,1,2] 0 16'1 2015-08-04 16:15:39.659434 16'1 2015-08-04 16:15:39.659434
[~/ceph/src] (wip-12000-12200-new) $ ./rados -p ecpool get foo dz.out3
^C
To get to active+clean, I removed the broken object from the filestore and restarted the osd.
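When reproducing this, the change from active+clean to active can be checked mechanically instead of by eye. The following is a minimal sketch, not part of the original report: the helper names pg_state and pg_is_clean are hypothetical, and it assumes the plain-format column layout seen in the dump above, where the pg state is the 10th whitespace-separated field. That position may differ in other Ceph versions.

```shell
#!/bin/sh
# Sketch only. Assumes the plain-format `pg dump pgs` layout seen in this
# report, where the pg state is the 10th whitespace-separated field.

# pg_state PGID (hypothetical helper): read `ceph pg dump pgs` output on
# stdin and print the state column of the line whose first field is PGID.
pg_state() {
    awk -v pg="$1" '$1 == pg { print $10 }'
}

# pg_is_clean PGID (hypothetical helper): return 0 if the state of PGID
# contains the "clean" flag as a whole "+"-separated component, else 1.
# A pg that is missing from the input is treated as not clean.
pg_is_clean() {
    state=$(pg_state "$1")
    case "+$state+" in
        *+clean+*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage against a running cluster: `ceph pg dump pgs | pg_is_clean 3.6 || echo "pg 3.6 is not clean"`. Matching on the first field also skips the "dumped pgs in format plain" line if it ends up in the pipe.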