Bug #12615

open

Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clean state

Added by David Zafman almost 9 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: David Zafman
Category: EC Pools
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a 2 + 1 erasure coded pool where 2 chunks of a single object are corrupted, running a repair that cannot succeed causes the pg to lose its clean state. Once the pg is no longer clean, operations on it hang, and trying to repair again just causes the scrub to requeue continuously. The EIO from rados get requires the wip-12000-12200 branch changes.
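
For readers wondering why this repair cannot succeed: a 2 + 1 pool stores 2 data chunks plus 1 coding chunk, so it tolerates the loss of only one chunk per object. The sketch below illustrates that limit using a single XOR parity chunk as a simplified stand-in. It is not the erasure code plugin Ceph uses, and the names `encode` and `recover` are hypothetical, chosen only for this illustration.

```python
import errno


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d0: bytes, d1: bytes) -> list:
    """Return [data chunk 0, data chunk 1, parity chunk] (k=2, m=1).

    Assumption: plain XOR parity, used here only as a stand-in for a
    real erasure code.
    """
    return [d0, d1, xor_bytes(d0, d1)]


def recover(chunks: list) -> tuple:
    """Rebuild (d0, d1) from three chunks; None marks a chunk known to be bad.

    Raises OSError(EIO) when more than one chunk is bad, since m=1
    allows only a single chunk to be reconstructed.
    """
    bad = [i for i, c in enumerate(chunks) if c is None]
    if len(bad) > 1:
        raise OSError(errno.EIO, "unrecoverable: %d of 3 chunks bad" % len(bad))
    d0, d1, parity = chunks
    if d0 is None:
        d0 = xor_bytes(d1, parity)
    elif d1 is None:
        d1 = xor_bytes(d0, parity)
    return d0, d1


chunks = encode(b"ab", b"cd")
# One bad chunk: the object can still be rebuilt.
print(recover([None, chunks[1], chunks[2]]))
# Two bad chunks, as in this report: reconstruction is impossible.
try:
    recover([None, None, chunks[2]])
except OSError as exc:
    print("error:", exc)
```

With two of three chunks bad there is nothing left to reconstruct from, which matches the Input/output error that rados get returns in the transcript.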

$ rados -p ecpool get foo dz.out3
error getting ecpool/foo: (5) Input/output error
$ ./ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6     1       0       0       0       0       1048576 1       1       active+clean    2015-08-04 16:14:41.607821      16'1    16:8    [0,1,2] 0       [0,1,2] 0       0'0     2015-08-04 16:14:40.526211      0'0     2015-08-04 16:14:40.526211
$ ceph pg repair 3.6
instructing pg 3.6 on osd.0 to repair
$ ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6     1       1       4       0       1       1048576 1       1       active  2015-08-04 16:15:39.659583      16'1    16:10   [0,1,2] 0       [0,1,2] 0       16'1    2015-08-04 16:15:39.659434      16'1    2015-08-04 16:15:39.659434
[~/ceph/src] (wip-12000-12200-new)
$ ./rados -p ecpool get foo dz.out3
^C

To get to active+clean, I removed the broken object from the filestore and restarted the osd.
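
The state change in the two pg dump lines above (active+clean before the repair, plain active after it) can be checked mechanically. The following is a minimal sketch that assumes the state is the tenth whitespace-separated field, as it is in the output shown in this report; the column position may differ in other Ceph versions. The helper names and the shortened sample lines are for illustration only.

```python
# Assumption: index of the state field in the plain-format lines shown in
# this report (pgid is field 0). Other Ceph versions may order columns differently.
STATE_COLUMN = 9


def pg_states(dump_line: str) -> set:
    """Return the set of states (e.g. {'active', 'clean'}) from one pg dump line."""
    fields = dump_line.split()
    return set(fields[STATE_COLUMN].split("+"))


def is_active_clean(dump_line: str) -> bool:
    """True when the pg reports both 'active' and 'clean'."""
    return {"active", "clean"} <= pg_states(dump_line)


# Leading fields copied from the transcript; trailing columns omitted.
before = "3.6 1 0 0 0 0 1048576 1 1 active+clean 2015-08-04 16:14:41.607821"
after = "3.6 1 1 4 0 1 1048576 1 1 active 2015-08-04 16:15:39.659583"

print(is_active_clean(before))
print(is_active_clean(after))
```

A check like this could be run against the pg dump output after each repair attempt to notice when the pg has dropped out of the clean state.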


Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #25084: Attempt to read object that can't be repaired loops forever (Resolved, David Zafman, 07/24/2018)
