Bug #12615: Repair of Erasure Coded pool with an unrepairable object causes pg state to lose clean state - RADOS - Ceph

Added by David Zafman over 8 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: David Zafman
Category: EC Pools
Severity: 3 - minor
Component(RADOS): OSD

Description

In a 2+1 erasure coded pool with 2 chunks of a single object corrupted, running a repair that cannot succeed causes the pg to lose its clean state. Once the pg is unclean, operations on it hang,
and trying to repair again just causes the scrub to requeue continuously. The EIO from rados get requires the wip-12000-12200 branch changes.

$ rados -p ecpool get foo dz.out3
error getting ecpool/foo: (5) Input/output error
$ ./ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6     1       0       0       0       0       1048576 1       1       active+clean    2015-08-04 16:14:41.607821      16'1    16:8    [0,1,2] 0       [0,1,2] 0       0'0     2015-08-04 16:14:40.526211      0'0     2015-08-04 16:14:40.526211
$ ceph pg repair 3.6
instructing pg 3.6 on osd.0 to repair
$ ceph pg dump pgs | grep ^3.6
dumped pgs in format plain
3.6     1       1       4       0       1       1048576 1       1       active  2015-08-04 16:15:39.659583      16'1    16:10   [0,1,2] 0       [0,1,2] 0       16'1    2015-08-04 16:15:39.659434      16'1    2015-08-04 16:15:39.659434
[~/ceph/src] (wip-12000-12200-new)
$ ./rados -p ecpool get foo dz.out3
^C

To get to active+clean, I removed the broken object from the filestore and restarted the osd.
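
The loss of the clean state in the transcript above can be checked mechanically. The following is a minimal sketch, not part of Ceph: the helper names are hypothetical, and the state column index is an assumption taken from the plain-format lines quoted in this report, so other Ceph versions may order the columns differently.

```python
# Sketch only: in the plain-format lines quoted in this report the state is
# the tenth whitespace-separated field. This index is an assumption.
STATE_COLUMN = 9


def pg_state(dump_line):
    """Return the set of state flags (e.g. {"active", "clean"}) for one pg line."""
    fields = dump_line.split()
    return set(fields[STATE_COLUMN].split("+"))


def lost_clean(before_line, after_line):
    """True if the pg was clean before the repair and is not clean afterwards."""
    return "clean" in pg_state(before_line) and "clean" not in pg_state(after_line)


# The two pg 3.6 lines from the transcript, with whitespace collapsed.
before = ("3.6 1 0 0 0 0 1048576 1 1 active+clean 2015-08-04 16:14:41.607821 "
          "16'1 16:8 [0,1,2] 0 [0,1,2] 0 0'0 2015-08-04 16:14:40.526211 "
          "0'0 2015-08-04 16:14:40.526211")
after = ("3.6 1 1 4 0 1 1048576 1 1 active 2015-08-04 16:15:39.659583 "
         "16'1 16:10 [0,1,2] 0 [0,1,2] 0 16'1 2015-08-04 16:15:39.659434 "
         "16'1 2015-08-04 16:15:39.659434")

print(lost_clean(before, after))
```

Comparing flag sets rather than whole strings keeps the check valid if other flags are present alongside active.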

Related issues 1 (0 open — 1 closed)

Updated by David Zafman over 8 years ago

Description updated (diff)

Updated by Sage Weil over 8 years ago

Assignee set to David Zafman

Updated by David Zafman over 8 years ago

Status changed from New to 12

The cause of this is as follows:

PG::scrub_process_inconsistent() clears PG_STATE_CLEAN during repair because a later DoRecovery will be initiated to fix the bad replicas/shards. This works for replicated pools because we only consider objects that have an authoritative copy. With erasure coded pools we also need enough good shards to reconstruct the object. Another issue is that we count these objects as "fixed" even though we won't know until later whether the repair worked. I don't like this, because an OSD could go down and the repair would then be unable to proceed.
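
The "enough shards" condition can be illustrated with a small model. This is a sketch under stated assumptions, not Ceph code: the function names are hypothetical, and it assumes an erasure code in which any k of the k + m shards are sufficient to decode the object.

```python
def repairable(k, m, bad_shards):
    """True if at least k of the k + m shards are still good.

    Assumes any k shards are enough to reconstruct the object.
    """
    good = (k + m) - len(set(bad_shards))
    return good >= k


def split_inconsistent(k, m, inconsistent):
    """Split objects into (fixable, unrepairable) lists of names.

    `inconsistent` maps an object name to the shard ids found bad by scrub.
    """
    fixable, unrepairable = [], []
    for name, bad in sorted(inconsistent.items()):
        (fixable if repairable(k, m, bad) else unrepairable).append(name)
    return fixable, unrepairable


# The case from this report: a 2+1 pool with 2 chunks of "foo" corrupted.
# "bar" is a made-up second object with a single bad chunk, for contrast.
fixable, unrepairable = split_inconsistent(2, 1, {"foo": [0, 1], "bar": [2]})
print(fixable, unrepairable)
```

In this model only objects in the fixable list would justify scheduling recovery; "foo" has one good shard out of the two required, so it lands in the unrepairable list.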

scrub_finish() outputs a message like "1.6 repair 2 errors, 2 fixed" even though nothing has been fixed yet. It initiates a DoRecovery() state change for the PG.

Normally the PG is marked CLEAN at the end of the recovery process, which is when the repairs actually happen.
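
The sequence described above can be modelled in a few lines to show why the pg stays in active. This is a toy model only: the class and method names are hypothetical and merely mirror the behaviour described in this comment, and the decode condition (any k shards suffice) is an assumption.

```python
class ToyPG:
    """Toy model of the repair flow described in this report (not Ceph code)."""

    def __init__(self, k, m):
        self.k, self.m = k, m
        self.state = {"active", "clean"}
        self.pending = {}  # object name -> bad shard ids awaiting recovery

    def repair_scrub(self, inconsistent):
        # As described: CLEAN is cleared up front and every error is
        # reported as "fixed" before any recovery has run.
        errors = sum(len(bad) for bad in inconsistent.values())
        if errors:
            self.state.discard("clean")
            self.pending = dict(inconsistent)
        return f"repair {errors} errors, {errors} fixed"

    def do_recovery(self):
        # An object is recovered only if at least k good shards remain.
        for name, bad in list(self.pending.items()):
            if (self.k + self.m) - len(bad) >= self.k:
                del self.pending[name]
        # CLEAN is restored only when nothing is left to recover.
        if not self.pending:
            self.state.add("clean")


pg = ToyPG(k=2, m=1)
message = pg.repair_scrub({"foo": [0, 1]})  # 2 of 3 chunks corrupted
pg.do_recovery()
print(message, sorted(pg.state))
```

With two bad chunks the object never leaves the pending set, so the model ends in active without clean, matching the pg dump in the description. With a single bad chunk the same flow returns to active+clean.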
