Bug #2160: active+recovering+degraded+backfill becomes active+clean+degraded when recovery completes - Ceph - Ceph

Actions

Copy link

Bug #2160

closed

active+recovering+degraded+backfill becomes active+clean+degraded when recovery completes

Added by Alexandre Oliva about 12 years ago. Updated about 12 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Category:

OSD

Target version:

v0.44

% Done:

Source:

Development

Tags:

Backport:

Regression:

Severity:

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

In a configuration with 3 replicas of each PG (I haven't tested with others), when one of the disks is replaces, some PGs that had replicas in it get into the active+recovering+degraded+backfill. I'm not sure the “degraded” bit here is appropriate, since we do have 3 replicas, after all, it's just that one of them is undergoing recovery.

Anyway, the more serious (but not really :-) problem here is that, when replication completes, the PG moves to active+clean+degraded state, rather than active+clean. This does not happen when the PG's primary is in the replaced disk, for then the PG recovers in remapped state and goes through peering before active+clean, but apparently the straight jump to active+clean for backfilling secondary replicas fails to clear the degraded bit. If it should have been set in the first place.

Restarting any of the OSDs holding the affected PG suffices to get the replicas to peer and clear the degraded flag, so this is no biggie. But the “degraded” is a bit confusing, in both cases.