Bug #2160

active+recovering+degraded+backfill becomes active+clean+degraded when recovery completes

Added by Alexandre Oliva over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a configuration with 3 replicas of each PG (I haven't tested with others), when one of the disks is replaced, some PGs that had replicas on it enter the active+recovering+degraded+backfill state. I'm not sure the “degraded” bit is appropriate here: we do have 3 replicas, after all; it's just that one of them is undergoing recovery.

Anyway, the more serious (but not really :-) problem here is that, when replication completes, the PG moves to the active+clean+degraded state rather than active+clean. This does not happen when the PG's primary is on the replaced disk, for then the PG recovers in remapped state and goes through peering before reaching active+clean. Apparently, though, the straight jump to active+clean for backfilling secondary replicas fails to clear the degraded bit (if it should have been set in the first place).

Restarting any of the OSDs holding the affected PG suffices to get the replicas to peer and clear the degraded flag, so this is no biggie. But the “degraded” flag is a bit confusing in both cases.
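
As a rough illustration of the symptom, the sketch below flags PG state strings that carry both clean and degraded at once. It only splits the '+'-separated state names quoted in this report; the helper name is_contradictory is hypothetical and is not part of Ceph or its tooling.

```python
def is_contradictory(pg_state: str) -> bool:
    """Return True if a PG state string lists both 'clean' and 'degraded'.

    pg_state is a '+'-separated string such as 'active+clean+degraded'.
    This is an illustrative check only, not how Ceph validates states.
    """
    flags = set(pg_state.split("+"))
    return "clean" in flags and "degraded" in flags


if __name__ == "__main__":
    # The three states mentioned in the description.
    for state in (
        "active+recovering+degraded+backfill",
        "active+clean+degraded",
        "active+clean",
    ):
        print(state, "->", "suspicious" if is_contradictory(state) else "ok")
```

Only the second state is reported as suspicious, which matches the behaviour described: the PG is fully recovered yet still marked degraded.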

Associated revisions

Revision 89ccd95a (diff)
Added by Sage Weil over 11 years ago

osd: maybe clear DEGRADED on recovery completion

We set degraded if we don't have enough "active" replicas, which excludes
the backfill target. We need to recheck that when we finish recovery and
the backfill target is now complete.

Fixes: #2160
Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>
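
The logic in the commit message can be sketched with a toy model. This is not the actual OSD code: the class ToyPG, its fields, and the recheck parameter are all assumptions made for illustration. It only shows why a degraded flag computed while a backfill target is excluded needs to be recomputed once that target completes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyPG:
    """Toy placement group; all names here are hypothetical."""
    size: int                              # desired number of replicas
    acting: List[int]                      # OSD ids holding the PG
    backfill_target: Optional[int] = None  # OSD still being backfilled
    flags: Set[str] = field(default_factory=set)

    def active_replicas(self) -> List[int]:
        # The backfill target does not count as an "active" replica.
        return [osd for osd in self.acting if osd != self.backfill_target]

    def recheck_degraded(self) -> None:
        # Degraded means fewer active replicas than desired.
        if len(self.active_replicas()) < self.size:
            self.flags.add("degraded")
        else:
            self.flags.discard("degraded")

    def finish_recovery(self, recheck: bool = True) -> None:
        # The backfill target is now complete, so it counts as active.
        self.backfill_target = None
        self.flags -= {"recovering", "backfill"}
        self.flags.add("clean")
        if recheck:
            # Without this step the stale degraded flag would survive.
            self.recheck_degraded()


def make_pg() -> ToyPG:
    pg = ToyPG(size=3, acting=[0, 1, 2], backfill_target=2,
               flags={"active", "recovering", "backfill"})
    pg.recheck_degraded()  # 2 active replicas < 3, so degraded is set
    return pg
```

Calling finish_recovery(recheck=False) on such a PG leaves it with both clean and degraded set, mirroring the reported state; with the recheck, degraded is cleared because all three replicas now count as active.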

History

#1 Updated by Sage Weil over 11 years ago

  • Category set to OSD
  • Status changed from New to Fix Under Review
  • Target version set to v0.44

See wip-2160 for a fix.

#2 Updated by Sage Weil over 11 years ago

  • Status changed from Fix Under Review to Resolved
