Bug #2160
active+recovering+degraded+backfill becomes active+clean+degraded when recovery completes
Description
In a configuration with 3 replicas of each PG (I haven't tested with others), when one of the disks is replaced, some PGs that had replicas on it get into the active+recovering+degraded+backfill state. I'm not sure the “degraded” bit here is appropriate, since we do have 3 replicas, after all; it's just that one of them is undergoing recovery.
Anyway, the more serious (but not really :-) problem here is that, when replication completes, the PG moves to active+clean+degraded state, rather than active+clean. This does not happen when the PG's primary is in the replaced disk, for then the PG recovers in remapped state and goes through peering before active+clean, but apparently the straight jump to active+clean for backfilling secondary replicas fails to clear the degraded bit. If it should have been set in the first place.
Restarting any of the OSDs holding the affected PG suffices to get the replicas to peer and clear the degraded flag, so this is no biggie. But the “degraded” is a bit confusing, in both cases.
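For anyone wanting to spot affected PGs, here is a minimal sketch that flags the combination described above. It only assumes that a PG state is a string of flags joined by "+", as in the states quoted in this report; the function names are hypothetical, and how the state strings are obtained from the cluster is left out.

```python
def parse_pg_state(state):
    """Split a '+'-joined PG state string into a set of flags."""
    return frozenset(flag for flag in state.strip().split("+") if flag)


def is_clean_but_degraded(state):
    """Return True for the clean+degraded combination from this report."""
    flags = parse_pg_state(state)
    return "clean" in flags and "degraded" in flags


if __name__ == "__main__":
    # The three states mentioned in the description.
    for s in ("active+recovering+degraded+backfill",
              "active+clean+degraded",
              "active+clean"):
        print(s, "->", is_clean_but_degraded(s))
```

Only the middle state is flagged; the other two are the expected states during and after recovery.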
Associated revisions
osd: maybe clear DEGRADED on recovery completion
We set degraded if we don't have enough "active" replicas, which excludes
the backfill target. We need to recheck that when we finish recovery and
the backfill target is now complete.
Fixes: #2160
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
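The logic in the commit message above can be illustrated with a toy model. This is only a sketch of the idea, not the actual OSD code: the ToyPG class, its fields, and the recheck_degraded switch are all hypothetical names invented for the example. The one assumption taken from the commit message is that the backfill target does not count as an "active" replica until recovery finishes.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ToyPG:
    """Hypothetical stand-in for a placement group; not Ceph's PG class."""
    wanted_replicas: int
    active_osds: Set[int] = field(default_factory=set)  # complete replicas
    backfill_target: Optional[int] = None               # replica being filled
    flags: Set[str] = field(default_factory=set)

    def update_degraded(self):
        # The backfill target is deliberately not counted as "active".
        if len(self.active_osds) < self.wanted_replicas:
            self.flags.add("degraded")
        else:
            self.flags.discard("degraded")

    def start_backfill(self, osd):
        self.backfill_target = osd
        self.flags -= {"clean"}
        self.flags |= {"active", "recovering", "backfill"}
        self.update_degraded()

    def finish_recovery(self, recheck_degraded):
        # The backfill target is now complete, so it becomes active.
        if self.backfill_target is not None:
            self.active_osds.add(self.backfill_target)
            self.backfill_target = None
        self.flags -= {"recovering", "backfill"}
        self.flags.add("clean")
        if recheck_degraded:
            # The recheck that the commit message says is needed.
            self.update_degraded()

    def state(self):
        order = ["active", "clean", "recovering", "degraded", "backfill"]
        return "+".join(f for f in order if f in self.flags)


def simulate(recheck_degraded):
    """Replace one of three replicas and return (state during, state after)."""
    pg = ToyPG(wanted_replicas=3, active_osds={0, 1})
    pg.start_backfill(2)
    during = pg.state()
    pg.finish_recovery(recheck_degraded)
    return during, pg.state()
```

In this model, simulate(False) ends in active+clean+degraded, the state reported here, while simulate(True) ends in active+clean.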
History
#1 Updated by Sage Weil over 11 years ago
- Category set to OSD
- Status changed from New to Fix Under Review
- Target version set to v0.44
See wip-2160 for a fix.
#2 Updated by Sage Weil over 11 years ago
- Status changed from Fix Under Review to Resolved