Bug #2160
closed
active+recovering+degraded+backfill becomes active+clean+degraded when recovery completes
0%
Description
In a configuration with 3 replicas of each PG (I haven't tested with others), when one of the disks is replaced, some PGs that had replicas on it get into the active+recovering+degraded+backfill state. I'm not sure the “degraded” bit here is appropriate, since we do have 3 replicas, after all; it's just that one of them is undergoing recovery.
Anyway, the more serious (but not really :-) problem here is that, when replication completes, the PG moves to the active+clean+degraded state, rather than active+clean. This does not happen when the PG's primary is on the replaced disk, for then the PG recovers in remapped state and goes through peering before active+clean. Apparently the straight jump to active+clean for backfilling secondary replicas fails to clear the degraded bit, if it should have been set in the first place.
Restarting any of the OSDs holding the affected PG suffices to get the replicas to peer and clear the degraded flag, so this is no biggie. But the “degraded” flag is a bit confusing, in both cases.
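For anyone wanting to spot affected PGs before restarting an OSD, here is a minimal sketch of the check, assuming you already have a mapping from PG id to its state string (however you collected it). The function names and the sample PG ids are made up for illustration; this is not part of Ceph.

```python
def parse_pg_state(state: str) -> frozenset[str]:
    """Split a '+'-joined PG state string into its individual flags."""
    return frozenset(flag for flag in state.split("+") if flag)


def is_clean_but_degraded(state: str) -> bool:
    """True when a PG reports both 'clean' and 'degraded', the odd combination described above."""
    return {"clean", "degraded"} <= parse_pg_state(state)


def find_suspect_pgs(pg_states: dict[str, str]) -> list[str]:
    """Return the ids of PGs stuck with a leftover degraded flag, sorted for stable output."""
    return sorted(
        pgid for pgid, state in pg_states.items() if is_clean_but_degraded(state)
    )


if __name__ == "__main__":
    # Hypothetical PG ids and states, for illustration only.
    sample = {
        "0.1": "active+clean",
        "0.2": "active+clean+degraded",
        "0.3": "active+recovering+degraded+backfill",
    }
    print(find_suspect_pgs(sample))
```

A PG that is still recovering is deliberately not flagged, since “degraded” alongside “recovering” is the expected intermediate state here; only the combination with “clean” is treated as suspect.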