Bug #2160

active+recovering+degraded+backfill becomes active+clean+degraded when recovery completes

Added by Alexandre Oliva about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: v0.44
% Done: 0%
Source: Development

Description

In a configuration with 3 replicas of each PG (I haven't tested with others), when one of the disks is replaced, some PGs that had replicas on it get into the active+recovering+degraded+backfill state. I'm not sure the “degraded” bit here is appropriate, since we do have 3 replicas after all; it's just that one of them is undergoing recovery.

Anyway, the more serious (but not really :-) problem here is that, when recovery completes, the PG moves to the active+clean+degraded state rather than active+clean. This does not happen when the PG's primary is on the replaced disk, for then the PG recovers in the remapped state and goes through peering before reaching active+clean; but the straight jump to active+clean for backfilling secondary replicas apparently fails to clear the degraded bit, if it should have been set in the first place.
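
To make the suspected difference between the two paths concrete, here is a minimal toy model in Python. This is not Ceph's actual state machine, just an illustration of how a flag that only gets recomputed during peering can be left stale on the path that skips peering:

# Toy model of the two recovery paths described above; not Ceph's
# actual code, only a sketch of where a stale "degraded" flag could
# survive.

class PG:
    def __init__(self):
        # The PG starts out backfilling onto the replacement disk.
        self.state = {"active", "recovering", "degraded", "backfill"}

    def peer(self):
        # Peering recomputes the state from scratch, so stale flags vanish.
        self.state = {"active", "clean"}

    def finish_backfill(self, primary_on_replaced_disk):
        self.state.discard("recovering")
        self.state.discard("backfill")
        if primary_on_replaced_disk:
            # Primary path: the PG repeers before going clean.
            self.peer()
        else:
            # Secondary path: jump straight to clean without repeering,
            # leaving "degraded" set -- the behaviour reported here.
            self.state.add("clean")

pg = PG()
pg.finish_backfill(primary_on_replaced_disk=False)
print("+".join(sorted(pg.state)))   # prints: active+clean+degraded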

Restarting any of the OSDs holding the affected PG suffices to get the replicas to peer and clear the degraded flag, so this is no biggie. But the “degraded” state is a bit confusing in both cases.
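
For what it's worth, a sketch of that workaround follows. The osd id is made up, and `ceph osd down` is assumed here as a lighter-weight alternative to the full daemon restart described above: a running OSD that is marked down in the osdmap will reassert itself, and the PG repeers as a result.

import subprocess

def repeer_via_osd_kick(osd_id):
    # Mark one of the OSDs holding the PG down in the osdmap; the
    # running daemon reasserts itself and the PG repeers, which
    # recomputes its flags and drops the stale "degraded".
    subprocess.check_call(["ceph", "osd", "down", str(osd_id)])

repeer_via_osd_kick(3)  # hypothetical osd id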

#1

Updated by Sage Weil about 12 years ago

  • Category set to OSD
  • Status changed from New to Fix Under Review
  • Target version set to v0.44

See wip-2160 for a fix.

#2

Updated by Sage Weil about 12 years ago

  • Status changed from Fix Under Review to Resolved