Bug #2866

osd: pg stuck with unfound

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Development
Backport:
argonaut
Regression:
No
Severity:
3 - minor

Description

On congress, observed a pg stuck with unfound objects. Kicking peering (marking the primary down) resolved it.

While testing the fix for #2860, observed:

- osd A's pg peers and finds the missing objects on osd B
- osd B goes down; the objects now become unfound
- osd B comes back up, but the objects are still unfound

Log attached (pg 1.6). osd.2 comes back in epoch 31, but pg 1.6 doesn't notice: it checks for sources going down, but not for down sources coming back up.
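The sequence above can be replayed with a small toy model. This is only a sketch of the symptom, not Ceph code: `ToyPrimary`, `on_map`, `recheck`, and the `react_to_up` switch are hypothetical names invented for illustration, and the model assumes a single object whose only known source is osd.2.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimary:
    """Toy stand-in for a primary pg; not Ceph's actual data structures."""
    # object name -> set of osd ids believed to hold a good copy
    sources: dict
    up: set = field(default_factory=set)
    unfound: set = field(default_factory=set)

    def recheck(self):
        # An object is unfound when none of its known sources is up.
        self.unfound = {
            obj for obj, osds in self.sources.items() if not (osds & self.up)
        }

    def on_map(self, new_up, react_to_up=False):
        went_down = self.up - new_up
        came_up = new_up - self.up
        self.up = set(new_up)
        # Reported behaviour: only sources going down trigger a re-check.
        # react_to_up=True also re-checks when a down source comes back.
        if went_down or (react_to_up and came_up):
            self.recheck()


def replay(react_to_up):
    pg = ToyPrimary(sources={"obj1": {2}}, up={1, 2})
    pg.recheck()                    # osd.2 is up: nothing is unfound
    pg.on_map({1}, react_to_up)     # osd.2 goes down: obj1 becomes unfound
    pg.on_map({1, 2}, react_to_up)  # osd.2 comes back up
    return pg.unfound
```

With `replay(False)` the object stays in the unfound set after osd.2 returns, which matches the report; with `replay(True)` the set is empty again. The fix recorded on this ticket takes a different route: the returning osd announces itself to the primary, which then re-checks.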

osd.1.log.gz (15.6 MB) Sage Weil, 07/27/2012 04:30 PM

Associated revisions

Revision 7b9d37c6 (diff)
Added by Sage Weil over 11 years ago

osd: set STRAY on pg load when non-primary

The STRAY bit indicates that we should announce ourselves to the primary,
but it is only set in start_peering_interval(). We also need to set it
initially, so that a PG that is loaded but whose role does not change
(e.g., the stray replica stays a stray) will notify the primary.

Observed:
- osd starts up
- mapping does not change, STRAY not set
- does not announce to primary
- primary does not re-check must_have_unfound, objects appear unfound

Fix this by initializing STRAY when pg is loaded or created whenever we
are not the primary.

Fixes: #2866
Signed-off-by: Sage Weil <>
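The commit message above can be illustrated with a toy model of the flag handling. This is a sketch under stated assumptions, not the actual patch: `ToyPG`, `init_stray_on_load`, and `announces_after_restart` are hypothetical names, and only `start_peering_interval()` and the STRAY flag are taken from the commit text.

```python
class ToyPG:
    """Toy model of a pg on one osd; names are illustrative, not Ceph's API."""

    def __init__(self, is_primary, init_stray_on_load):
        self.is_primary = is_primary
        # With the fix, a non-primary pg is flagged STRAY as soon as it is
        # loaded or created.
        self.stray = init_stray_on_load and not is_primary

    def start_peering_interval(self):
        # Before the fix, this was the only place the flag was set.
        if not self.is_primary:
            self.stray = True

    def wants_to_notify_primary(self):
        return self.stray


def announces_after_restart(mapping_changed, init_stray_on_load):
    # The osd starts up and loads a pg for which it is not the primary.
    pg = ToyPG(is_primary=False, init_stray_on_load=init_stray_on_load)
    if mapping_changed:
        pg.start_peering_interval()
    return pg.wants_to_notify_primary()
```

In this model, `announces_after_restart(False, False)` returns `False`: the mapping did not change, so the flag is never set and the primary is never told. Passing `init_stray_on_load=True` makes the same call return `True`, which is the behaviour the commit describes.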

Revision 9e5d4e61 (diff)
Added by Sage Weil over 11 years ago

osd: initialize send_notify on pg load

When the PG is loaded, we need to set send_notify if we are not the
primary. Otherwise, if the PG does not go through
start_peering_interval() or experience a role change, we will neither
set the flag nor tell the primary that we exist. This can cause
problems, for example if we hold objects that the primary needs and
considers unfound, although I'm sure there are other bad implications
as well.

Fixes: #2866
Signed-off-by: Sage Weil <>

History

#1 Updated by Sage Weil over 11 years ago

#2 Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Fix Under Review

#3 Updated by Sage Weil over 11 years ago

  • Status changed from Fix Under Review to Resolved
  • Backport set to argonaut
