Bug #428

closed

osd: recovery stalls on mismatched snapset and object

Added by Sage Weil over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: High
Category: OSD
% Done: 80%

Description

On Wido's cluster, I see recovery stalling on a number of objects where the head's snapset says the size is 4 MB, but the object on disk on the other replica is 0 bytes. The pull code does not handle getting back less data than it expects, so recovery stalls.

First, how did that happen?
Second, how can we properly detect this situation?
Third, what should we do?

Sigh...


Related issues: 2 (0 open, 2 closed)

Related to Ceph - Feature #453: osd: return error (instead of blocking) on lost objects (Resolved, Colin McCabe, 10/19/2010, 10/19/2010)

Related to Ceph - Feature #526: osd: unfound objects rework (Resolved, Colin McCabe, 10/29/2010)

