Bug #503

osd: query osds since last_epoch_clean before concluding objects lost?

Added by Sage Weil about 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

We currently query prior_set osds through last_epoch_started. This gives us the latest log and version. But if we are missing objects and prior_set_down is empty, we conclude they're lost. That's not quite right: peering could have completed at last_epoch_started while recovery did not, so some osds from before that epoch may still have the objects in question. If they are temporarily down, or slow to send their stray Info during peering, we could incorrectly "give up" and conclude the objects are gone.

We probably need to query them in that lower part of peer(). And/or add them to prior_set_down if they are down at that point? Or maybe they should just be part of the prior_set, as that makes all the prior_set_affected etc. checks apply.
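
To make the difference concrete, here is a rough Python sketch of the idea, not the actual OSD code. The names `Interval`, `prior_osds`, and `can_conclude_lost` are hypothetical, and the model is deliberately simplified: each past interval is just an epoch range plus the set of osds acting in it, and an osd that has not responded is treated as one that might still hold the objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical model of one past interval of a PG."""
    first: int          # first epoch of the interval
    last: int           # last epoch of the interval (inclusive)
    acting: frozenset   # ids of the osds acting during the interval


def prior_osds(intervals, since_epoch):
    """Collect osds from every interval that ended at or after since_epoch."""
    osds = set()
    for iv in intervals:
        if iv.last >= since_epoch:
            osds |= iv.acting
    return osds


def can_conclude_lost(intervals, since_epoch, responded):
    """Allow giving up on missing objects only once every osd that
    could hold them (per since_epoch) has responded."""
    return prior_osds(intervals, since_epoch) <= set(responded)


# Illustrative history: the PG was last clean in the first interval,
# and peering last completed (last_epoch_started) in the third.
intervals = [
    Interval(1, 10, frozenset({0, 1})),
    Interval(11, 20, frozenset({1, 2})),
    Interval(21, 30, frozenset({2, 3})),
]
last_epoch_clean = 5
last_epoch_started = 21
responded = {2, 3}

# Bounding the query by last_epoch_started permits giving up too early.
old_rule = can_conclude_lost(intervals, last_epoch_started, responded)
# Bounding it by last_epoch_clean still waits on osds 0 and 1.
new_rule = can_conclude_lost(intervals, last_epoch_clean, responded)
```

With this made-up history, the last_epoch_started bound only considers osds 2 and 3, so it concludes the objects are lost as soon as those two have answered. The last_epoch_clean bound also waits for osds 0 and 1, which may hold the objects if recovery never finished.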


Related issues

Related to Ceph - Feature #526: osd: unfound objects rework Resolved 10/29/2010
Precedes Ceph - Feature #453: osd: return error (instead of blocking) on lost objects Resolved 10/19/2010 10/19/2010

History

#1 Updated by Sage Weil about 10 years ago

  • Target version changed from v0.23 to v0.24

#2 Updated by Sage Weil almost 10 years ago

  • Assignee set to Colin McCabe

#3 Updated by Sage Weil almost 10 years ago

  • Estimated time set to 3.00 h
  • Source set to 1

#4 Updated by Sage Weil almost 10 years ago

  • Status changed from New to Closed
