Bug #503

closed

osd: query osds since last_epoch_clean before concluding objects lost?

Added by Sage Weil over 13 years ago. Updated over 13 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: OSD
Target version:
% Done: 0%


Description

We currently query prior_set osds through last_epoch_started. This gives us the latest log and version. But if we are missing objects, and prior_set_down is empty, we conclude they're lost. That's not quite right. Peering could have completed at last_epoch_started while recovery did not, in which case some osds from before that epoch still have the objects in question. If they are temporarily down, or slow to send their stray Info during peering, we could incorrectly "give up" and conclude the objects are gone.

We probably need to query them in that lower part of peer(). And/or add them to prior_set_down if they are down at that point? Or maybe they should just be part of the prior_set, as that makes all the prior_set_affected etc. checks apply.
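
The proposal above can be sketched roughly as follows. This is only an illustrative model, not Ceph's actual peering code: `PastInterval`, `candidate_osds`, `classify_missing_object`, and the `responses` map are hypothetical names invented for this sketch, and the real logic in peer() tracks considerably more state. The idea is to widen the set of osds consulted to everything that served the PG since last_epoch_clean, and to declare an object lost only once every one of those osds has answered.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    FOUND = "found"  # some queried osd reports having the object
    WAIT = "wait"    # at least one candidate osd has not answered yet
    LOST = "lost"    # every candidate osd answered and none has it


@dataclass(frozen=True)
class PastInterval:
    # Hypothetical record of one past interval of the PG.
    first_epoch: int
    last_epoch: int
    acting: frozenset  # ids of the osds that served the PG in this interval


def candidate_osds(intervals, since_epoch):
    """Collect osds from every interval that ended at or after since_epoch."""
    osds = set()
    for interval in intervals:
        if interval.last_epoch >= since_epoch:
            osds |= interval.acting
    return osds


def classify_missing_object(intervals, last_epoch_clean, responses):
    """Decide the fate of one missing object.

    responses maps osd id -> True/False (has the object or not).
    An osd that is absent from the map has not answered, for example
    because it is down or slow to send its Info.
    """
    candidates = candidate_osds(intervals, last_epoch_clean)
    if any(responses.get(osd, False) for osd in candidates):
        return Verdict.FOUND
    if any(osd not in responses for osd in candidates):
        return Verdict.WAIT
    return Verdict.LOST


# Assumed scenario: osds 1 and 2 served the PG in epochs 10-19, osds 2 and 3
# from epoch 20 on. Peering completed at epoch 20, but the PG was last clean
# at epoch 15.
intervals = [
    PastInterval(10, 19, frozenset({1, 2})),
    PastInterval(20, 29, frozenset({2, 3})),
]
answers = {2: False, 3: False}  # osd 1 has not answered

# Looking back only to epoch 20 reproduces the premature "lost" conclusion.
narrow = classify_missing_object(intervals, 20, answers)
# Looking back to last_epoch_clean keeps waiting on osd 1 instead.
wide = classify_missing_object(intervals, 15, answers)
```

In this scenario the narrow window gives up on the object, while the wider one holds off until osd 1 has reported. Whether such osds belong in the prior_set itself or in prior_set_down is the open question raised above; the sketch does not take a position on that.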


Related issues 2 (0 open, 2 closed)

Related to Ceph - Feature #526: osd: unfound objects rework (Resolved, Colin McCabe, 10/29/2010)

Precedes Ceph - Feature #453: osd: return error (instead of blocking) on lost objects (Resolved, Colin McCabe, 10/19/2010, 10/19/2010)

#1

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.23 to v0.24
#2

Updated by Sage Weil over 13 years ago

  • Assignee set to Colin McCabe
#3

Updated by Sage Weil over 13 years ago

  • Estimated time set to 3:00 h
  • Source set to 1
#4

Updated by Sage Weil over 13 years ago

  • Status changed from New to Closed