Bug #503: osd: query osds since last_epoch_clean before concluding objects lost? - Ceph

Added by Sage Weil over 13 years ago. Updated over 13 years ago.

Status: Closed

Priority: Normal

Assignee: Colin McCabe

Category: OSD

Target version: v0.24

Description

We currently query prior_set osds through last_epoch_started. This gives us the latest log and version. But if we are missing objects and prior_set_down is empty, we conclude they're lost. That's not quite right. Peering could have completed at last_epoch_started without recovery finishing, so some osds from before that epoch may still have the objects in question. If they are temporarily down, or slow sending their stray Info during peering, we could incorrectly "give up" and conclude the objects are gone.

We probably need to query them in that lower part of peer(). And/or add them to prior_set_down if they are down at that point? Or maybe they should just be part of the prior_set, as that makes all the prior_set_affected etc. checks apply.
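
To make the difference concrete, here is a rough sketch of the two query ranges. It is not Ceph's actual peering code: `Interval`, `osds_since`, `may_declare_lost`, and the epoch numbers are all hypothetical, and the model ignores logs, versions, and prior_set_down. It only shows that collecting candidates from last_epoch_clean instead of last_epoch_started keeps older osds in the set that must answer before anything is declared lost.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified model of one past interval of a PG.
// These are NOT Ceph's real types.
struct Interval {
    unsigned first_epoch;
    unsigned last_epoch;
    std::set<int> acting;  // ids of osds that served the PG in this interval
};

// Collect every osd that acted in an interval ending at or after since_epoch.
std::set<int> osds_since(const std::vector<Interval>& history,
                         unsigned since_epoch) {
    std::set<int> out;
    for (const auto& iv : history) {
        if (iv.last_epoch >= since_epoch) {
            out.insert(iv.acting.begin(), iv.acting.end());
        }
    }
    return out;
}

// Only allow "lost" once every candidate osd has actually answered.
bool may_declare_lost(const std::set<int>& candidates,
                      const std::set<int>& responded) {
    for (int osd : candidates) {
        if (responded.count(osd) == 0) {
            return false;  // someone who might hold the object is still silent
        }
    }
    return true;
}

// Assumed scenario: clean at epoch 10, peering completed at epoch 20,
// recovery never finished, and only osds 2 and 3 have answered so far.
void example() {
    std::vector<Interval> history = {
        {10, 19, {0, 1}},
        {20, 29, {2, 3}},
    };
    std::set<int> responded = {2, 3};

    // Range starting at last_epoch_started (20): osds 0 and 1 are ignored,
    // so the check gives up too early.
    bool from_started = may_declare_lost(osds_since(history, 20), responded);
    assert(from_started);

    // Range starting at last_epoch_clean (10): osds 0 and 1 must answer first.
    bool from_clean = may_declare_lost(osds_since(history, 10), responded);
    assert(!from_clean);
}
```

In this toy model the wider range simply blocks the "lost" verdict until osds 0 and 1 respond. Whether the real fix should query them in peer(), add them to prior_set_down, or fold them into prior_set is the open question above.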

Related issues 2 (0 open — 2 closed)