Bug #891 (closed)
osd: fix last_epoch_started updates
% Done: 0%
Description
last_epoch_started is used to bound how far back in time we query other OSDs in order to recover PG state (this is the prior_set). Currently we update last_epoch_started in PG::activate() and broadcast it to replicas, but we do this before any of the state we recovered hits disk. This means we might hear about a last_epoch_started of X even though the nodes in X (primary or replicas) crashed before committing that recovered state to disk.
Instead, we should activate all peers, have everyone commit and send an ack back to the primary saying "yes, I have committed the recovered pg info and log for this interval", and only then, once all replicas have done so, update last_epoch_started, rebroadcast it to replicas, and queue it for disk.
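The ordering proposed above can be sketched as a small model. This is only an illustration, not Ceph code: the `Osd` and `Primary` classes, their fields, and `handle_commit_ack` are hypothetical names, and the disk commit is simulated by a plain method call. The point it tries to show is that last_epoch_started only advances after every member of the interval has acked a durable commit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Osd:
    """Hypothetical stand-in for one OSD holding a copy of the PG."""
    osd_id: int
    pending_epoch: Optional[int] = None  # recovered state received, not yet on disk
    committed_epoch: int = 0             # newest interval whose info/log is durable
    last_epoch_started: int = 0          # value this OSD would report during peering

    def activate(self, epoch: int) -> None:
        # Recovered pg info and log arrive, but nothing is durable yet.
        self.pending_epoch = epoch

    def commit(self) -> int:
        # Simulate the disk write completing; return the epoch to ack.
        self.committed_epoch = self.pending_epoch
        self.pending_epoch = None
        return self.committed_epoch


@dataclass
class Primary:
    """Hypothetical primary that gates last_epoch_started on commit acks."""
    me: Osd
    replicas: List[Osd]
    epoch: int
    acked: Set[int] = field(default_factory=set)

    def members(self) -> List[Osd]:
        return [self.me, *self.replicas]

    def start_activation(self) -> None:
        # Step 1: activate all peers. last_epoch_started is NOT touched here.
        for osd in self.members():
            osd.activate(self.epoch)

    def handle_commit_ack(self, osd_id: int, epoch: int) -> bool:
        """Record an ack; return True once last_epoch_started was advanced."""
        if epoch != self.epoch:
            return False  # stale ack from an older interval, ignore it
        self.acked.add(osd_id)
        if self.acked != {o.osd_id for o in self.members()}:
            return False  # someone has not committed yet
        # Step 2: everyone is durable, so update and rebroadcast.
        # (Persisting the new value is omitted from this sketch.)
        for osd in self.members():
            osd.last_epoch_started = self.epoch
        return True


p, r1, r2 = Osd(0), Osd(1), Osd(2)
primary = Primary(p, [r1, r2], epoch=20)
primary.start_activation()
first = primary.handle_commit_ack(0, p.commit())
second = primary.handle_commit_ack(1, r1.commit())
before = r1.last_epoch_started  # r2 has not committed, so still the old value
third = primary.handle_commit_ack(2, r2.commit())
after = r1.last_epoch_started
```

In this model, if any OSD crashed before calling `commit()`, no peer would ever advertise the new epoch, so a later primary would still query far enough back to find the older state.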
Updated by Sage Weil about 13 years ago
- Status changed from New to Resolved
- Assignee set to Sage Weil
Updated by Sage Weil about 13 years ago
- Story points set to 2
- Position set to 555
Updated by Sage Weil about 13 years ago
- Position deleted (575)
- Position set to 574
Updated by Sage Weil about 13 years ago
- Story points changed from 2 to 3
- Position deleted (574)
- Position set to 574