Bug #891

osd: fix last_epoch_started updates

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
OSD
Target version:
Start date:
03/15/2011
Due date:
% Done:

0%

Spent time:
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

last_epoch_started is used to bound how far back in time we query other OSDs in order to recover PG state (this is the prior_set). Currently we update last_epoch_started in PG::activate() and broadcast it to replicas, but we do this before any of the state we recovered hits disk. This means we might hear about a last_epoch_started of X even though the nodes in X (primary OR replicas) crashed before committing that recovered state to disk.

Instead, we should activate all peers, have everyone commit, and have each peer send an ack back to the primary saying "yes, I have committed the recovered pg info and log for this interval". Only then, once all replicas have done so, should the primary update last_epoch_started, rebroadcast it to replicas, and queue it for disk.
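
The ordering proposed above can be sketched as a small piece of primary-side bookkeeping. This is only an illustration under stated assumptions, not the actual Ceph change: ActivationTracker, on_commit_ack and the integer OSD ids are hypothetical names, and epoch_t is a local stand-in for the OSD's epoch type. The only point it shows is that last_epoch_started does not move until every member of the acting set (the primary included) has reported that the recovered info and log are on disk.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Local stand-in for the epoch type; an assumption for this sketch.
using epoch_t = std::uint32_t;

// Hypothetical primary-side tracker for one peering interval.
// It is not part of Ceph; it only models the ack-then-advance rule.
class ActivationTracker {
public:
  ActivationTracker(epoch_t interval_epoch, std::set<int> acting,
                    epoch_t prev_last_epoch_started)
    : epoch_(interval_epoch),
      waiting_(std::move(acting)),
      last_epoch_started_(prev_last_epoch_started) {}

  // Record that `osd` has committed the recovered pg info and log for
  // epoch `e`. Returns true only for the ack that completes the set,
  // which is the moment last_epoch_started is advanced.
  bool on_commit_ack(int osd, epoch_t e) {
    if (e != epoch_)
      return false;               // ack for a different interval: ignore
    if (waiting_.erase(osd) == 0)
      return false;               // unknown OSD or duplicate ack: ignore
    if (!waiting_.empty())
      return false;               // still waiting on at least one peer
    last_epoch_started_ = epoch_; // everyone has committed: safe to advance
    return true;
  }

  epoch_t last_epoch_started() const { return last_epoch_started_; }

private:
  epoch_t epoch_;                 // epoch of the interval being activated
  std::set<int> waiting_;         // OSDs that have not yet acked a commit
  epoch_t last_epoch_started_;    // value that is safe to advertise
};
```

In this sketch the caller would rebroadcast the new value and queue it for disk only when on_commit_ack() returns true. If any OSD in the interval crashes before acking, the tracker keeps reporting the previous last_epoch_started, so a later peering attempt still queries far enough back.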


Related issues

Related to Ceph - Bug #865: osd: mark pg clean only after purging strays (Won't Fix, 03/04/2011)

History

#1 Updated by Sage Weil over 8 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil

#2 Updated by Sage Weil over 8 years ago

  • Story points set to 2
  • Position set to 555

#3 Updated by Sage Weil over 8 years ago

  • Position deleted (575)
  • Position set to 574

#4 Updated by Sage Weil over 8 years ago

  • Story points changed from 2 to 3
  • Position deleted (574)
  • Position set to 574
