Feature #44108

mon: osd: handle 2-(main-)site stretch clusters explicitly, so no admin intervention is needed when a DC dies

Added by Greg Farnum about 4 years ago. Updated about 4 years ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Reviewed:
Affected Versions:
Component(RADOS): Monitor, OSD
Pull request ID:

Description

For years, people have hacked together stretch clusters on top of Ceph using 3 sites, or even using 2 sites and intervening manually to change the CRUSH map and rules if they lose a full site.

But we want to handle it automatically and make sure it's robust against corner cases such as:
  • a netsplit where the OSDs can't talk across DCs, but the OSDs in each DC can still reach in-quorum monitors
  • PGs that peer and go active with replicas from only one DC, so if that DC is lost we lose the data despite being configured for multiple sites
  • administrators who have misconfigured their system and don't know it.
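The second corner case amounts to an invariant: a PG should not go active unless its acting set spans both data centers. A minimal sketch of that check follows. It assumes a hypothetical OSD-id to data-center lookup (`osd_site`) and a hypothetical helper name; it is not Ceph's actual peering code.

```python
from typing import Dict, Iterable


def acting_set_spans_sites(acting: Iterable[int],
                           osd_site: Dict[int, str],
                           required_sites: int = 2) -> bool:
    """Return True if the OSDs in `acting` cover at least `required_sites` distinct sites.

    `osd_site` is a hypothetical OSD-id -> data-center mapping; OSDs missing
    from it are ignored rather than counted as a site of their own.
    """
    sites = {osd_site[osd] for osd in acting if osd in osd_site}
    return len(sites) >= required_sites


# Hypothetical layout: OSDs 0-1 live in dc1, OSDs 2-3 live in dc2.
layout = {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2"}
print(acting_set_spans_sites([0, 2], layout))  # replicas in both DCs
print(acting_set_spans_sites([0, 1], layout))  # one DC only: should not go active
```

A PG whose acting set fails this check would be held back from going active rather than accepting writes that exist in only one DC.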

The initial target configuration is 2 full sites and an off-site tiebreaker monitor.
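The tiebreaker monitor matters because monitors need a strict majority of the monmap to form a quorum. A minimal sketch of that arithmetic, assuming an equal number of monitors in each main site (the per-site count of 2 is illustrative, not specified in this issue):

```python
def has_quorum(alive: int, total: int) -> bool:
    """A monitor quorum requires a strict majority of all monitors."""
    return alive > total // 2


def survives_site_loss(mons_per_site: int, tiebreaker: bool) -> bool:
    """Can the monitors remaining after one main site dies still form a quorum?"""
    extra = 1 if tiebreaker else 0
    total = 2 * mons_per_site + extra
    alive = mons_per_site + extra  # the surviving site plus the tiebreaker, if any
    return has_quorum(alive, total)


for tb in (False, True):
    print(f"2 mons per site, tiebreaker={tb}: survives site loss = {survives_site_loss(2, tb)}")
```

With two equal sites and no tiebreaker, the survivors hold exactly half the monitors, which is never a strict majority; the single off-site monitor breaks that tie.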


Related issues

Blocked by RADOS - Feature #44107: mon: produce stable election results when netsplits and other errors happen (Resolved)

History

#1 Updated by Greg Farnum about 4 years ago

  • Blocked by Feature #44107: mon: produce stable election results when netsplits and other errors happen added

#2 Updated by Ken Dreyer about 4 years ago

  • Backport set to nautilus
