Feature #44108: mon: osd: handle 2-(main-)site stretch clusters explicitly, so no admin intervention is needed when a DC dies - RADOS - Ceph



Added by Greg Farnum about 4 years ago. Updated about 4 years ago.

Status: In Progress
Priority: Normal
Assignee: Greg Farnum
Backport: nautilus
Component(RADOS): Monitor, OSD

Description

People have hacked together stretch clusters on top of Ceph for years, either by using 3 sites, or by using 2 sites and manually changing the CRUSH map and rules whenever they lose a full site.

But we want to handle this automatically and make sure the handling is robust against corner cases such as:

a netsplit in which OSDs cannot talk across DCs, but OSDs in both DCs can still reach in-quorum monitors;
PGs that peer and go active with replicas from only one DC, so that if that DC is lost, the data is lost even though the cluster is configured for multiple sites;
administrators who have misconfigured their system without realizing it.
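The second corner case amounts to a condition that could be checked during peering: in a two-site cluster, a PG should not go active unless its acting set has replicas in both DCs. The following is a minimal Python sketch of that condition only. The names `dcs_covered` and `can_go_active`, the OSD-to-DC map, and the `min_dcs=2` threshold are hypothetical illustrations, not Ceph's peering code or configuration options.

```python
"""Hypothetical sketch: refuse to let a PG go active unless its acting set
spans enough data centers. Names are illustrative, not Ceph's API."""

from typing import Dict, Iterable, Set


def dcs_covered(acting: Iterable[int], osd_to_dc: Dict[int, str]) -> Set[str]:
    """Return the DCs hosting at least one OSD in `acting`.

    OSDs missing from `osd_to_dc` are ignored (assumed: unknown location).
    """
    return {osd_to_dc[osd] for osd in acting if osd in osd_to_dc}


def can_go_active(acting: Iterable[int], osd_to_dc: Dict[int, str],
                  min_dcs: int = 2) -> bool:
    """True only if the acting set has replicas in at least `min_dcs` DCs."""
    return len(dcs_covered(acting, osd_to_dc)) >= min_dcs


if __name__ == "__main__":
    # Assumed layout: OSDs 0-1 in dc1, OSDs 2-3 in dc2.
    osd_to_dc = {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2"}
    print(can_go_active([0, 1], osd_to_dc))  # False: both replicas in dc1
    print(can_go_active([0, 2], osd_to_dc))  # True: one replica per DC
```

A PG that fails this check would have to wait (or stay inactive) rather than accept writes that live in only one DC; how the real implementation treats that state is not specified here.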

The initial target configuration is 2 full sites and an off-site tiebreaker monitor.
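The off-site tiebreaker matters because monitor quorum requires a strict majority of all monitors. The Python sketch below works through that arithmetic under an assumed layout of 2 monitors per data site plus 1 tiebreaker (5 total); the layout and the `has_quorum` helper are hypothetical illustrations, not taken from this issue.

```python
"""Hypothetical quorum arithmetic for two data sites plus a tiebreaker."""

from typing import Dict, Set


def has_quorum(mons_per_site: Dict[str, int], lost_sites: Set[str]) -> bool:
    """Strict majority of all monitors must survive after losing `lost_sites`."""
    total = sum(mons_per_site.values())
    alive = sum(n for site, n in mons_per_site.items() if site not in lost_sites)
    return alive > total // 2


if __name__ == "__main__":
    # Assumed layout: 2 mons per data site plus 1 tiebreaker, 5 in total.
    layout = {"dc1": 2, "dc2": 2, "tiebreaker": 1}
    print(has_quorum(layout, {"dc1"}))  # True: 3 of 5 remain
    # Without a tiebreaker, losing either site leaves 2 of 4: no majority.
    print(has_quorum({"dc1": 2, "dc2": 2}, {"dc1"}))  # False
```

This only covers monitor quorum. It does not address the OSD-level netsplit case above, where the monitors stay in quorum while OSDs cannot reach each other across DCs.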

Related issues: 1 (0 open, 1 closed)