Feature #44107

mon: produce stable election results when netsplits and other errors happen

Added by Greg Farnum over 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Reviewed:
Affected Versions:
Component(RADOS): Monitor
Pull request ID:

Description

Right now, during netsplits and similar error conditions, the monitors do not produce a stable quorum: whichever monitors are excluded prompt continuous elections by Proposing to whatever peers they can reach.

To fix this, add heartbeating between the monitor daemons, use the heartbeats to generate connection liveness and reliability scores, and use those scores as input to an election algorithm.

https://github.com/ceph/ceph/pull/32336
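
To make the idea in the description concrete, here is a minimal Python sketch of score-driven leader selection. It is an illustration only and not the code from the pull request: the names `update_score`, `total_score`, `pick_leader`, the smoothing factor `ALPHA`, and the shape of the `reports` dictionary are all assumptions made for this example. Each monitor is assumed to keep a reliability score in [0, 1] per peer, updated from heartbeat results, and the election is assumed to favor the monitor that its peers collectively rate as best connected.

```python
# Hypothetical sketch: none of these names come from the Ceph source tree.

ALPHA = 0.2  # assumed smoothing factor for the moving average


def update_score(score, heartbeat_ok, alpha=ALPHA):
    """Blend the latest heartbeat result into a reliability score in [0, 1]."""
    sample = 1.0 if heartbeat_ok else 0.0
    return (1.0 - alpha) * score + alpha * sample


def total_score(reports, candidate):
    """Sum the scores that the other monitors report for `candidate`."""
    return sum(
        peers.get(candidate, 0.0)
        for reporter, peers in reports.items()
        if reporter != candidate
    )


def pick_leader(reports):
    """Return the best-connected monitor; ties go to the lowest name."""
    return min(reports, key=lambda name: (-total_score(reports, name), name))


# Example: the a<->c link is flaky, so b is the best-connected monitor.
reports = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"a": 0.1, "b": 0.8},
}
leader = pick_leader(reports)
```

In this sketch the outcome depends only on the accumulated scores, so repeating the election with the same scores picks the same leader, and a monitor with poor links cannot win simply by proposing again.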


Related issues

Blocks RADOS - Feature #44108: mon: osd: handle 2-(main-)site stretch clusters explicitly, so no admin intervention is needed when a DC dies (In Progress)

History

#1 Updated by Greg Farnum over 3 years ago

  • Status changed from New to Fix Under Review

#2 Updated by Greg Farnum over 3 years ago

  • Blocks Feature #44108: mon: osd: handle 2-(main-)site stretch clusters explicitly, so no admin intervention is needed when a DC dies added

#3 Updated by Ken Dreyer over 3 years ago

  • Backport set to nautilus

#4 Updated by Neha Ojha over 3 years ago

  • Priority changed from Normal to Urgent

Marking anything we need for octopus as "Urgent".

#5 Updated by Greg Farnum over 1 year ago

  • Status changed from Fix Under Review to Resolved
  • Pull request ID changed from 32336 to 35906

Oh, this has been done for ages.
