Feature #44107

closed

mon: produce stable election results when netsplits and other errors happen

Added by Greg Farnum about 4 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Reviewed:
Affected Versions:
Component(RADOS): Monitor
Pull request ID:

Description

Right now, in netsplits and similar error conditions the monitors do not produce a stable quorum: whichever monitors are excluded will prompt continuous elections by Proposing to whatever peers they can reach.

To accomplish this, add heartbeating between the monitor daemons, use that to generate connection liveness and reliability scores, and use those scores as input to an election algorithm.
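The idea can be illustrated with a minimal sketch, assuming each monitor keeps an exponentially decayed reliability score per peer from heartbeat results, and the election picks the monitor that the others collectively rate as most reachable. The names `PeerTracker`, `elect_leader`, and `DECAY` are hypothetical. This is not the code from the linked pull request, only an illustration of scores feeding an election.

```python
from dataclasses import dataclass, field

# Hypothetical parameter: fraction of the previous score kept on each heartbeat.
DECAY = 0.9


@dataclass
class PeerTracker:
    """One monitor's view of how reliably it can reach each peer (hypothetical)."""
    rank: int
    scores: dict[int, float] = field(default_factory=dict)

    def record_heartbeat(self, peer: int, alive: bool) -> None:
        # Exponentially weighted moving average: 1.0 = always reachable, 0.0 = never.
        # A single dropped heartbeat only nudges the score, so a flapping link
        # does not immediately change the election outcome.
        old = self.scores.get(peer, 1.0)
        sample = 1.0 if alive else 0.0
        self.scores[peer] = DECAY * old + (1 - DECAY) * sample


def elect_leader(trackers: list[PeerTracker]) -> int:
    """Pick the monitor with the highest total score as reported by its peers."""
    totals: dict[int, float] = {t.rank: 0.0 for t in trackers}
    for t in trackers:
        for peer, score in t.scores.items():
            if peer in totals:
                totals[peer] += score
    # Highest total wins; ties fall back to the lowest rank.
    return min(totals, key=lambda r: (-totals[r], r))


# Usage: three monitors where only the link between mon.0 and mon.2 is down.
trackers = [PeerTracker(r) for r in range(3)]
for _ in range(20):  # 20 heartbeat rounds
    for t in trackers:
        for peer in range(3):
            if peer != t.rank:
                link_down = {t.rank, peer} == {0, 2}
                t.record_heartbeat(peer, alive=not link_down)

print(elect_leader(trackers))  # → 1
```

In this netsplit a purely rank-based election would favor mon.0, which mon.2 cannot reach, so mon.2 would keep calling elections. With scores, mon.1 (reachable from both sides) wins consistently.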

https://github.com/ceph/ceph/pull/32336


Related issues 1 (1 open, 0 closed)

Blocks RADOS - Feature #44108: mon: osd: handle 2-(main-)site stretch clusters explicitly, so no admin intervention is needed when a DC dies (In Progress, Greg Farnum)

#1

Updated by Greg Farnum about 4 years ago

  • Status changed from New to Fix Under Review
#2

Updated by Greg Farnum about 4 years ago

  • Blocks Feature #44108: mon: osd: handle 2-(main-)site stretch clusters explicitly, so no admin intervention is needed when a DC dies added
#3

Updated by Ken Dreyer about 4 years ago

  • Backport set to nautilus
#4

Updated by Neha Ojha about 4 years ago

  • Priority changed from Normal to Urgent

Marking anything we need for octopus as "Urgent".

#5

Updated by Greg Farnum about 2 years ago

  • Status changed from Fix Under Review to Resolved
  • Pull request ID changed from 32336 to 35906

Oh, this has been done for ages.
