Feature #11443

open

Elector: throttle election attempts from DoSing peers

Added by Greg Farnum about 9 years ago. Updated over 8 years ago.

Status: New
Priority: High
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

In at least one cluster, we've seen a situation where a badly behaved monitor can DoS the entire cluster: it continually calls elections through one mechanism or another but can't accept any writes, so the healthy quorum members are unable to make forward progress.
(Due to the nature of the failure, we don't have very helpful logs. For the purposes of this discussion, let's assume that the monitor was asserting out on every failed write, but then getting restarted by its init system every time. The precise nature of the failure shouldn't actually matter; similar failure modes are possible if the network can't maintain a stable connection, etc.)

We want to prevent this misbehaving monitor from disrupting the rest of the cluster. That leads me to think each monitor needs some kind of registry of "disappointing" peers: those who were elected leader but failed to ack in time, or those who participated in an election but then timed out on a commit or ratification. Perhaps even those who participate in one election and then start another one that we locally don't think needs to be called.

The difficulty, of course, is that we can't blacklist them forever: we need to let them back in once the administrator has resolved the issue. Even worse, monitors always set their nonce values to zero, so we can't identify new daemon instances or anything; all we have to go on are addresses and the epoch values they claim to have. So do we just do some kind of decay on how disappointing we find each peer? Do we prevent them from starting the next election, but let them participate in one that somebody else begins?
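
To make the decay idea concrete, here is a rough sketch in Python (not Ceph code, and not a proposed patch). It assumes a per-monitor registry keyed by peer address, since addresses are all we can reliably identify. The class name, the event kinds, the penalty weights, the half-life, and the threshold are all made up for illustration. Scores decay exponentially, so a peer is let back in on its own once it stops misbehaving. A peer over the threshold is only barred from starting elections and may still take part in ones that somebody else begins.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeerScore:
    score: float = 0.0    # accumulated "disappointment" at time `updated`
    updated: float = 0.0  # timestamp (seconds) of the last update


class DisappointmentRegistry:
    """Hypothetical registry of disappointing peers, keyed by address."""

    # Illustrative weights only; real values would need tuning.
    PENALTIES = {
        "leader_ack_timeout": 4.0,  # elected leader but failed to ack in time
        "commit_timeout": 2.0,      # timed out on a commit or ratification
        "needless_election": 1.0,   # called an election we saw no need for
    }

    def __init__(self, half_life: float = 60.0, threshold: float = 4.0):
        self.half_life = half_life  # seconds for a score to halve
        self.threshold = threshold  # at or above this, ignore election calls
        self._peers: Dict[str, PeerScore] = {}

    def current_score(self, addr: str, now: float) -> float:
        """Return the peer's score after exponential decay up to `now`."""
        entry = self._peers.get(addr)
        if entry is None:
            return 0.0
        elapsed = max(0.0, now - entry.updated)
        return entry.score * 0.5 ** (elapsed / self.half_life)

    def record(self, addr: str, kind: str, now: float) -> None:
        """Add the penalty for `kind` on top of the decayed score."""
        score = self.current_score(addr, now) + self.PENALTIES[kind]
        self._peers[addr] = PeerScore(score, now)

    def may_start_election(self, addr: str, now: float) -> bool:
        """Throttle only the right to *start* an election."""
        return self.current_score(addr, now) < self.threshold

    def may_participate(self, addr: str, now: float) -> bool:
        """Peers may always join an election somebody else began."""
        return True
```

With these made-up numbers, two commit timeouts in quick succession are enough to have the peer's election calls ignored, and one half-life of good behaviour brings it back under the threshold. Whether a restarted daemon should be treated any differently is left open, since the zero nonce gives us no way to tell.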
