Feature #11443

Elector: throttle election attempts from DoSing peers

Added by Greg Farnum over 4 years ago. Updated almost 4 years ago.

In at least one cluster, we've seen a situation where a badly-behaved monitor can DoS the entire cluster: it continually calls elections through one mechanism or another but can't accept any writes, so the healthy quorum members are unable to make forward progress.
(Due to the nature of the failure, we don't have very helpful logs. For the purposes of this discussion, let's assume that the monitor was asserting out on every failed write, but then getting restarted by its init system every time. The precise nature of the failure shouldn't actually matter — similar failure modes are possible if the network can't maintain a stable connection, etc.)

We want to prevent this misbehaving monitor from disrupting the rest of the cluster. That leads me to think each monitor needs some kind of registry of "disappointing" peers: those who were elected leader but failed to ack in time, or those who participated in an election but then timed out on a commit or ratification. Perhaps even those who participate in one election and then start another one that we locally don't think needs to be called.

The difficulty of course is that we can't blacklist them forever, and we need to let them back in once the administrator has resolved the issue. Even worse, monitors always set their nonce values to zero, so we can't identify new daemon instances or anything — only addresses and the epoch values they claim to have. So do we just do some kind of decay on how disappointing we find each peer? Do we prevent them from starting the next election, but let them participate in one that somebody else begins?


#1 Updated by Sage Weil almost 4 years ago

What about a model like

void behaved()
void disappointed()
bool should_ignore()

with a couple of exponential decay tunables (initial backoff = 30 seconds, backoff multiplier = 2). If a peer does something bad, start ignoring it for backoff seconds, then double backoff for next time. If it does something good, we reset the backoff duration to the initial value.

Bad things would be:
- they were the leader and we were forced to call an election
- they called an election and then failed to complete it

Good things would be:
- joined a quorum (hrm, maybe not enough...)
- committed a value (as leader or as peon)

I think the delicate part is getting the good and bad behaviors right.
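
A minimal sketch of the model above, using the behaved()/disappointed()/should_ignore() interface and the suggested tunables. All names, the keying by peer address, and the choice to also clear the active ignore window on good behavior are illustrative assumptions, not actual Ceph code:

```cpp
#include <chrono>
#include <map>
#include <string>

using mono_clock = std::chrono::steady_clock;

// Hypothetical per-peer penalty state; not real Ceph code.
struct PeerScore {
  std::chrono::seconds backoff{0};       // current penalty duration (0 = none yet)
  mono_clock::time_point ignore_until{}; // ignore this peer until this time
};

class DisappointmentRegistry {
  std::chrono::seconds initial_backoff{30}; // tunable: first penalty window
  int multiplier = 2;                       // tunable: exponential growth factor
  std::map<std::string, PeerScore> peers;   // keyed by peer address

public:
  // Peer did something good (e.g. committed a value): reset its penalty
  // to the initial value and stop ignoring it.
  void behaved(const std::string& addr) {
    auto& p = peers[addr];
    p.backoff = std::chrono::seconds{0};
    p.ignore_until = mono_clock::now();
  }

  // Peer did something bad (e.g. led us to call an election): start
  // ignoring it for the backoff window, doubling the window each time.
  void disappointed(const std::string& addr) {
    auto& p = peers[addr];
    p.backoff = (p.backoff.count() == 0) ? initial_backoff
                                         : p.backoff * multiplier;
    p.ignore_until = mono_clock::now() + p.backoff;
  }

  // Should we ignore election traffic from this peer right now?
  bool should_ignore(const std::string& addr) const {
    auto it = peers.find(addr);
    return it != peers.end() && mono_clock::now() < it->second.ignore_until;
  }
};
```

Using a monotonic clock avoids penalty windows jumping around if the wall clock steps, which matters on monitors where clock skew is already a concern.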

#2 Updated by Sage Weil almost 4 years ago

Trivial workaround here is backporting the respawn thresholds for upstart?

#4 Updated by Greg Farnum almost 4 years ago

Yes, getting the good and bad behaviors right is hard. Remember this needs to be something that each monitor can track with only local information, but that will still lead to a healthy quorum if one is possible.

For instance, we can't ignore a monitor that fails to complete an election: it might be healthy but not have gotten enough acknowledgements from peers. Blocking it leads to an easy ongoing election failure when booting monitors slowly.

I think we'd need to do something like:
1) Only blacklist a monitor if we can, ourselves, detect an actual failure from them. For leaders: failure to extend paxos leases (or to ack a known-good victory? Is there somewhere in the stage we can know that's happened?). For peons, failure to ack a commit.
2) Ignore election start messages from blacklisted monitors.
3) When voting in an election, exclude blacklisted monitors from the set of potential leaders.
3b) If a blacklisted monitor becomes leader (without us, because we didn't vote for them!) and we are not in quorum, try to join quorum and leave them in the voting set, but do not remove them from the blacklist.

I think something like that would let the cluster converge on agreement, but somebody would need to sketch it out pretty thoroughly. Obviously each blacklisting event would need to come with a decay.
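Rule (3) and (3b) above could be sketched roughly as follows. This is a hypothetical illustration only — the function name, the ranked-candidate list, and the blacklist predicate are all assumptions, and the real elector ranks peers by monitor rank rather than by a simple list:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical vote selection: pick the highest-ranked candidate that we
// have not blacklisted. If every candidate is blacklisted, fall back to
// the highest-ranked one anyway, so the cluster can still converge on a
// quorum (rule 3b) rather than deadlocking. Assumes candidates is
// non-empty and ordered best-rank first.
std::string pick_vote(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& blacklisted) {
  for (const auto& c : candidates) {
    if (!blacklisted(c))
      return c;
  }
  return candidates.front();  // everyone blacklisted: vote anyway
}
```

The fallback is the important part: local blacklists are built from local observations, so two monitors may disagree about who is disappointing, and refusing to vote at all would trade one liveness failure for another.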

#5 Updated by Kefu Chai almost 4 years ago
Backporting the respawn thresholds for upstart helps with some cases, but not the case where we originally spotted the DoS, where a monitor was just slow but did not crash or restart itself.
