Feature #14445

Tests should fail if we unexpectedly lose our connections to a node

Added by Greg Farnum over 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Core
Target version:
-
Start date:
01/20/2016
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:

Description

I've recently seen several tests where a node just disappears for some reason, but the test keeps running until the overall test timeout. (I think this is usually the result of a thrasher keeping state alive, with nothing throwing an error that can interrupt it?) This keeps machines locked for no purpose and wastes resources.

But in cases where we've lost an entire node, it's extremely unlikely that the test will be able to make any progress, so we should just kill it. Presumably we can do some sort of trivial heartbeat and die if the heartbeats stop arriving.
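
A minimal sketch of what such a heartbeat check could look like, in Python. This is only an illustration of the idea, not existing test-framework code: the names HeartbeatMonitor, NodeLostError, beat, and check are hypothetical, and the 60-second default timeout is an arbitrary assumption. How heartbeats actually get delivered (an SSH ping, a small agent on the node, etc.) is left open.

```python
import time


class NodeLostError(RuntimeError):
    """Raised when a node has been silent for longer than the timeout."""


class HeartbeatMonitor:
    """Tracks the last time each node was heard from (hypothetical helper)."""

    def __init__(self, nodes, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        now = clock()
        # Treat every node as alive at start-up.
        self.last_seen = {node: now for node in nodes}

    def beat(self, node):
        """Record a heartbeat from a node."""
        self.last_seen[node] = self.clock()

    def lost_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )

    def check(self):
        """Raise NodeLostError if any node has gone silent."""
        lost = self.lost_nodes()
        if lost:
            raise NodeLostError("no heartbeat from: " + ", ".join(lost))
```

The test runner (or the thrasher loop itself) would call check() periodically, so a vanished node turns into an exception that ends the run instead of leaving it to hit the overall timeout.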
