Feature #14445

open

Tests should fail if we unexpectedly lose our connections to a node

Added by Greg Farnum about 8 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Core
% Done:

0%

Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:

Description

I've been seeing several tests recently where a node just disappears for some reason, but the test keeps running until it hits the overall test timeout. (I think this is usually the result of a thrasher keeping state alive, and nothing throwing an error that can interrupt it?) This keeps machines locked for no purpose and wastes resources.

But in cases where we've lost an entire node, it's extremely unlikely that the test will be able to make any progress, so we should just kill it. Presumably we can run some sort of trivial heartbeat against each node and fail the test if the heartbeats stop arriving.
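
As a rough illustration of the heartbeat idea, here is a minimal Python sketch. It is not taken from any existing test framework code: the names HeartbeatMonitor, NodeLostError, beat, and check are hypothetical, and how a heartbeat is actually obtained from a node (for example, a cheap remote command) is left open. The sketch only tracks when each node was last heard from and raises once a node has been silent longer than a timeout.

```python
import time


class NodeLostError(Exception):
    """Raised when a node has been silent for longer than the timeout."""


class HeartbeatMonitor:
    """Hypothetical helper: tracks the last heartbeat seen from each node."""

    def __init__(self, nodes, timeout, clock=time.monotonic):
        # timeout is in seconds; clock is injectable so it can be faked.
        self.timeout = timeout
        self.clock = clock
        now = clock()
        # Treat every node as alive at the moment monitoring starts.
        self.last_seen = {node: now for node in nodes}

    def beat(self, node):
        """Record that a heartbeat from this node just arrived."""
        self.last_seen[node] = self.clock()

    def lost_nodes(self):
        """Return the sorted names of nodes that have exceeded the timeout."""
        now = self.clock()
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )

    def check(self):
        """Raise NodeLostError if any node has stopped sending heartbeats."""
        lost = self.lost_nodes()
        if lost:
            raise NodeLostError("no heartbeat from: " + ", ".join(lost))
```

A test runner could call check() periodically from its main loop (or from the thrasher itself) and let the exception abort the run, so the machines are unlocked well before the overall test timeout.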
