Feature #14445
Tests should fail if we unexpectedly lose our connections to a node
Status:
New
Priority:
Normal
Assignee:
-
Category:
Core
% Done:
0%
Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
Description
I've been seeing several tests recently where a node just disappears for some reason, but the test keeps running until the overall test timeout. (I think this is usually the result of a thrasher keeping state alive, and nothing throwing an error that can interrupt it?) This keeps machines locked to no purpose and wastes resources.
But in cases where we've lost an entire node, it's extremely unlikely that the test will be able to make any progress, so we should just kill it. Presumably we can run some sort of trivial heartbeat against each node and fail the test if the heartbeats stop arriving.
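A rough sketch of what such a heartbeat check might look like, purely to illustrate the idea. Everything here is hypothetical: `HeartbeatMonitor`, `NodeLostError`, the `probe` callable and the `max_missed` threshold are names made up for this example and are not part of the existing test framework. The probe is assumed to be something cheap (for example, running a no-op command over the existing connection) that returns True when the node answers.

```python
from typing import Callable, Dict, Iterable, List


class NodeLostError(RuntimeError):
    """Raised when a node misses too many consecutive heartbeats."""


class HeartbeatMonitor:
    """Track consecutive missed heartbeats per node (hypothetical helper)."""

    def __init__(
        self,
        nodes: Iterable[str],
        probe: Callable[[str], bool],
        max_missed: int = 3,
    ) -> None:
        # probe(node) is an assumed callable: True if the node responded.
        self.probe = probe
        self.max_missed = max_missed
        self.missed: Dict[str, int] = {node: 0 for node in nodes}

    def check_once(self) -> None:
        """Probe every node once; raise if any node looks permanently gone."""
        lost: List[str] = []
        for node in self.missed:
            try:
                alive = self.probe(node)
            except Exception:
                # A probe that blows up counts as a missed heartbeat.
                alive = False
            if alive:
                self.missed[node] = 0
            else:
                self.missed[node] += 1
            if self.missed[node] >= self.max_missed:
                lost.append(node)
        if lost:
            raise NodeLostError("lost contact with: " + ", ".join(sorted(lost)))
```

The test runner could call `check_once()` on a fixed interval (from its main loop or a background thread) and treat `NodeLostError` as a fatal failure, so the run ends and the machines are unlocked instead of waiting for the overall timeout. Requiring several consecutive misses is meant to avoid killing a test over a single transient network blip.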