Feature #50919
Have teuthology-lock check paddles DB for node job stats
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
% Done:
0%
Description
There are a few scenarios (a BMC being offline, for example) where a testnode can repeatedly cause every job it gets to die. See attached.
That's nuts. Could we have teuthology-lock (or whatever would be more appropriate, maybe the dispatcher?) check the statuses of a node's last X jobs and, if they're DEAD (not FAIL), mark the node down and use a different one?
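
A rough sketch of the check I have in mind, in case it helps the discussion. This only shows the decision logic. It assumes the caller has already fetched the node's recent job statuses from paddles as plain strings, newest first. The function name `should_mark_down`, the `last_x` parameter, and its default of 3 are all made up for illustration and are not existing teuthology or paddles code.

```python
from typing import Iterable


def should_mark_down(recent_statuses: Iterable[str], last_x: int = 3) -> bool:
    """Return True if the node's last `last_x` jobs all ended as dead.

    `recent_statuses` is assumed to be ordered newest first and to hold
    status strings such as "dead", "fail" or "pass" (hypothetical input;
    fetching them from paddles is left to the caller).
    """
    # Normalize case so "DEAD" and "dead" are treated the same.
    window = [s.strip().lower() for s in recent_statuses][:last_x]

    # Not enough history yet: do not take the node out of service.
    if len(window) < last_x:
        return False

    # Only dead jobs count against the node; a fail is a test result,
    # not evidence of a broken machine.
    return all(status == "dead" for status in window)


if __name__ == "__main__":
    print(should_mark_down(["dead", "dead", "dead", "pass"]))
    print(should_mark_down(["dead", "fail", "dead"]))
```

Requiring a full window of X jobs before acting is a deliberate choice here, so that a freshly added node with a single dead job is not marked down right away.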