Feature #50919

Have teuthology-lock check paddles DB for node job stats

Added by David Galloway almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
% Done: 0%

Description

In a few scenarios (an offline BMC, for example), a testnode can cause every job scheduled on it to die, over and over. See the attached screenshot.

That's nuts. Could we have teuthology-lock (or whatever would be more appropriate; the dispatcher?) check the status of a node's last X jobs and, if they're all DEAD (not FAIL), mark the node down and use a different one?
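
A minimal sketch of the check being asked for, to make the proposal concrete. It only covers the decision itself: given a node's most recent job statuses, newest first, should the node be marked down? The function name `should_mark_down`, the `threshold` parameter (the "X" above), and the lowercase status string `"dead"` are all assumptions made for illustration, not existing teuthology or paddles code. How the statuses are fetched from paddles and how the node is actually marked down are deliberately left out.

```python
from typing import Sequence


def should_mark_down(recent_statuses: Sequence[str], threshold: int = 3) -> bool:
    """Decide whether a node looks broken based on its recent jobs.

    recent_statuses: job statuses for one node, newest first (assumed order).
    threshold: how many of the newest jobs must all be dead (the "X" above).
    """
    if threshold <= 0:
        raise ValueError("threshold must be a positive integer")

    newest = list(recent_statuses[:threshold])

    # Not enough history yet: do not take the node out of service.
    if len(newest) < threshold:
        return False

    # Only "dead" counts. A "fail" usually means the test itself failed,
    # which says nothing about the health of the node.
    return all(status.lower() == "dead" for status in newest)
```

With a threshold of 3, `["dead", "dead", "dead", "pass"]` would trigger a mark-down, while `["dead", "fail", "dead"]` would not, since one of the three newest jobs failed rather than died. Requiring a full window of dead jobs is a conservative choice, so that a single infrastructure hiccup does not pull a healthy node from the pool.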

Screenshot at 2021-05-20 16-20-10.png (169 KB) David Galloway, 05/20/2021 08:20 PM
