Feature #8356

open

tests are being marked as hung despite never actually getting machines locked

Added by Greg Farnum almost 10 years ago. Updated almost 10 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:

Description

See teuthology-2014-05-12_23:02:18-knfs-master-testing-basic-plana/251991, for instance. I've seen this a couple of other times in emails this week and will update the ticket with any more that I see.

I'm not sure if the root problem here is that the test is never getting its machines (do we just have too many runs racing against each other for unlocked machines?), or that it's getting declared as hung too early, or what.

Actions #1

Updated by Greg Farnum almost 10 years ago

  • Description updated

Actions #2

Updated by Greg Farnum almost 10 years ago

Hmm, it may have only been marked as hung after somebody killed the run yesterday. But it wasn't killed until 6 hours after its peers finished, so there's definitely some kind of issue here.

Actions #3

Updated by Zack Cerza almost 10 years ago

So the way the emails have always worked is pretty wonky.

When a test run containing X jobs is scheduled, X+1 jobs are actually created in beanstalkd. The last one doesn't contain any tests and has the flag last_in_suite = True. When a worker picks up that job, it kicks off a teuthology-results process which:

  1. Looks for subdirs of the archive dir to see which jobs exist
  2. For each subdir, looks for a summary.yaml inside it
  3. If that is not present, assumes the job is running
  4. If there are running jobs, waits a certain amount of time (default 6h)
  5. Decides that any job still running after that wait is hung
  6. Sends the email
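
As a rough illustration of the steps above, here is a minimal Python sketch. It is not the actual teuthology-results code: the function names, the polling loop, and the `poll_interval`, `sleep`, and `clock` parameters are assumptions made for the example. Only the summary.yaml check and the 6h default come from the description above.

```python
import time
from pathlib import Path


def unfinished_jobs(archive_dir):
    """Return the names of job subdirs that have no summary.yaml yet.

    Hypothetical helper: a missing summary.yaml is the only signal used,
    so a job that never started looks the same as one that is running.
    """
    return sorted(
        entry.name
        for entry in Path(archive_dir).iterdir()
        if entry.is_dir() and not (entry / "summary.yaml").exists()
    )


def find_hung_jobs(archive_dir, timeout=6 * 60 * 60, poll_interval=60,
                   sleep=time.sleep, clock=time.monotonic):
    """Wait up to `timeout` seconds, then report unfinished jobs as hung.

    Polling is an assumption; the description only says the process
    "waits a certain amount of time (default 6h)".
    """
    deadline = clock() + timeout
    running = unfinished_jobs(archive_dir)
    while running and clock() < deadline:
        sleep(poll_interval)
        running = unfinished_jobs(archive_dir)
    # Whatever still lacks a summary.yaml at this point is declared hung.
    return running
```

Under this logic, a job whose archive subdir exists but which never managed to lock machines would presumably be reported as hung once the timeout expires, since nothing distinguishes "waiting for machines" from "running".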

I'd be happy to put some time into making that less, uh, crazy.

Actions #4

Updated by Zack Cerza almost 10 years ago

  • Tracker changed from Bug to Feature
  • Story points set to 4.0

Marking this as a feature so I can budget time to work on it. I have not decided how exactly I'll do it, though.
