Bug #6546

closed

Race condition between tests starting and teuthology-results being run

Added by Zack Cerza over 10 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/teuthology-2013-10-11_23:01:29-kcephfs-master-testing-basic-plana/47502/

2013-10-14T10:25:52.973 INFO:Waiting up to 36000 seconds for tests to finish...
2013-10-14T11:59:37.565 INFO:Tests finished! gathering results...
[...]
Hung =================================================================
[47502] kcephfs/thrash/{clusters/fixed-3.yaml fs/btrfs.yaml thrashers/default.yaml workloads/kclient_workunit_suites_ffsb.yaml}

Job 47502 had actually passed, yet it was reported as hung. Note that only ~90 minutes had elapsed, far short of the 36000-second timeout. IRC log:

15:29 < zackc> gregaf: ah! i found the bug!
15:30 < zackc> gregaf: so, teuthology-results pokes around in the archive dir to see
which subdirs don't contain a summary.yaml - when they all do, it
considers the run finished
15:31 < zackc> eventually it'll time out, but that isn't what happened here
15:32 < zackc> unfortunately teuthology-results' list of running jobs is only created
once - when it starts running
15:32 < zackc> it never looks at the tree again
15:32 < zackc> so, if a job starts after that happens, it won't check to see if it
finished
15:33 < zackc> but! when it goes to assemble the results, it does look at them all again
15:33 < gregaf> haha, nice
15:33 < gregaf> so there's a race between starting the jobs and starting
teuthology-results?
15:33 < zackc> the fix might be as simple as moving the "which jobs exist" check to be
inside the "which jobs are still running" loop
15:34 < zackc> apparently
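
A minimal sketch of the fix suggested in the log above, assuming the layout described there: one subdirectory per job in the archive dir, with a summary.yaml appearing when the job finishes. The names unfinished_jobs and wait_for_jobs, and the poll_interval and sleep parameters, are hypothetical and are not taken from the teuthology source. The point is only that the "which jobs exist" scan happens inside the polling loop rather than once at startup.

```python
import os
import time


def unfinished_jobs(archive_dir):
    """Return names of job subdirectories that lack a summary.yaml."""
    pending = []
    for name in sorted(os.listdir(archive_dir)):
        job_dir = os.path.join(archive_dir, name)
        if not os.path.isdir(job_dir):
            continue  # ignore stray files in the archive dir
        if not os.path.exists(os.path.join(job_dir, "summary.yaml")):
            pending.append(name)
    return pending


def wait_for_jobs(archive_dir, timeout, poll_interval=10.0, sleep=time.sleep):
    """Poll until every job has a summary.yaml or the timeout expires.

    Returns the jobs still unfinished (an empty list if all finished).
    """
    deadline = time.monotonic() + timeout
    while True:
        # Re-scan the tree on every pass, so a job whose directory
        # appeared after we started waiting is still tracked.
        pending = unfinished_jobs(archive_dir)
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(poll_interval)
```

Re-scanning narrows the race but may not close it entirely: if every existing job finishes before a late job has created its directory, the loop would still exit early. Closing that gap would presumably require knowing the expected number of jobs up front.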

Actions #1

Updated by Zack Cerza over 10 years ago

  • Status changed from In Progress to Fix Under Review
Actions #2

Updated by Zack Cerza over 10 years ago

  • Status changed from Fix Under Review to Resolved