Bug #6546
Race condition between tests starting and teuthology-results being run (closed)
Description
2013-10-14T10:25:52.973 INFO:Waiting up to 36000 seconds for tests to finish...
2013-10-14T11:59:37.565 INFO:Tests finished! gathering results...
[...]
Hung
=================================================================
[47502] kcephfs/thrash/{clusters/fixed-3.yaml fs/btrfs.yaml thrashers/default.yaml workloads/kclient_workunit_suites_ffsb.yaml}
Job 47502 was reported as hung, but it had actually passed. Note that only ~90 minutes had elapsed, nowhere near the 36000-second timeout. IRC log:
15:29 < zackc> gregaf: ah! i found the bug!
15:30 < zackc> gregaf: so, teuthology-results pokes around in the archive dir to see
which subdirs don't contain a summary.yaml - when they all do, it
considers the run finished
15:31 < zackc> eventually it'll time out, but that isn't what happened here
15:32 < zackc> unfortunately teuthology-results' list of running jobs is only created
once - when it starts running
15:32 < zackc> it never looks at the tree again
15:32 < zackc> so, if a job starts after that happens, it won't check to see if it
finished
15:33 < zackc> but! when it goes to assemble the results, it does look at them all again
15:33 < gregaf> haha, nice
15:33 < gregaf> so there's a race between starting the jobs and starting
teuthology-results?
15:33 < zackc> the fix might be as simple as moving the "which jobs exist" check to be
inside the "which jobs are still running" loop
15:34 < zackc> apparently
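
The fix proposed in the log (moving the "which jobs exist" check inside the "which jobs are still running" loop) could look roughly like the sketch below. This is not the actual teuthology-results code: the function names `unfinished_jobs` and `wait_for_run`, their parameters, and the injectable `sleep` argument are hypothetical and chosen only for illustration. The only detail taken from the report is that a job counts as finished once its subdirectory of the archive dir contains a summary.yaml.

```python
import os
import time


def unfinished_jobs(archive_dir):
    """Return the job subdirectories that do not yet contain summary.yaml.

    Hypothetical helper; the name and signature are assumptions.
    """
    pending = []
    for name in sorted(os.listdir(archive_dir)):
        job_dir = os.path.join(archive_dir, name)
        if not os.path.isdir(job_dir):
            continue
        if not os.path.exists(os.path.join(job_dir, "summary.yaml")):
            pending.append(name)
    return pending


def wait_for_run(archive_dir, timeout, poll_interval=1.0, sleep=time.sleep):
    """Poll until every job has a summary.yaml or the timeout expires.

    The archive directory is re-listed on every pass, so a job whose
    directory appears after polling began is still waited for.
    Returns the jobs that are still unfinished (empty if all finished).
    """
    deadline = time.monotonic() + timeout
    while True:
        # Re-scan the tree here instead of once before the loop.
        pending = unfinished_jobs(archive_dir)
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(poll_interval)
```

Re-listing the directory on each pass only narrows the window. If every job that exists so far finishes before a late job's directory is created, the loop would still exit early, so a complete fix would presumably also need to know how many jobs the run is expected to contain.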
Updated by Zack Cerza over 10 years ago
- Status changed from In Progress to Fix Under Review
Updated by Zack Cerza over 10 years ago
- Status changed from Fix Under Review to Resolved