Bug #9086
git failures are causing workers to die, and jobs to be dropped
Status:
closed
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Workers are dying with uncaught exceptions like this:
2014-08-11T15:13:53.930 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-worker')()
  File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 125, in main
    job_config['suite_path'] = fetch_qa_suite(suite_branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 167, in fetch_qa_suite
    enforce_repo_state(qa_suite_url, dest_path, branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 29, in enforce_repo_state
    clone_repo(repo_url, dest_path, branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 71, in clone_repo
    raise RuntimeError("git clone failed!")
RuntimeError: git clone failed!
Currently the uncaught exception kills the worker and the job is dropped, but the job still appears as 'queued' in paddles. Ugly.
I'm working on a retry mechanism now that should eliminate (or at least greatly reduce the frequency of) this problem.
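For illustration only, a retry wrapper along these lines might look like the sketch below. This is not the actual teuthology change: the helper name `retry` and its parameters (`attempts`, `delay`, `backoff`, `exceptions`, `sleep`) are hypothetical, and the choice of exponential backoff is an assumption. The only detail taken from the traceback is that a failed clone surfaces as a RuntimeError.

```python
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0,
          exceptions=(RuntimeError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper, not part of teuthology. Waits `delay` seconds
    after the first failure and multiplies the wait by `backoff` after
    each later one. The final failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            sleep(wait)
            wait *= backoff
```

A caller could wrap the failing step as `retry(lambda: clone_repo(repo_url, dest_path, branch))`. Because the last failure is still raised, the worker would still need to catch it if it is to stay alive and report the job's real state rather than leave it as 'queued'.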