Bug #9086
closed
git failures are causing workers to die, and jobs to be dropped
Description
Stuff like this:
2014-08-11T15:13:53.930 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-worker')()
  File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 125, in main
    job_config['suite_path'] = fetch_qa_suite(suite_branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 167, in fetch_qa_suite
    enforce_repo_state(qa_suite_url, dest_path, branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 29, in enforce_repo_state
    clone_repo(repo_url, dest_path, branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 71, in clone_repo
    raise RuntimeError("git clone failed!")
RuntimeError: git clone failed!
Currently this will cause the job to be dropped, but still appear as 'queued' in paddles. Ugly.
I'm working on a retry mechanism now that should eliminate (or at least greatly reduce the frequency of) this problem.
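For illustration only, here is a minimal sketch of what such a retry wrapper could look like. It is not the code from the linked commits. The `retry` helper, its parameters, and this standalone `clone_repo` are hypothetical names invented for the example; the only detail taken from the traceback is that a failed clone raises RuntimeError("git clone failed!").

```python
import subprocess
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0,
          exceptions=(RuntimeError,), sleep=time.sleep):
    """Call func() until it succeeds or `attempts` calls have failed.

    Hypothetical helper: waits `delay` seconds after the first failure and
    multiplies the wait by `backoff` after each later one. The last failure
    is re-raised so the caller can still see it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the original error propagate
            sleep(wait)
            wait *= backoff


def clone_repo(repo_url, dest_path, branch):
    """Stand-in for the clone step (assumed shape, not teuthology's code)."""
    result = subprocess.run(
        ["git", "clone", "--branch", branch, repo_url, dest_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    if result.returncode != 0:
        raise RuntimeError("git clone failed!")


def clone_repo_with_retry(repo_url, dest_path, branch, attempts=3):
    """Retry the clone a few times before giving up."""
    return retry(lambda: clone_repo(repo_url, dest_path, branch),
                 attempts=attempts)
```

Passing `sleep` as a parameter is a design choice that lets the wrapper be exercised without real delays. A wrapper like this only helps with transient failures; if every attempt fails, the exception still reaches the worker, so the worker would still need to handle it to avoid leaving the job shown as 'queued'.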
Updated by Zack Cerza over 9 years ago
- Status changed from In Progress to 7
I pushed these commits to attempt a fix:
https://github.com/ceph/teuthology/commit/b25b095ff39caa1cab8af796b5a129973f02784d#diff-d41d8cd98f00b204e9800998ecf8427e
https://github.com/ceph/teuthology/commit/b25b095ff39caa1cab8af796b5a129973f02784d#diff-d41d8cd98f00b204e9800998ecf8427e
Looks like the fixes worked but I'll keep this open until I'm more confident.
Updated by Zack Cerza over 9 years ago
Those apparently didn't solve the problem, but this might have:
https://github.com/ceph/teuthology/commit/591b511fdc3d777bce741dfc2c6aadd78bb3fa44#diff-d41d8cd98f00b204e9800998ecf8427e