Bug #9086

closed

git failures are causing workers to die, and jobs to be dropped

Added by Zack Cerza over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Workers are dying with uncaught exceptions like this:

2014-08-11T15:13:53.930 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-worker')()
  File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 125, in main
    job_config['suite_path'] = fetch_qa_suite(suite_branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 167, in fetch_qa_suite
    enforce_repo_state(qa_suite_url, dest_path, branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 29, in enforce_repo_state
    clone_repo(repo_url, dest_path, branch)
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 71, in clone_repo
    raise RuntimeError("git clone failed!")
RuntimeError: git clone failed!

Currently this causes the job to be dropped while it still appears as 'queued' in paddles. Ugly.

I'm working on a retry mechanism now that should eliminate (or at least greatly reduce the frequency of) this problem.
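For illustration only, a retry wrapper along these lines might look like the sketch below. This is not the actual teuthology fix; the `retry` helper, its parameters, and the backoff policy are all assumptions made for the example. It retries a callable when it raises `RuntimeError` (the exception seen in the traceback above) and re-raises once the attempts are exhausted.

```python
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0,
          exceptions=(RuntimeError,), sleep=time.sleep):
    """Call func() up to `attempts` times (hypothetical helper).

    After each failure except the last, wait `delay` seconds, then
    multiply the wait by `backoff`. The last failure is re-raised so
    the caller can still see and handle it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                # Out of attempts: propagate the original exception.
                raise
            sleep(wait)
            wait *= backoff
```

A caller could then wrap the failing step, e.g. `retry(lambda: clone_repo(repo_url, dest_path, branch))`, so that a transient git failure costs a few seconds instead of killing the worker. The `sleep` parameter exists only so the wait can be stubbed out in tests.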

Actions #1

Updated by Zack Cerza over 9 years ago

  • Status changed from In Progress to 7

Actions #3

Updated by Zack Cerza over 9 years ago

  • Status changed from 7 to Resolved

I think we're good now.
