Feature #9846

open

don't wait forever if machines are not available

Added by Alfredo Deza over 9 years ago. Updated over 9 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:

Description

At some point teuthology should just give up.

The following job was holding a run for two days:

http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-17_01:10:01-ceph-deploy-firefly-distro-basic-vps/552797/teuthology.log

With output like:


2014-10-19T12:54:30.721 INFO:teuthology.task.internal:virtual machine is still unavailable
2014-10-19T12:54:43.773 ERROR:teuthology.lock:read (vpm175.front.sepia.ceph.com): No route to host

2014-10-19T12:54:43.773 INFO:teuthology.task.internal:virtual machine is still unavailable
2014-10-19T12:54:56.817 ERROR:teuthology.lock:read (vpm175.front.sepia.ceph.com): No route to host
Actions #1

Updated by Zack Cerza over 9 years ago

I agree, that's lame. How long should we wait? An hour?

Actions #2

Updated by Sandon Van Ness over 9 years ago

  • Tracker changed from Bug to Feature

This should only happen if something is pretty seriously borked. We don't really want it to give up soon unless it also marks the machine down, so that tests that run after it don't grab the same borked machine. Changing this to a feature, since that would require an additional feature in teuthology.
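
For illustration only, here is a minimal sketch of the behavior discussed in this thread: poll until a deadline, then mark the machine down before giving up. It is not teuthology code. The names wait_for_machine, is_available, mark_down and MachineUnavailableError are hypothetical, the one-hour default is just the figure floated in comment #1, and the 13-second interval is an assumption based on the spacing of the log lines in the description.

```python
import time
from typing import Callable


class MachineUnavailableError(Exception):
    """Raised when a machine does not become available before the deadline."""


def wait_for_machine(
    name: str,
    is_available: Callable[[str], bool],  # hypothetical probe, e.g. an SSH or lock check
    mark_down: Callable[[str], None],     # hypothetical hook that takes the machine out of rotation
    timeout: float = 3600.0,              # assumption: "an hour", as suggested in comment #1
    interval: float = 13.0,               # assumption: roughly the retry spacing seen in the log
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the machine is available or the timeout expires.

    On timeout the machine is marked down first, so that later jobs do not
    pick up the same broken machine, and then an error is raised.
    """
    deadline = clock() + timeout
    while True:
        if is_available(name):
            return
        remaining = deadline - clock()
        if remaining <= 0:
            mark_down(name)
            raise MachineUnavailableError(
                f"{name} still unavailable after {timeout:.0f}s; marked down"
            )
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting. Using a monotonic clock keeps the deadline stable if the system time is adjusted while the job is waiting.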
