Bug #42110

open

teuthology does not unlock node if libcloud failed to connect service

Added by Kyrylo Shatskyy over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Core
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I last saw this when DNS was not working, for example on a newly set up teuthology server.
Each attempt to connect fails and throws an exception; teuthology does not unlock the machine and tries to lock another one. It keeps locking machines until the number of nodes required for the job is exhausted.

Probable place to look into:

if machine_type in vm_types:
    ok_machs = {}
    for machine in machines:
        if teuthology.provision.create_if_vm(ctx, machine):
            ok_machs[machine] = machines[machine]
        else:
            log.error('Unable to create virtual machine: %s',
                      machine)
            unlock_one(ctx, machine, user)
    ok_machs = keys.do_update_keys(ok_machs.keys())[1]
    return ok_machs

create_if_vm returns False because of:

2019-03-29T14:26:59.473 DEBUG:teuthology.provision.cloud.openstack:Creating node: OpenStackProvisioner(provider='ecp', name='target-mainstream-010', os_type='ubuntu', os_version='16.04')
2019-03-29T14:26:59.509 ERROR:teuthology.provision.cloud.base:Failed to create target-mainstream-010
Traceback (most recent call last):
  File "/home/worker/src/teuthology_master/teuthology/provision/cloud/base.py", line 55, in create
    return self._create()
  File "/home/worker/src/teuthology_master/teuthology/provision/cloud/openstack.py", line 208, in _create
    log.debug("Selected size: %s", self.size)
  File "/home/worker/src/teuthology_master/teuthology/provision/cloud/openstack.py", line 344, in size
    all_sizes = self.provider.sizes
  File "/home/worker/src/teuthology_master/teuthology/provision/cloud/openstack.py", line 101, in sizes
    sizes = retry(self.driver.list_sizes)
  File "/home/worker/src/teuthology_master/teuthology/provision/cloud/openstack.py", line 39, in retry
    result = function(*args, **kwargs)
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/libcloud/compute/drivers/openstack.py", line 318, in list_sizes
    self.connection.request('/flavors/detail').object)
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/libcloud/common/openstack.py", line 224, in request
    raw=raw)
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/libcloud/common/base.py", line 603, in request
    headers=headers, stream=stream)
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/libcloud/http.py", line 221, in request
    verify=self.verification
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/home/worker/src/teuthology_master/virtualenv/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
    raise SSLError(e, request=request)
SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)
2019-03-29T14:26:59.512 ERROR:teuthology.lock.ops:Unable to create virtual machine:

This happens because no OpenStack operation is available: every call raises an exception, even a simple request for the flavor list.
Later, unlock_one tries to destroy the node, which of course also fails and returns False, so the unlock never happens.
This change probably broke the unlock functionality:
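
The failure chain described above can be reproduced with a small stand-alone simulation. This is only a sketch: `CloudUnreachable`, `create_vm`, `destroy_vm`, `lock_and_create` and the in-memory `locked` set are hypothetical stand-ins and not teuthology or libcloud APIs. It assumes every call to the cloud service fails, as in the log.

```python
# Stand-in simulation; none of these names are real teuthology APIs.
class CloudUnreachable(Exception):
    """Stands in for the SSLError / DNS failure raised through libcloud."""


def create_vm(name):
    # Like create_if_vm(): the exception is swallowed and False is returned.
    try:
        raise CloudUnreachable(name)
    except CloudUnreachable:
        return False


def destroy_vm(name):
    # The same unreachable service makes the destroy call fail as well.
    return False


def unlock_one(locked, name):
    # Behaviour described in the report: give up when destroy fails.
    if not destroy_vm(name):
        return False
    locked.discard(name)
    return True


def lock_and_create(names):
    locked = set()
    ok = []
    for name in names:
        locked.add(name)              # the node is locked first
        if create_vm(name):
            ok.append(name)
        else:
            unlock_one(locked, name)  # returns False, node stays locked
    return ok, locked


ok, leaked = lock_and_create(["vm1", "vm2", "vm3"])
print("created:", ok)
print("still locked:", sorted(leaked))
```

In this simulation no machine is created and all three remain locked, which matches the behaviour reported here.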

commit 77185fdd2d19b286263a6c3ad24aaff7e9208715
Author: Zack Cerza <>
Date: Tue Jan 10 12:47:21 2017 -0700

    Don't unlock VMs when destroy fails

    Signed-off-by: Zack Cerza <>

diff --git a/teuthology/lock/ops.py b/teuthology/lock/ops.py
index 41a2d1cb..64e1427e 100644
--- a/teuthology/lock/ops.py
+++ b/teuthology/lock/ops.py
@@ -163,6 +163,7 @@ def unlock_one(ctx, name, user, description=None):
     name = misc.canonicalize_hostname(name, user=None)
     if not teuthology.provision.destroy_if_vm(ctx, name, user, description):
         log.error('destroy failed for %s', name)
+        return False
     request = dict(name=name, locked=False, locked_by=user,
                    description=description)
     uri = os.path.join(config.lock_server, 'nodes', name, 'lock', '')
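
One possible direction, shown only as a sketch and not as the project's actual fix, is to skip the destroy step when the VM was never created, so that a failed create still releases the lock. The function `release_node`, its `vm_was_created` flag and the `destroy`/`unlock` callables are hypothetical names introduced for illustration. Whether it is safe to assume that no instance exists after a failed create is itself an assumption that would need checking.

```python
def release_node(name, destroy, unlock, vm_was_created=True):
    """Release a locked node (hypothetical helper, not teuthology API).

    destroy and unlock are callables taking the node name and returning
    True on success. When the VM was never created there is nothing to
    destroy, so the destroy step is skipped and the lock is released.
    """
    if vm_was_created and not destroy(name):
        # Keep the lock: a live VM may still exist on the provider.
        return False
    return unlock(name)


# Usage: the cloud is unreachable, so destroy always fails.
unlocked = []


def failing_destroy(name):
    return False


def record_unlock(name):
    unlocked.append(name)
    return True


# Create failed, so nothing exists to destroy and the lock is released.
after_failed_create = release_node(
    "target-mainstream-010", failing_destroy, record_unlock,
    vm_was_created=False)

# A VM that did exist keeps its lock when destroy fails.
after_failed_destroy = release_node(
    "target-mainstream-011", failing_destroy, record_unlock)

print(after_failed_create, after_failed_destroy, unlocked)
```

This keeps the intent of the commit above (a node whose VM may still exist stays locked) while letting the create-failure path release nodes that never got a VM.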
