Support #17952

handling OpenStack provider failure to provision host

Added by Loïc Dachary over 7 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:

Description

When provisioning a host fails because no host was found for the required flavor, a specific error message is displayed:

OS_REGION_NAME=GRA1 openstack --debug server create --image 'Ubuntu 16.04' --flavor hg-60-ssd --key-name loic --wait test60
...
RESP BODY: {"server": {"status": "ERROR", "updated": "2016-11-18T09:11:06Z", "hostId": "", "addresses": {}, "links": [{"href": "http://compute.gra1.cloud.ovh.net/v2/131b886b156a4f84b5f41baf2fbe646c/servers/b63e51c3-279a-4f81-82e5-68bc4d853095", "rel": "self"}, {"href": "http://compute.gra1.cloud.ovh.net/131b886b156a4f84b5f41baf2fbe646c/servers/b63e51c3-279a-4f81-82e5-68bc4d853095", "rel": "bookmark"}], "key_name": "loic", "image": {"id": "d79802bf-0b36-47a4-acb6-76a293b0c037", "links": [{"href": "http://compute.gra1.cloud.ovh.net/131b886b156a4f84b5f41baf2fbe646c/images/d79802bf-0b36-47a4-acb6-76a293b0c037", "rel": "bookmark"}]}, "OS-EXT-STS:task_state": null, "OS-EXT-STS:vm_state": "error", "OS-SRV-USG:launched_at": null, "flavor": {"id": "229eabf4-ada1-49ee-8bde-f1c574786fd3", "links": [{"href": "http://compute.gra1.cloud.ovh.net/131b886b156a4f84b5f41baf2fbe646c/flavors/229eabf4-ada1-49ee-8bde-f1c574786fd3", "rel": "bookmark"}]}, "id": "b63e51c3-279a-4f81-82e5-68bc4d853095", "OS-SRV-USG:terminated_at": null, "OS-EXT-AZ:availability_zone": "nova", "user_id": "291dde1633154837be2693c6ffa6315c", "name": "test60", "created": "2016-11-18T09:10:52Z", "tenant_id": "131b886b156a4f84b5f41baf2fbe646c", "OS-DCF:diskConfig": "MANUAL", "os-extended-volumes:volumes_attached": [], "accessIPv4": "", "accessIPv6": "", "fault": {"message": "No valid host was found. ", "code": 500, "created": "2016-11-18T09:11:06Z"}, "OS-EXT-STS:power_state": 0, "config_drive": "", "metadata": {}}}

GET call to compute for https://compute.gra1.cloud.ovh.net/v2/131b886b156a4f84b5f41baf2fbe646c/servers/b63e51c3-279a-4f81-82e5-68bc4d853095 used request id req-df08e68a-ac02-4d3d-997f-ea6b6b4aed46
Error creating server: test60
Error creating server
END return value: 1
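
The failure above can be recognized from the response body alone: the server status is ERROR and the fault message contains "No valid host was found". A minimal sketch of such a check, assuming the body is available as a JSON string shaped like the one in the log (the function name is_capacity_error is hypothetical, not part of teuthology or of the OpenStack client):

```python
import json

# Substring taken from the "fault" message in the log above.
NO_VALID_HOST = "No valid host was found"


def is_capacity_error(resp_body: str) -> bool:
    """Return True if a server response body reports a 'No valid host' fault.

    Hypothetical helper: it only inspects the "status" and "fault" fields
    that appear in the RESP BODY shown in this ticket.
    """
    try:
        server = json.loads(resp_body)["server"]
        fault = server.get("fault") or {}
        message = fault.get("message") or ""
        return server.get("status") == "ERROR" and NO_VALID_HOST in message
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not JSON, or not shaped like a server response: not a capacity error.
        return False
```

Matching on a substring rather than the full message tolerates the trailing ". " seen in the log.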

When "No valid host was found" is the reason a host could not be provisioned, the OpenStack provisioning code should create a marker file (/var/run/teuthology/ovh-GRA1-capacity.error, for instance). A cron job could then run every 5 minutes and, if the file is found, kill all teuthology jobs and remove the file. This is a system-wide problem: it should not be handled by the process that discovers it, but by a separate process that is specifically trusted to perform that kind of global destruction.
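
The split between the process that reports the problem and the process that acts on it could be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, and kill_all_jobs stands for whatever trusted procedure the sysadmin chooses to stop all teuthology jobs, which this ticket does not specify.

```python
from pathlib import Path
from typing import Callable


def flag_capacity_error(marker: Path, reason: str) -> None:
    """Called by the provisioning code: record the failure and do nothing else."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(reason + "\n")


def consume_capacity_error(marker: Path, kill_all_jobs: Callable[[], None]) -> bool:
    """Called from cron: if the marker exists, kill all jobs, then remove it.

    kill_all_jobs is a hypothetical callback supplied by the caller.
    Returns True if the marker was found and acted upon.
    """
    if not marker.exists():
        return False
    kill_all_jobs()
    # Remove the marker only after the jobs were killed, so that a failed
    # kill is retried on the next cron run.
    marker.unlink(missing_ok=True)
    return True
```

The provisioning side would call flag_capacity_error(Path("/var/run/teuthology/ovh-GRA1-capacity.error"), "No valid host was found"), and the cron job would call consume_capacity_error with the same path every 5 minutes.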

When the problem is fixed on the OpenStack provider side, the sysadmin can re-launch teuthology, even if that means launching thousands of jobs. In the worst case, a few hosts will be created, the problem will show up again, and within 5 minutes the whole cluster will be shut down again. This avoids running a cluster that does nothing but fail while creating hosts that end up being useless and cost a significant amount of money.

Actions #1

Updated by Kyrylo Shatskyy almost 4 years ago

  • Status changed from New to Closed

Closing as obsolete
