Support #20621: Investigate reimaging testnodes after every job - sepia - Ceph

Support #20621

closed

Investigate reimaging testnodes after every job

Added by David Galloway almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: David Galloway
Category: Infrastructure Service

Description

It has been suggested (again) that we spend some time researching reimaging testnodes after every job. I'm going to do some research and see what can be done to get reimaging done as fast as possible.

Some ideas:

Revisit edeploy
Either bake or find a bare-minimum OS install of each distro and let ceph-cm-ansible do most of the heavy lifting
Tweaks to kickstarts
  @minimal instead of @base package group for EL
xCAT sysclone (https://xcat-docs.readthedocs.io/en/stable/advanced/sysclone/sysclone.html) or a similar solution
After reimage or even during teuthology jobs, run ceph-cm-ansible playbooks on localhost (the testnode) instead of from Cobbler or teuthology.front.
Make sure the NIC boot order on each testnode is set to only try booting from the cabled NIC. This would save about 5-10 seconds, since that's how long it takes PXE to time out, and would also save time whenever the node is rebooted during a job.
OpenStack's Ironic basically does what edeploy does: it boots a tiny Linux image into memory and dd's an OS image onto the drive via nova-baremetal-agent

Updated by David Galloway almost 7 years ago

Some current statistics:

It takes about 10 minutes to run ceph-cm-ansible on a fresh VPS and about 8 minutes on an OVH node. This is because most of the playbook has to be run. We could slim this down by baking our own cloud images.

It takes about 2 minutes to run ceph-cm-ansible on an already provisioned smithi.
It takes about 4 minutes to run ceph-cm-ansible on an already provisioned mira.

Updated by David Galloway almost 7 years ago

Description updated (diff)

Updated by David Galloway almost 7 years ago

Description updated (diff)

Updated by David Galloway almost 7 years ago

Changed the install location in the Ubuntu preseed from archive.ubuntu.com to apt-mirror:

d-i mirror/http/hostname string http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com

Tested a Xenial install on smithi nodes, timed from reboot to ceph-cm-ansible completion:

archive.ubuntu.com install took: 30 minutes, 48 seconds
apt-mirror install took: 28 minutes, 1 second
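
To put the two Xenial timings above in perspective, here is a small sketch of the arithmetic. The helper name `to_seconds` is made up for illustration; the only inputs are the two durations reported in this comment.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds

# Durations reported above (reboot to ceph-cm-ansible completion).
archive_install = to_seconds(30, 48)  # install from archive.ubuntu.com
mirror_install = to_seconds(28, 1)    # install from apt-mirror

saved = archive_install - mirror_install
saved_pct = 100 * saved / archive_install

print(f"saved {saved // 60}m {saved % 60}s ({saved_pct:.1f}%)")
# prints "saved 2m 47s (9.0%)"
```

Switching mirrors therefore trims under three minutes from a run of roughly half an hour, which suggests package download time is not the main cost.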

Updated by David Galloway almost 7 years ago

Description updated (diff)

Updated by David Galloway almost 7 years ago

Adding d-i base-installer/install-recommends boolean false to the Ubuntu preseed cut the install time down to 20 minutes, 42 seconds.

Without this modification, it takes Cobbler 10min 25sec to run the testnodes playbook and 1min 21sec to add users and keys. The playbook run failed with the modification, so I'm guessing a repo or package is missing. I didn't bother digging into this any further since 20 minutes is still way too long.
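
A rough breakdown of where the time goes without the modification can be sketched as follows. This assumes (it is not stated in the ticket) that the 28min 1sec apt-mirror run from the earlier comment is the matching baseline and that the two Cobbler-driven steps run back to back inside it.

```python
from datetime import timedelta

# Assumption: the earlier apt-mirror measurement is the unmodified baseline.
baseline_total = timedelta(minutes=28, seconds=1)

# Post-install steps reported in this comment.
testnodes_playbook = timedelta(minutes=10, seconds=25)
users_and_keys = timedelta(minutes=1, seconds=21)

ansible_time = testnodes_playbook + users_and_keys
remaining = baseline_total - ansible_time      # OS install, reboots, etc.
ansible_share = ansible_time / baseline_total  # timedelta / timedelta -> float

print(f"ansible steps: {ansible_time}")                # 0:11:46
print(f"everything else: {remaining}")                 # 0:16:15
print(f"ansible share of total: {ansible_share:.0%}")  # 42%
```

Under those assumptions, both the OS install and the playbook run are large enough that trimming only one of them would still leave a long reimage.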

Updated by David Galloway almost 7 years ago

Running ceph-cm-ansible locally from/on a testnode took 8min 29sec (and failed), so it would not really save much time.

At this point I think our best bet is to build our own OS images with ceph-cm-ansible changes already applied to them, then figure out how to install them.

Updated by David Galloway over 6 years ago

Using FOG, I've got reimaging a smithi with a trusty image that has already had ceph-cm-ansible run against it down to 5min 9sec.

Running the testnodes role against that smithi from the teuthology machine takes 0:02:32.856, but the number of tags/tasks we run against a freshly provisioned node can be dramatically reduced.

I think we'd basically just need to set the hostname, fix /etc/hosts and partition non-root disks.
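
As an illustration of how small the /etc/hosts part of that fix-up could be, here is a minimal sketch. It is not the ceph-cm-ansible or FOG implementation; the function name `fix_hosts`, the `stale_names` parameter, the example node name, and the example IP address are all hypothetical. Setting the hostname and partitioning the non-root disks are left out.

```python
from typing import Iterable

def fix_hosts(hosts_text: str, hostname: str, fqdn: str, ip: str,
              stale_names: Iterable[str] = ()) -> str:
    """Return hosts-file text with exactly one entry for this node.

    Lines that already use the node's IP, or that mention the node's names
    or any name left over from the image (stale_names), are dropped whole.
    """
    remove = {hostname, fqdn, *stale_names}
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when inspecting the fields of a line.
        fields = line.split("#", 1)[0].split()
        if fields and (fields[0] == ip or remove.intersection(fields[1:])):
            continue
        kept.append(line)
    kept.append(f"{ip}\t{fqdn} {hostname}")
    return "\n".join(kept) + "\n"

# Hypothetical example: the image still carries the name of the machine
# it was captured from ("golden-image").
image_hosts = "127.0.0.1\tlocalhost\n127.0.1.1\tgolden-image\n"
print(fix_hosts(image_hosts, "smithi001", "smithi001.front.sepia.ceph.com",
                "192.0.2.10", stale_names=("golden-image",)), end="")
```

Keeping the rewrite as a pure text-in, text-out function makes it easy to check against a captured image's hosts file before pointing it at a real node.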

Updated by David Galloway over 6 years ago

Status changed from In Progress to Resolved

This is being accomplished with FOG.
