Support #20621
closed
Investigate reimaging testnodes after every job
0%
Description
It has been suggested (again) that we spend some time researching reimaging testnodes after every job. I'm going to do some research and see what can be done to get reimaging done as fast as possible.
Some ideas:
- Revisit edeploy
- Either bake or find a bare minimum OS install of each distro and let ceph-cm-ansible do most of the heavy lifting
- Tweaks to kickstarts
- @minimal instead of @base package group for EL
- https://xcat-docs.readthedocs.io/en/stable/advanced/sysclone/sysclone.html or similar solution
- After reimage or even during teuthology jobs, run ceph-cm-ansible playbooks on localhost (the testnode) instead of from Cobbler or teuthology.front.
- Make sure NIC boot order is set on each testnode to only try booting from the cabled NIC (would save about 5-10sec since that's how long it takes PXE to time out). This would also have the added benefit of saving time when the node is rebooted during a job
- OpenStack's Ironic basically does what edeploy does: it boots a tiny Linux image into memory and dd's an OS image onto the drive via nova-baremetal-agent
Updated by David Galloway almost 7 years ago
Some current statistics:
It takes about 10 minutes to run ceph-cm-ansible on a fresh VPS and about 8 minutes on an OVH node. This is because most of the playbook has to be run. We could slim this down by baking our own cloud images.
It takes about 2 minutes to run ceph-cm-ansible on an already provisioned smithi.
It takes about 4 minutes to run ceph-cm-ansible on an already provisioned mira.
Updated by David Galloway almost 7 years ago
Changed install location in Ubuntu preseed from archive.ubuntu.com to use apt-mirror instead.
d-i mirror/http/hostname string http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com
Tested Xenial install on smithis.
archive.ubuntu.com install took: 30 minutes, 48 sec
apt-mirror install took: 28 minutes, 1 sec
From reboot to ceph-cm-ansible completion.
Updated by David Galloway almost 7 years ago
Adding d-i base-installer/install-recommends boolean false
to the Ubuntu preseed cut install time down to 20 minutes, 42sec.
It takes Cobbler 10min 25sec to run the testnodes playbook and 1min 21sec to add users and keys without this modification. The playbook run failed with the modification, so I'm guessing a repo or package is missing. I didn't bother digging into this any further since 20 minutes is still way too long.
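For a rough comparison of the three Xenial timings above, here is a minimal Python sketch. The labels are just descriptive names I made up. Treating the archive.ubuntu.com run as the baseline is an assumption. The 20min 42sec figure comes from a run where the playbook failed, so it isn't strictly comparable to the other two.

```python
from datetime import timedelta

# Install times reported above (reboot to ceph-cm-ansible completion, Xenial on smithi).
# The keys are illustrative labels, not real config names.
timings = {
    "archive.ubuntu.com": timedelta(minutes=30, seconds=48),
    "apt-mirror": timedelta(minutes=28, seconds=1),
    # Playbook run failed with this change, so treat this number with caution.
    "install-recommends=false": timedelta(minutes=20, seconds=42),
}

# Assumption: use the stock archive.ubuntu.com install as the baseline.
baseline = timings["archive.ubuntu.com"]


def savings(label):
    """Return (time saved, percent saved) relative to the baseline run."""
    saved = baseline - timings[label]
    return saved, 100 * saved / baseline


for label in timings:
    saved, pct = savings(label)
    print(f"{label}: {timings[label]} (saves {saved}, {pct:.1f}%)")
```

Switching mirrors alone saves under three minutes, which is why the package set looked like the more promising knob.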
Updated by David Galloway almost 7 years ago
Running ceph-cm-ansible locally from/on a testnode took 8min 29sec (and failed), so that doesn't really save much time.
At this point I think our best bet is to build our own OS images with ceph-cm-ansible changes already applied to them then figure out how to install them.
Updated by David Galloway over 6 years ago
I've got reimaging a smithi with a trusty image that has ceph-cm-ansible already run against it down to 5min 9sec using FOG.
Running the testnodes role against that smithi from the teuthology machine takes 0:02:32.856, but the number of tags/tasks we run against a freshly provisioned node can be dramatically reduced.
I think we'd basically just need to set the hostname, fix /etc/hosts and partition non-root disks.
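As a rough idea of how small the "fix /etc/hosts" step could be, here is a hypothetical Python sketch. It is not what ceph-cm-ansible or FOG actually does. The function name, the address and the example.com hostnames are all made up for illustration. It only transforms text, so writing the result to /etc/hosts and setting the hostname itself are left out.

```python
def fix_hosts(hosts_text, ip, fqdn):
    """Return hosts-file text with exactly one entry mapping ip to fqdn and its short name."""
    short = fqdn.split(".")[0]
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when inspecting the entry.
        fields = line.split("#", 1)[0].split()
        # Drop stale entries for this address or these names left over from the image.
        if fields and (fields[0] == ip or fqdn in fields[1:] or short in fields[1:]):
            continue
        kept.append(line)
    kept.append(f"{ip} {fqdn} {short}")
    return "\n".join(kept) + "\n"


# Hypothetical example: the image still carries the entry of the machine it was built on.
old = "127.0.0.1 localhost\n10.0.0.5 imagebuilder.example.com imagebuilder\n"
print(fix_hosts(old, "10.0.0.5", "smithi001.example.com"), end="")
```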
Updated by David Galloway over 6 years ago
- Status changed from In Progress to Resolved
This is being accomplished with FOG.