HOWTO run integration and upgrade tests - Stable releases - Ceph

Scheduling a suite¶

This requires access to a running teuthology cluster, for instance the cluster from the sepia lab (with the Sepia VPN running, you can log in with something like ssh teuthology.front.sepia.ceph.com).

git clone https://github.com/ceph/teuthology/
cd teuthology
./bootstrap

Test the suite with --dry-run, i.e. something like:

./virtualenv/bin/teuthology-suite --dry-run --priority 1000 --subset 1/18 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant

Review the jobs to be scheduled and, if they match what is expected, run the same command without --dry-run.
Assuming the suite was run on the sepia lab, it will immediately show up at http://pulpito.ceph.com/ (for instance http://pulpito.ceph.com/loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/). Note the loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi part of the path: it matches the run name displayed by teuthology-suite.
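One way to make sure the dry run and the real run use exactly the same arguments is to wrap the command in a small shell function. The following is only a sketch: the function name schedule_rados_giant and the TEUTHOLOGY_SUITE variable are hypothetical helpers, and the arguments are copied from the example above.

```shell
# Path to teuthology-suite (assumption: run from the teuthology checkout).
TEUTHOLOGY_SUITE=${TEUTHOLOGY_SUITE:-./virtualenv/bin/teuthology-suite}

# Hypothetical wrapper: any extra arguments (such as --dry-run) are placed
# before the fixed arguments taken from the example above.
schedule_rados_giant() {
    "$TEUTHOLOGY_SUITE" "$@" \
        --priority 1000 --subset 1/18 \
        --suite rados --suite-branch giant \
        --machine-type plana,burnupi,mira --distro ubuntu \
        --email loic@dachary.org --ceph giant
}

# Usage:
#   schedule_rados_giant --dry-run   # review the jobs first
#   schedule_rados_giant             # then schedule for real
```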

The teuthology-suite arguments have the following meanings:

--suite a reference to https://github.com/ceph/ceph-qa-suite/tree/master/suites. For instance --suite rados means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
--suite-branch a reference to the ceph-qa-suite branch to use. For instance --suite rados --suite-branch giant means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/giant/suites/rados instead of https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
--priority is the priority of the job in the queue (lower numbers are higher priority). The default is 1000 and it should be used when nothing is urgent. When debugging a single failing job, priority 101 can be used so that it is scheduled faster.
--machine-type is the kind of machine that will be provisioned to run the tests. Use plana,burnupi,mira for bare metal or vps for virtual machines.
--ceph is the branch of the Ceph repository to use and defaults to master. For instance --ceph giant means use https://github.com/ceph/ceph/tree/giant/ instead of https://github.com/ceph/ceph/tree/master/
--email when the run is complete an email will be sent to this address with a summary of the results
--kernel controls which kernel is installed on the testing machines. It can be either "distro" or "testing"; always use --kernel distro to ensure all jobs have the same kernel installed.
--subset X/N reduces the number of jobs to 1/N of the total number of jobs while still including all yaml files (facets). A given subset is deterministic: for a given rados suite, --subset 0/18 will always create the same jobs. Running 0/18, then 1/18, and so on up to 17/18 generates all jobs. It is a good idea not to always pick the same subset, so that various combinations are exercised and the odds of discovering a problem increase. As of May 2015, X/18 is recommended for rados, which is by far the largest suite (3000+ jobs). Since all other suites are smaller, X/18 is also a good choice for all of them.
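To illustrate the X/N notation, the sketch below lists every valid --subset value for a given N, from 0/N up to (N-1)/N. The function name all_subsets is a hypothetical helper and is not part of teuthology.

```shell
# Print every valid --subset value for a given N, one per line.
all_subsets() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i/$n"
        i=$((i + 1))
    done
}

# Usage: all_subsets 18 prints 0/18, 1/18, and so on up to 17/18.
```

Scheduling each of these values in turn would eventually cover all jobs of the suite, which is what the description of --subset above refers to.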

Using the --subset option¶

The goal of integration tests is to verify that a set of commits won't create problems that could easily be detected by running the teuthology suites. With or without --subset, the same tests are run; they are only combined differently. Running all combinations of tests is useful to detect subtle bugs, but that is not the focus of the integration tests. It is therefore advisable to always use the --subset option, regardless of the suite being run, to reduce the number of tests being scheduled and speed up their completion.

Regarding rados suite integration testing runs, it was agreed in August 2016 with Samuel Just (rados lead) to use a --subset value that yields approximately 250 jobs:

for hammer a reasonable (estimated) value is --subset $(expr $RANDOM % 20)/20
for jewel a reasonable (estimated) value is --subset $(expr $RANDOM % 50)/50
for kraken a reasonable (estimated) value is --subset $(expr $RANDOM % 75)/75
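The per-release values above can be collected in a small helper. This is a sketch only: the function name subset_for_release is hypothetical, and it uses awk instead of $RANDOM to pick the random number so that it also works in shells that do not provide $RANDOM.

```shell
# Print a random --subset value X/N, with N chosen per release as listed above.
subset_for_release() {
    case "$1" in
        hammer) n=20 ;;
        jewel)  n=50 ;;
        kraken) n=75 ;;
        *) echo "unknown release: $1" >&2; return 1 ;;
    esac
    # rand() returns a value in [0, 1), so x is an integer in 0 .. n-1.
    x=$(awk -v n="$n" 'BEGIN { srand(); print int(rand() * n) }')
    echo "$x/$n"
}

# Usage: --subset "$(subset_for_release jewel)"
```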

Re-scheduling failed or dead jobs from an existing suite¶

Ask paddles (https://github.com/ceph/paddles, the server in which suite runs are stored) about the dead or failed jobs:

run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
eval filter=$(curl -L --silent http://paddles.front.sepia.ceph.com/runs/$run/ | jq '.jobs[] | select(.status == "dead" or .status == "fail") | .description' | while read description ; do echo -n $description, ; done | sed -e 's/,$//')

Re-run the suite using the same command line without --filter-out and with --filter="$filter" to only schedule the jobs described in the filter variable:

./virtualenv/bin/teuthology-suite --filter="$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
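The filter can also be built without eval. The sketch below assumes jq is run with -r so that it prints raw descriptions, one per line, and that no description contains a comma or a newline; join_descriptions is a hypothetical helper name.

```shell
# Join the lines read from standard input into one comma-separated string.
join_descriptions() {
    paste -s -d , -
}

# Usage (needs curl, jq and access to the sepia lab):
#   filter=$(curl -L --silent "http://paddles.front.sepia.ceph.com/runs/$run/" |
#       jq -r '.jobs[] | select(.status == "dead" or .status == "fail") | .description' |
#       join_descriptions)
```

Quoting "$filter" when passing it to --filter keeps descriptions that contain spaces intact.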

Scheduling jobs missed in an existing suite¶

If a network problem occurs while you are scheduling a large suite, there is no need to kill and reschedule the entire suite. Only the missed jobs can be scheduled, as follows:

run=abhi-2015-06-20_09:01:42-rados-hammer-backports---basic-multi
eval filter=$(curl --silent http://paddles.front.sepia.ceph.com/runs/$run/ | jq '.jobs[] | .description' | while read description ; do echo -n $description, ; done | sed -e 's/,$//')
./virtualenv/bin/teuthology-suite --filter-out="$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
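The paddles URL used here is built from the bare run name, which is the last path component of the pulpito URL. If only the pulpito URL is at hand, the run name can be extracted as sketched below; run_name_from_url is a hypothetical helper name.

```shell
# Print the run name given a pulpito URL (a bare run name is returned as-is).
run_name_from_url() {
    url=${1%/}           # drop one trailing slash, if any
    echo "${url##*/}"    # keep what follows the last remaining slash
}

# Usage:
#   run=$(run_name_from_url http://pulpito.ceph.com/abhi-2015-06-20_09:01:42-rados-hammer-backports---basic-multi/)
```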

Killing a suite¶

If the run is scheduled but has not started yet:

If the suite was scheduled with --machine-type plana,burnupi,mira:

./virtualenv/bin/teuthology-kill -m multi -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer

If the suite was scheduled with --machine-type vps:

./virtualenv/bin/teuthology-kill -m vps -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer

If the run has already started, the -m option is not necessary:

./virtualenv/bin/teuthology-kill -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer

NOTE: allow some time for the operation to complete.
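The three cases above can be summarized in one helper. This is a sketch under the assumptions stated in this section only (plana,burnupi,mira maps to -m multi, vps maps to -m vps); kill_run and TEUTHOLOGY_KILL are hypothetical names.

```shell
# Path to teuthology-kill (assumption: run from the teuthology checkout).
TEUTHOLOGY_KILL=${TEUTHOLOGY_KILL:-./virtualenv/bin/teuthology-kill}

# $1 is the run name. $2 is the --machine-type the suite was scheduled with;
# pass it only when the run has not started yet.
kill_run() {
    run=$1
    case "${2:-}" in
        "")                 "$TEUTHOLOGY_KILL" -r "$run" ;;
        vps)                "$TEUTHOLOGY_KILL" -m vps -r "$run" ;;
        plana,burnupi,mira) "$TEUTHOLOGY_KILL" -m multi -r "$run" ;;
        *) echo "unexpected machine type: $2" >&2; return 1 ;;
    esac
}

# Usage:
#   kill_run "$run" vps   # scheduled on vps, not started yet
#   kill_run "$run"       # already started
```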

Integration suites¶

Expected to be successfully run on the integration branch before asking the leads for approval (and hence before asking QE to test further).

rados
rgw
rbd
fs
samba
upgrade (with vps to cover all supported operating systems)
powercycle (only run 2 for the sake of verifying the code path for obvious mistakes)
ceph-deploy (to verify packaging because it uses packages to install ceph)

QE suites¶

Expected to be successfully run on the release branch before passing it to the person publishing the release.

dumpling¶

rados
rbd
rgw
fs
ceph-deploy
upgrade/dumpling

firefly¶

rados
rbd
rgw
fs
krbd
kcephfs
samba
ceph-deploy
upgrade/firefly
upgrade/dumpling-firefly-x (to giant)
powercycle

giant¶

rados
rbd
rgw
fs
krbd
kcephfs
knfs
hadoop
samba
rest
multimds
multi-version
upgrade/giant
powercycle

Updated by Nathan Cutler about 7 years ago · 40 revisions
