HOWTO run integration and upgrade tests

Revision 15 (Loïc Dachary, 05/04/2015 03:22 PM)
h3. Scheduling a suite

This requires access to a running teuthology cluster, for instance the cluster from the sepia lab.

* git clone https://github.com/ceph/teuthology/
* cd teuthology
* ./bootstrap
* Test the suite with *--dry-run*, i.e. something like:
<pre>
./virtualenv/bin/teuthology-suite --dry-run --filter-out btrfs,ext4 --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --owner loic@dachary.org --ceph giant
</pre>
* Review the jobs to be scheduled and, if they match what is expected, run the same command without *--dry-run*.
* Assuming the suite was run on the sepia lab, it will immediately show up at http://pulpito.ceph.com/ (for instance http://pulpito.ceph.com/loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/). Note the *loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi* part of the path: it matches the run name displayed by teuthology-suite.

The meanings of the teuthology-suite arguments are:

* *--suite* a reference to https://github.com/ceph/ceph-qa-suite/tree/master/suites. For instance --suite rados means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--suite-branch* a reference to the ceph-qa-suite branch to use. For instance --suite rados --suite-branch giant means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/giant/suites/rados instead of https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--priority* the priority of the job in the queue (lower numbers are higher priority). The default is 1000, which should be used when nothing is urgent. When debugging a single failing job, priority 101 can be used so that it is scheduled faster.
* *--machine-type* the kind of machine that will be provisioned to run the tests. Use *plana,burnupi,mira* for bare metal or *vps* for virtual machines.
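The run name in the pulpito URL above is assembled from the scheduling parameters. As a sketch only (the composition is inferred from the example URL; the real name is generated by teuthology-suite itself), it can be reproduced like this:

```shell
# Sketch: how a run name such as the one in the pulpito URL above appears
# to be put together. The pattern is an inference from the example, not
# documented behaviour.
user=loic
stamp=2015-03-27_09:57:09
suite=upgrade:firefly-x:stress-split-erasure-code
branch=hammer
run="${user}-${stamp}-${suite}-${branch}---basic-multi"
echo "$run"
```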
* *--ceph* the branch of the Ceph repository to use, defaulting to master. For instance --ceph giant means use https://github.com/ceph/ceph/tree/giant/ instead of https://github.com/ceph/ceph/tree/master/
* *--owner* will be used when locking the machines for the tests. It is useful for the system administrator when things go wrong: (s)he knows who is using the machines and can get in touch, for instance to ask permission to cancel a run.
* *--email* when the run is complete, an email will be sent to this address with a summary of the results.

h3. Scheduling a suite without error injection

In hammer the "rados suite":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados generates more than a thousand jobs, which is too heavy for the purpose of validating a series of pull requests before they are merged into the stable release branch. The number of jobs can be reduced by:

* not testing btrfs and ext4 so only xfs remains ( --filter-out btrfs,ext4 )
* removing the thrash and msg-failures yaml files and keeping only the workload that verifies a run produces the expected result (see for instance "this branch for the hammer rados suite":https://github.com/dachary/ceph-qa-suite/tree/wip-rados-no-thrash-hammer)

For instance, the following

<pre>
./virtualenv/bin/teuthology-suite --dry-run --priority 1000 --suite rados --suite-branch hammer --machine-type plana,burnupi,mira --distro ubuntu --email abhishek.lekshmanan@gmail.com --owner abhishek.lekshmanan@gmail.com --ceph hammer-backports
</pre>

generated 3224 jobs but the following

<pre>
git clone -b wip-rados-no-thrash-hammer https://github.com/dachary/ceph-qa-suite.git /tmp/ceph-qa-suite
./virtualenv/bin/teuthology-suite --filter-out btrfs,ext4 --priority 1000 --suite rados --suite-dir /tmp/ceph-qa-suite --machine-type plana,burnupi,mira --distro ubuntu --email abhishek.lekshmanan@gmail.com --owner abhishek.lekshmanan@gmail.com --ceph hammer-backports
</pre>

generated 122 jobs.
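The effect of *--filter-out btrfs,ext4* can be illustrated offline: it drops every job whose description mentions one of the given strings. A minimal sketch, where plain grep stands in for teuthology's own matching and the job descriptions are made up for illustration:

```shell
# Sketch: --filter-out btrfs,ext4 keeps only the jobs whose description
# mentions neither btrfs nor ext4; with three fs facets only xfs survives.
# (grep approximates teuthology's substring matching here.)
printf '%s\n' \
    'rados/basic/{fs/btrfs.yaml workloads/rados_api_tests.yaml}' \
    'rados/basic/{fs/ext4.yaml workloads/rados_api_tests.yaml}' \
    'rados/basic/{fs/xfs.yaml workloads/rados_api_tests.yaml}' \
    | grep -v -e btrfs -e ext4
```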
h3. Re-scheduling failed or dead jobs from an existing suite

* Ask https://github.com/ceph/paddles (the server in which suite runs are stored) about the *dead* jobs:
<pre>
run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
eval filter=$(curl --silent http://paddles.front.sepia.ceph.com/runs/$run/jobs/?status=dead | jq '.[].description' | while read description ; do echo -n $description, ; done | sed -e 's/,$//')
</pre>
or the *fail* jobs:
<pre>
run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
eval filter=$(curl --silent http://paddles.front.sepia.ceph.com/runs/$run/jobs/?status=fail | jq '.[].description' | while read description ; do echo -n $description, ; done | sed -e 's/,$//')
</pre>
* Re-run the suite using the same command line, without *--filter-out* and with *--filter "$filter"*, to only schedule the jobs described in the *filter* variable:
<pre>
./virtualenv/bin/teuthology-suite --filter "$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --owner loic@dachary.org --ceph giant
</pre>

h3. Killing a suite

* if the run is scheduled but has not started yet:
** if the suite was scheduled with *--machine-type plana,burnupi,mira*:
<pre>
./virtualenv/bin/teuthology-kill -m multi -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
** if the suite was scheduled with *--machine-type vps*:
<pre>
./virtualenv/bin/teuthology-kill -m vps -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
* if the run already started, the *-m* option is not necessary:
<pre>
./virtualenv/bin/teuthology-kill -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
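The curl/jq pipeline used to build the filter boils down to joining one job description per line into a single comma-separated value. A self-contained sketch of that joining step, using hypothetical descriptions in place of a live paddles query:

```shell
# Sketch: turn one job description per line (as jq would emit them) into
# the comma-separated value expected by teuthology-suite --filter.
# The two descriptions below are made up for illustration.
descriptions='rados/basic/{clusters/fixed-2.yaml msgr-failures/few.yaml}
rados/thrash/{clusters/fixed-2.yaml workloads/cache.yaml}'

# Same join-then-strip-trailing-comma technique as the pipeline above.
filter=$(printf '%s\n' "$descriptions" \
    | while read description ; do echo -n "$description," ; done \
    | sed -e 's/,$//')
echo "$filter"
```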
h3. Integration suites

Expected to be successfully run on the integration branch before "asking approval to the leads":http://ceph.com/docs/master/dev/development-workflow/#resolving-bug-reports-and-implementing-features (hence before asking QE to test further):

* "rados":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* "rgw":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rgw
* "rbd":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rbd
* "fs":https://github.com/ceph/ceph-qa-suite/tree/master/suites/fs

h3. QE suites

Expected to be successfully run on the release branch before "passing it to the person publishing the release":http://ceph.com/docs/master/dev/development-workflow/#cutting-a-new-stable-release.

h4. dumpling

rados rbd rgw fs ceph-deploy upgrade/dumpling

h4. firefly

rados rbd rgw fs krbd kcephfs samba ceph-deploy upgrade/firefly upgrade/dumpling-firefly-x (to giant) powercycle

h4. giant

rados rbd rgw fs krbd kcephfs knfs hadoop samba rest multimds multi-version upgrade/giant powercycle
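Scheduling each of the integration suites in turn can be scripted as a loop over the suite names. A hedged sketch (the branch and machine type are copied from the earlier examples; echo is used so the commands are only printed — remove it, and add your own --email/--owner, to actually schedule):

```shell
# Sketch: print one teuthology-suite invocation per integration suite.
# The leading echo keeps this harmless; drop it to really schedule runs.
for suite in rados rgw rbd fs ; do
    echo ./virtualenv/bin/teuthology-suite --dry-run --priority 1000 \
        --suite "$suite" --suite-branch giant \
        --machine-type plana,burnupi,mira --ceph giant
done
```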