HOWTO run integration and upgrade tests » History » Version 20

Loïc Dachary, 05/22/2015 09:59 AM

h3. Scheduling a suite

This requires access to a running teuthology cluster, for instance the cluster from the sepia lab.

* git clone https://github.com/ceph/teuthology/
* cd teuthology
* ./bootstrap
* Test the suite with *--dry-run*, i.e. something like:
<pre>
./virtualenv/bin/teuthology-suite --dry-run --filter-out btrfs,ext4 --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
* Review the jobs to be scheduled and, if they match what is expected, run the same command without *--dry-run*
* Assuming the suite was run on the sepia lab, it will immediately show up at http://pulpito.ceph.com/ (for instance http://pulpito.ceph.com/loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/). Note the *loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi* part of the path: it matches the run name displayed by teuthology-suite

The teuthology-suite arguments have the following meanings:

* *--suite* is a reference to https://github.com/ceph/ceph-qa-suite/tree/master/suites. For instance --suite rados means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--suite-branch* is a reference to the ceph-qa-suite branch to use. For instance --suite rados --suite-branch giant means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/giant/suites/rados instead of https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--priority* is the priority of the job in the queue (lower numbers are higher priority). The default is 1000 and it should be used unless something is urgent. When debugging a single failing job, priority 101 can be used so that it is scheduled faster.
* *--machine-type* is the kind of machine that will be provisioned to run the tests. Use *plana,burnupi,mira* for bare metal or *vps* for virtual machines.
* *--ceph* is the branch of the Ceph repository to use; it defaults to master. For instance --ceph giant means use https://github.com/ceph/ceph/tree/giant/ instead of https://github.com/ceph/ceph/tree/master/
* *--email* is the address to which a summary of the results is sent when the run is complete.
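The URL conventions described above can be sketched as two small shell functions. They are hypothetical helpers, not part of teuthology, and they assume that master is used when no suite branch is given, as the examples above suggest.

```shell
# Hypothetical helpers, not part of teuthology.

# Print the ceph-qa-suite URL that a --suite / --suite-branch pair refers to.
# Assumption: without a branch, master is used, as in the examples above.
suite_url() {
    echo "https://github.com/ceph/ceph-qa-suite/tree/${2:-master}/suites/$1"
}

# Print the pulpito page of a run scheduled on the sepia lab.
pulpito_url() {
    echo "http://pulpito.ceph.com/$1/"
}

suite_url rados
suite_url rados giant
pulpito_url loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi
```

Opening the printed suite URL before scheduling is a cheap way to check that the suite exists on the chosen branch.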

h3. Scheduling a rados suite

In hammer, the "rados suite":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados generates more than 3000 jobs, which is too heavy for validating a series of pull requests before they are merged into the stable release branch. The number of jobs can be reduced with the *--subset* option: it makes sure every yaml (facet) is included in the run but does not include all the combinations. For instance:

<pre>
./virtualenv/bin/teuthology-suite --subset 1/13 --priority 101 --suite rados --suite-branch hammer --machine-type plana,burnupi,mira --distro ubuntu --email abhishek.lekshmanan@gmail.com --owner loic@dachary.org --ceph hammer-backports
</pre>

This reduces the number of jobs to about *1/13* of the total number of jobs, while still including every yaml (facet).

For a given "rados suite":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados *--subset 0/13* will always create the same jobs. By running *0/13*, then *1/13*, etc. up to *12/13*, all jobs are generated. It is a good idea not to always pick the same subset, so that various combinations are exercised and the odds of discovering a problem increase.
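One way to avoid always picking the same subset is to derive the index from a counter, for instance the number of the backport round. The sketch below is only an illustration: *pick_subset* is a hypothetical helper, the subset index is assumed to run from 0 to 12 for 13 pieces, and the commands are printed rather than run. The *--email* and *--owner* options of the example above are left out for brevity.

```shell
# Hypothetical helper: map a counter (for instance a backport round number)
# to a subset, so that successive runs rotate through the pieces.
# The counter must be a plain decimal number without leading zeros.
pick_subset() {
    echo "$(( $1 % $2 ))/$2"
}

# Print, without running anything, the 13 dry-run commands that together
# cover the whole suite (assumed indices: 0 to 12).
i=0
while [ "$i" -lt 13 ]; do
    echo ./virtualenv/bin/teuthology-suite --dry-run \
        --subset "$(pick_subset "$i" 13)" \
        --suite rados --suite-branch hammer --ceph hammer-backports \
        --machine-type plana,burnupi,mira --distro ubuntu
    i=$((i + 1))
done
```

For example, *pick_subset 14 13* selects the same piece as *--subset 1/13*.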

h3. Re-scheduling failed or dead jobs from an existing suite

* Ask "paddles":https://github.com/ceph/paddles (the server in which suite runs are stored) about the *dead* jobs:
<pre>
run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
# collect the descriptions of the dead jobs as a comma separated list
filter=$(curl --silent "http://paddles.front.sepia.ceph.com/runs/$run/jobs/?status=dead" | jq -r '[.[].description] | join(",")')
</pre>
or the *fail* jobs:
<pre>
run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
# collect the descriptions of the failed jobs as a comma separated list
filter=$(curl --silent "http://paddles.front.sepia.ceph.com/runs/$run/jobs/?status=fail" | jq -r '[.[].description] | join(",")')
</pre>
* Re-run the suite using the same command line without *--filter-out* and with *--filter "$filter"* to only schedule the jobs described in the *filter* variable:
<pre>
./virtualenv/bin/teuthology-suite --filter "$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
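To re-schedule the *dead* and the *fail* jobs in one go, the two comma separated lists can be merged before calling teuthology-suite. The following is a minimal sketch: *combine_filters* is a hypothetical helper, the two description values are made-up placeholders standing in for the results of the paddles queries, and the final command is only printed.

```shell
# Hypothetical helper: join the non-empty arguments with commas.
combine_filters() {
    combined=
    for part in "$@"; do
        [ -n "$part" ] || continue
        combined=${combined:+$combined,}$part
    done
    echo "$combined"
}

# Placeholder values: in practice these come from the two paddles queries.
dead='placeholder description of a dead job'
fail='placeholder description of a failed job'

filter=$(combine_filters "$dead" "$fail")
if [ -n "$filter" ]; then
    # Nothing is scheduled here: the command is printed for review.
    echo ./virtualenv/bin/teuthology-suite --dry-run --filter "$filter" \
        --priority 1000 --suite rados --suite-branch giant \
        --machine-type plana,burnupi,mira --distro ubuntu --ceph giant
else
    echo "no dead or failed job, nothing to re-schedule"
fi
```

Checking that the filter is not empty avoids passing an empty *--filter* to teuthology-suite when every job passed.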

h3. Killing a suite

* if the run is scheduled but has not started yet:
** if the suite was scheduled with *--machine-type plana,burnupi,mira*:
<pre>
./virtualenv/bin/teuthology-kill -m multi -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
** if the suite was scheduled with *--machine-type vps*:
<pre>
./virtualenv/bin/teuthology-kill -m vps -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
* if the run already started, the *-m* option is not necessary:
<pre>
./virtualenv/bin/teuthology-kill -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
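The three cases above can be summarized in a small wrapper that prints the matching teuthology-kill command. *kill_command* is a hypothetical helper. Only the *plana,burnupi,mira* and *vps* cases come from this page; mapping any other comma separated list to *-m multi*, and any other single machine type to itself, is an assumption.

```shell
# Hypothetical helper: print the teuthology-kill command for a run.
# $1 is the run name.
# $2 is the value given to --machine-type, or empty if the run already started.
kill_command() {
    run=$1
    machine_type=$2
    case $machine_type in
        '')  echo "./virtualenv/bin/teuthology-kill -r $run" ;;
        # Assumption: any comma separated list of machine types maps to "multi".
        *,*) echo "./virtualenv/bin/teuthology-kill -m multi -r $run" ;;
        # Assumption: a single machine type (vps for instance) is passed as is.
        *)   echo "./virtualenv/bin/teuthology-kill -m $machine_type -r $run" ;;
    esac
}

kill_command loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer plana,burnupi,mira
```

The command is printed rather than executed so that it can be reviewed before a run is actually killed.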

h3. Integration suites

Expected to run successfully on the integration branch before "asking approval from the leads":http://ceph.com/docs/master/dev/development-workflow/#resolving-bug-reports-and-implementing-features (hence before asking QE to test further)

"rados":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
"rgw":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rgw
"rbd":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rbd
"fs":https://github.com/ceph/ceph-qa-suite/tree/master/suites/fs

h3. QE suites

Expected to run successfully on the release branch before "passing it to the person publishing the release":http://ceph.com/docs/master/dev/development-workflow/#cutting-a-new-stable-release.

h4. dumpling

rados
rbd
rgw
fs
ceph-deploy
upgrade/dumpling

h4. firefly

rados
rbd
rgw
fs
krbd
kcephfs
samba
ceph-deploy
upgrade/firefly
upgrade/dumpling-firefly-x (to giant)
powercycle

h4. giant

rados
rbd
rgw
fs
krbd
kcephfs
knfs
hadoop
samba
rest
multimds
multi-version
upgrade/giant
powercycle