h1. HOWTO run integration and upgrade tests

Version 30, Nathan Cutler, 08/10/2015 07:15 PM

h3. Scheduling a suite

This requires access to a running teuthology cluster, for instance the one in the Sepia lab (with the Sepia VPN running, log in with something like <code>ssh teuthology.front.sepia.ceph.com</code>).

* git clone https://github.com/ceph/teuthology/
* cd teuthology
* ./bootstrap
* Test the suite with *--dry-run* first, for example:
<pre>
./virtualenv/bin/teuthology-suite --dry-run --priority 1000 --subset 1/18 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
* Review the jobs to be scheduled and, if they match what is expected, run the same command without *--dry-run*.
* Assuming the suite was run in the Sepia lab, it will immediately show up at http://pulpito.ceph.com/ (for instance http://pulpito.ceph.com/loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/). The *loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi* part of the path is the run name displayed by teuthology-suite.
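
The dry run, review and scheduling steps above can be chained in a small shell function. This is a minimal sketch assuming bash; *schedule_with_review* is a hypothetical helper, not part of teuthology, and appending *--dry-run* after the other arguments assumes teuthology-suite accepts its options in any order.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of teuthology): run a scheduling command with
# --dry-run appended, ask for confirmation, then run it again for real.
schedule_with_review() {
    # Dry run first; stop with its exit status if it fails.
    "$@" --dry-run || return
    # The prompt goes to stderr so stdout only carries the command output.
    printf 'Schedule these jobs? [y/N] ' >&2
    local answer
    read -r answer
    case "$answer" in
        [yY]*) "$@" ;;
        *) echo "not scheduled" >&2; return 1 ;;
    esac
}
```

For instance <code>schedule_with_review ./virtualenv/bin/teuthology-suite --priority 1000 --subset 1/18 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant</code> shows the dry run and only schedules the jobs after a *y* answer.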

The teuthology-suite arguments used above mean:

* *--suite* is a directory of https://github.com/ceph/ceph-qa-suite/tree/master/suites. For instance *--suite rados* means run all the jobs in https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--suite-branch* is the ceph-qa-suite branch to use. For instance *--suite rados --suite-branch giant* means run all the jobs in https://github.com/ceph/ceph-qa-suite/tree/giant/suites/rados instead of https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--priority* is the priority of the job in the queue (lower numbers mean higher priority). It defaults to 1000, which should be used unless something is urgent. When debugging a single failing job, priority 101 can be used so that it is scheduled sooner.
* *--machine-type* is the kind of machine provisioned to run the tests: *plana,burnupi,mira* for bare metal or *vps* for virtual machines.
* *--ceph* is the branch of the Ceph repository to use; it defaults to master. For instance *--ceph giant* means use https://github.com/ceph/ceph/tree/giant/ instead of https://github.com/ceph/ceph/tree/master/
* *--email* is the address to which a summary of the results is sent when the run completes.
* *--subset X/N* reduces the number of jobs to about *1/N* of the total, while still including every yaml fragment (facet) at least once. For instance, *--subset 0/18* always creates the same jobs from a given version of the "rados suite":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados and running *0/18*, then *1/18*, and so on up to *17/18* generates all of them. It is a good idea not to always pick the same subset, so that various combinations are tried and the odds of discovering a problem increase. As of May 2015, X/18 is recommended for rados, which is by far the largest suite (3000+ jobs); since all the other suites are smaller, X/18 is also a good choice for them.
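
Picking a different subset for each run can be automated. A minimal sketch assuming bash (it relies on bash's <code>$RANDOM</code>); *random_subset* is a hypothetical helper, not a teuthology option:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random X/N value (0 <= X < N) for --subset,
# so that successive runs exercise different combinations of facets.
# N defaults to 18, the value recommended above.
random_subset() {
    local n=${1:-18}
    printf '%d/%d\n' "$((RANDOM % n))" "$n"
}
```

<code>--subset "$(random_subset 18)"</code> can then replace a fixed <code>--subset 1/18</code> in the commands on this page.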

h3. Using the *--subset* option

The goal of integration tests is to verify that a set of commits does not introduce problems that running the teuthology suites could easily detect. With or without *--subset*, the same tests (yaml fragments) are exercised: without it they are run in every combination, with it in fewer combinations. Running all combinations is useful to detect subtle bugs, but that is not the focus of integration tests. It is therefore advisable to always use *--subset*, whatever the suite, to reduce the number of jobs scheduled and speed up their completion.

h3. Re-scheduling failed or dead jobs from an existing suite

* Ask "paddles":https://github.com/ceph/paddles (the server in which suite runs are stored) for the *dead* and *fail* jobs and join their descriptions into a filter:
<pre>
run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
# -r prints the descriptions without JSON quotes; paste joins them with commas
filter=$(curl --silent "http://paddles.front.sepia.ceph.com/runs/$run/" | jq -r '.jobs[] | select(.status == "dead" or .status == "fail") | .description' | paste -sd, -)
</pre>
* Re-run the suite with the same command line, without *--filter-out* and with *--filter="$filter"*, to schedule only the jobs described in the *filter* variable:
<pre>
./virtualenv/bin/teuthology-suite --filter="$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
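
The filter-building step above can be wrapped in a function that reads the paddles run document on stdin, which also makes it possible to try it offline on a saved copy. A minimal sketch, assuming bash, jq and the run document layout used above (a *jobs* array whose entries have *status* and *description* fields); *failed_jobs_filter* is a hypothetical name. It refuses to print an empty filter, on the assumption that an empty *--filter* value might not restrict the jobs scheduled at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a paddles run document (JSON) on stdin and print
# the descriptions of its dead or failed jobs as one comma-separated string,
# suitable for teuthology-suite --filter.
failed_jobs_filter() {
    local filter
    filter=$(jq -r '.jobs[] | select(.status == "dead" or .status == "fail") | .description' | paste -sd, -)
    # Nothing to re-run: fail instead of returning an empty filter.
    if [ -z "$filter" ]; then
        echo "no dead or failed jobs found" >&2
        return 1
    fi
    printf '%s\n' "$filter"
}
```

Used as <code>filter=$(curl --silent "http://paddles.front.sepia.ceph.com/runs/$run/" | failed_jobs_filter)</code>, a non-zero exit status signals that there is nothing to re-schedule.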

h3. Scheduling jobs missed in an existing suite

If a network problem occurs while a large suite is being scheduled, there is no need to kill and reschedule the entire suite: only the missed jobs can be scheduled, by excluding with *--filter-out* every job already recorded for the run and otherwise using the same arguments as the original command:
<pre>
# the run name (the last part of the pulpito URL), not the URL itself
run=abhi-2015-06-20_09:01:42-rados-hammer-backports---basic-multi
filter=$(curl --silent "http://paddles.front.sepia.ceph.com/runs/$run/" | jq -r '.jobs[] | .description' | paste -sd, -)
./virtualenv/bin/teuthology-suite --filter-out="$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
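
Because the paddles query needs the bare run name, putting the pulpito URL of the run into *run* yields a wrong paddles URL. A minimal sketch of a hypothetical bash helper that accepts either form, assuming the paddles address used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the paddles URL of a run, given either the run
# name or its pulpito URL (e.g. http://pulpito.ceph.com/<run name>/).
paddles_run_url() {
    local run=${1%/}        # drop a trailing slash, if any
    run=${run##*/}          # keep the last path component: the run name
    printf 'http://paddles.front.sepia.ceph.com/runs/%s/\n' "$run"
}
```

<code>curl --silent "$(paddles_run_url "$run")"</code> then works whether *run* holds the run name or the pulpito URL.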

h3. Killing a suite

* If the run is scheduled but has not started yet:
** if the suite was scheduled with *--machine-type plana,burnupi,mira*:
<pre>
./virtualenv/bin/teuthology-kill -m multi -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi
</pre>
** if the suite was scheduled with *--machine-type vps*:
<pre>
./virtualenv/bin/teuthology-kill -m vps -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-vps
</pre>

* If the run has already started, the *-m* option is not necessary:
<pre>
./virtualenv/bin/teuthology-kill -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi
</pre>
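
The choice of *-m* above follows from the *--machine-type* the suite was scheduled with. A minimal sketch of that mapping as a hypothetical bash helper; it only covers the two cases listed on this page:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the --machine-type given to teuthology-suite to the
# -m value used with teuthology-kill for runs that have not started yet.
kill_machine_type() {
    case "$1" in
        plana,burnupi,mira) echo multi ;;
        vps) echo vps ;;
        *) echo "unknown machine type: $1" >&2; return 1 ;;
    esac
}
```

For instance <code>./virtualenv/bin/teuthology-kill -m "$(kill_machine_type plana,burnupi,mira)" -r "$run"</code> kills a run that has not started yet, with *run* holding its run name.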

h3. Integration suites

These suites are expected to run successfully on the integration branch before "asking the leads for approval":http://ceph.com/docs/master/dev/development-workflow/#resolving-bug-reports-and-implementing-features (hence before asking QE to test further).

"rados":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
"rgw":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rgw
"rbd":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rbd
"fs":https://github.com/ceph/ceph-qa-suite/tree/master/suites/fs
"upgrade":https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade/hammer-x (with vps, to cover all supported operating systems)
"powercycle":https://github.com/ceph/ceph-qa-suite/tree/master/suites/powercycle (only run 2 jobs, to check the code path for obvious mistakes)
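
Scheduling the integration suites listed above mostly repeats the same command with a different *--suite*. A minimal sketch, assuming bash, that only prints the commands for review (run them with *--dry-run* first, as described above). The helper name and its arguments are hypothetical; powercycle is left out because this page does not show how to limit it to 2 jobs, and upgrade/hammer-x gets *--machine-type vps* without *--distro*, assuming that is what lets it cover all supported operating systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one teuthology-suite command per integration
# suite, each with a random --subset X/18.
# Arguments: ceph branch, ceph-qa-suite branch, email address.
integration_commands() {
    local ceph_branch=$1 suite_branch=$2 email=$3 suite
    local -a machine_args
    for suite in rados rgw rbd fs upgrade/hammer-x; do
        if [ "$suite" = upgrade/hammer-x ]; then
            # vps machines and no --distro (assumption: covers all supported OSes)
            machine_args=(--machine-type vps)
        else
            machine_args=(--machine-type plana,burnupi,mira --distro ubuntu)
        fi
        echo ./virtualenv/bin/teuthology-suite --priority 1000 \
            --subset "$((RANDOM % 18))/18" --suite "$suite" \
            --suite-branch "$suite_branch" "${machine_args[@]}" \
            --email "$email" --ceph "$ceph_branch"
    done
}
```

For instance <code>integration_commands hammer-backports hammer you@example.com</code> prints five commands, one per suite.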

h3. QE suites

These suites are expected to run successfully on the release branch before "passing it to the person publishing the release":http://ceph.com/docs/master/dev/development-workflow/#cutting-a-new-stable-release.

h4. dumpling

rados
rbd
rgw
fs
ceph-deploy
upgrade/dumpling

h4. firefly

rados
rbd
rgw
fs
krbd
kcephfs
samba
ceph-deploy
upgrade/firefly
upgrade/dumpling-firefly-x (to giant)
powercycle

h4. giant

rados
rbd
rgw
fs
krbd
kcephfs
knfs
hadoop
samba
rest
multimds
multi-version
upgrade/giant
powercycle