HOWTO run integration and upgrade tests » History » Version 36

Nathan Cutler, 08/13/2016 06:40 PM

h3. Scheduling a suite

This requires access to a running teuthology cluster, for instance the cluster from the sepia lab (with the Sepia VPN running, you can do something like <code>ssh teuthology.front.sepia.ceph.com</code>).

* git clone https://github.com/ceph/teuthology/
* cd teuthology
* ./bootstrap
* Test the suite with *--dry-run*, i.e. something like:
<pre>
./virtualenv/bin/teuthology-suite --dry-run --priority 1000 --subset 1/18 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
* Review the jobs to be scheduled and, if they match what is expected, run the same command without *--dry-run*
* Assuming the suite was run on the sepia lab, it will immediately show up at http://pulpito.ceph.com/ (for instance http://pulpito.ceph.com/loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi/). Note the *loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer---basic-multi* part of the path: it matches the run name displayed by teuthology-suite
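
The dry-run-then-real sequence above can be sketched as a small shell helper, so that both invocations are guaranteed to share the same arguments. This is only an illustration: <code>suite_args</code> and <code>suite_cmd</code> are hypothetical names, not part of teuthology, and the argument values are copied from the example above.

```shell
# Arguments shared by the dry run and the real run (copied from the example above).
suite_args="--priority 1000 --subset 1/18 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant"

# Hypothetical helper: print the teuthology-suite command line.
# With no argument (or anything but "real") the --dry-run flag is included.
suite_cmd() {
    mode=${1:-dry}
    if [ "$mode" = "real" ]; then
        echo "./virtualenv/bin/teuthology-suite $suite_args"
    else
        echo "./virtualenv/bin/teuthology-suite --dry-run $suite_args"
    fi
}
```

Calling <code>suite_cmd</code> prints the dry-run command for review; once the listed jobs look right, <code>suite_cmd real</code> prints the same command without *--dry-run*.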

The teuthology-suite arguments have the following meanings:

* *--suite* is a reference to https://github.com/ceph/ceph-qa-suite/tree/master/suites. For instance --suite rados means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--suite-branch* is a reference to the ceph-qa-suite branch to use. For instance --suite rados --suite-branch giant means run all jobs at https://github.com/ceph/ceph-qa-suite/tree/giant/suites/rados instead of https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
* *--priority* is the priority of the job in the queue (lower numbers are higher priority). The default is 1000 and it should be used unless something is urgent. When debugging a single failing job, priority 101 can be used so that it is scheduled faster.
* *--machine-type* is the kind of machine that will be provisioned to run the tests. Use *plana,burnupi,mira* for bare metal or *vps* for virtual machines.
* *--ceph* is the branch of the Ceph repository to use and defaults to master. For instance --ceph giant means use https://github.com/ceph/ceph/tree/giant/ instead of https://github.com/ceph/ceph/tree/master/
* *--email* is the address to which a summary of the results is sent when the run is complete
* *--kernel* controls which kernel is installed on the testing machines. It can be either "distro" or "testing"; always use <code>--kernel distro</code> to ensure all jobs have the same kernel installed
* *--subset X/N* reduces the number of jobs to *1/N* of the total number of jobs, while still including all yaml files (facets). For instance, in a given "rados suite":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados *--subset 0/18* will always create the same jobs. By running *0/18*, then *1/18*, etc. up to *17/18*, all jobs are generated. It is a good idea not to always pick the same subset, in order to get various combinations and increase the odds of discovering a problem. As of May 2015, it is recommended to use X/18 for rados, which is by far the largest suite (3000+ jobs). Since all other suites are smaller, X/18 is also a good choice for all of them.
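
To make the *X/N* notation concrete, the following sketch lists every subset value for a given N, i.e. the values that together cover all jobs. <code>all_subsets</code> is a hypothetical helper written for this page, not a teuthology command.

```shell
# Hypothetical helper: print every --subset value for a denominator N,
# from 0/N up to (N-1)/N, one per line.
all_subsets() {
    n=$1
    x=0
    while [ "$x" -lt "$n" ]; do
        echo "$x/$n"
        x=$((x + 1))
    done
}
```

For instance <code>all_subsets 18</code> prints *0/18* through *17/18*; scheduling one run per printed value would generate all jobs of the suite.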

h3. Using the *--subset* option

The goal of integration tests is to verify that a set of commits won't create problems that could easily be detected by running the teuthology suites. Whether or not *--subset* is used, the same tests are run; they are only combined differently. Running all combinations of tests is useful to detect subtle bugs, but it's not the focus of the integration tests. It is therefore advisable to always use the *--subset* option, regardless of the suite being run, to reduce the number of tests being scheduled and speed up their completion.

Regarding rados suite integration testing runs, in August 2016 it was agreed with Samuel Just (rados lead) to use a <code>--subset</code> value that yields approximately 250 jobs:

* for hammer a reasonable (estimated) value is <code>--subset $(expr $RANDOM % 20)/20</code>
* for jewel a reasonable (estimated) value is <code>--subset $(expr $RANDOM % 65)/65</code>
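
The arithmetic behind such values can be sketched as follows: divide the total number of jobs in the suite by the target number of jobs to get N, then pick a random X below N. Both functions are hypothetical helpers, and the job total used in the usage note is an assumed figure for illustration, not a measured one. <code>awk</code> is used for the random number so that the sketch does not depend on the shell providing <code>$RANDOM</code>.

```shell
# Hypothetical helper: smallest N such that total/N does not exceed target
# (integer ceiling of total/target).
subset_denominator() {
    total=$1
    target=$2
    echo $(( (total + target - 1) / target ))
}

# Hypothetical helper: print a random X/N with 0 <= X < N.
random_subset() {
    n=$1
    x=$(awk -v n="$n" 'BEGIN { srand(); print int(rand() * n) }')
    echo "$x/$n"
}
```

Assuming, purely for illustration, a suite of 5000 jobs and a target of 250, <code>subset_denominator 5000 250</code> gives 20, and <code>--subset $(random_subset 20)</code> then plays the same role as the <code>expr $RANDOM % 20</code> form above.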

h3. Re-scheduling failed or dead jobs from an existing suite

* Ask https://github.com/ceph/paddles (the server in which suite runs are stored) about the *dead* or *fail* jobs
<pre>
run=loic-2015-03-23_01:09:31-rados-giant---basic-multi
eval filter=$(curl --silent http://paddles.front.sepia.ceph.com/runs/$run/ | jq '.jobs[] | select(.status == "dead" or .status == "fail") | .description' | while read description ; do echo -n $description, ; done | sed -e 's/,$//')
</pre>
* Re-run the suite using the same command line without *--filter-out* and with *--filter="$filter"* to only schedule the jobs described in the *filter* variable
<pre>
./virtualenv/bin/teuthology-suite --filter="$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
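
The <code>while read</code> loop and the <code>sed</code> call above only join the job descriptions with commas. A possible simplification, shown here as a sketch rather than a tested replacement, is to let <code>jq -r</code> print raw (unquoted) descriptions and join the lines with <code>paste</code>. <code>join_descriptions</code> is a hypothetical helper name.

```shell
# Hypothetical helper: join the lines read from stdin into one
# comma-separated line, with no trailing comma.
join_descriptions() {
    paste -s -d, -
}

# Intended use (not executed here, it needs access to paddles and jq):
#   filter=$(curl --silent "http://paddles.front.sepia.ceph.com/runs/$run/" \
#       | jq -r '.jobs[] | select(.status == "dead" or .status == "fail") | .description' \
#       | join_descriptions)
```

Because the descriptions arrive unquoted, this variant does not need <code>eval</code>; the resulting <code>$filter</code> is used with *--filter* exactly as above.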

h3. Scheduling jobs missed in an existing suite

In case some network problem occurs when you're scheduling a large suite, instead of killing and rescheduling the entire suite, only the missed jobs can be scheduled as follows:
<pre>
run=abhi-2015-06-20_09:01:42-rados-hammer-backports---basic-multi
eval filter=$(curl --silent http://paddles.front.sepia.ceph.com/runs/$run/ | jq '.jobs[] | .description' | while read description ; do echo -n $description, ; done | sed -e 's/,$//')
./virtualenv/bin/teuthology-suite --filter-out="$filter" --priority 1000 --suite rados --suite-branch giant --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --ceph giant
</pre>
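
The paddles query is built as <code>/runs/$run/</code>, so <code>run</code> is expected to hold a bare run name rather than a full pulpito URL. When only the pulpito URL is at hand, the run name can be derived from it with shell parameter expansion. <code>run_name_from_url</code> is a hypothetical helper for illustration.

```shell
# Hypothetical helper: turn a pulpito URL into the bare run name.
# A bare run name passed in is returned unchanged.
run_name_from_url() {
    url=${1%/}          # drop one trailing slash, if any
    echo "${url##*/}"   # keep what follows the last remaining slash
}
```

For example <code>run=$(run_name_from_url http://pulpito.ceph.com/abhi-2015-06-20_09:01:42-rados-hammer-backports---basic-multi/)</code> sets <code>run</code> to *abhi-2015-06-20_09:01:42-rados-hammer-backports---basic-multi*.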

h3. Killing a suite

* if the run is scheduled but has not started yet:
** if the suite was scheduled with *--machine-type plana,burnupi,mira*:
<pre>
./virtualenv/bin/teuthology-kill -m multi -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>
** if the suite was scheduled with *--machine-type vps*:
<pre>
./virtualenv/bin/teuthology-kill -m vps -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>

* if the run has already started, the *-m* option is not necessary:
<pre>
./virtualenv/bin/teuthology-kill -r loic-2015-03-27_09:57:09-upgrade:firefly-x:stress-split-erasure-code-hammer
</pre>

*NOTE:* allow some time for the operation to complete.
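
The three cases above can be summarised in a small sketch that prints the matching teuthology-kill command. <code>kill_cmd</code> is a hypothetical helper; it only knows the two machine types mentioned on this page and refuses anything else rather than guessing a value for *-m*.

```shell
# Hypothetical helper: print the teuthology-kill command for a run.
#   $1: run name
#   $2: machine type the suite was scheduled with; leave empty if the
#       run has already started (then -m is not needed)
kill_cmd() {
    run=$1
    machine_type=$2
    case "$machine_type" in
        "")
            echo "./virtualenv/bin/teuthology-kill -r $run" ;;
        vps)
            echo "./virtualenv/bin/teuthology-kill -m vps -r $run" ;;
        plana,burnupi,mira)
            echo "./virtualenv/bin/teuthology-kill -m multi -r $run" ;;
        *)
            echo "unknown machine type: $machine_type" >&2
            return 1 ;;
    esac
}
```

The command is printed rather than executed so that it can be checked before a whole run is killed.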

h3. Integration suites

Expected to be successfully run on the integration branch before "asking the leads for approval":http://ceph.com/docs/master/dev/development-workflow/#resolving-bug-reports-and-implementing-features (hence before asking QE to test further)

"rados":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados
"rgw":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rgw
"rbd":https://github.com/ceph/ceph-qa-suite/tree/master/suites/rbd
"fs":https://github.com/ceph/ceph-qa-suite/tree/master/suites/fs
"upgrade":https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade/hammer-x (with vps to cover all supported operating systems)
"powercycle":https://github.com/ceph/ceph-qa-suite/tree/master/suites/powercycle (only run 2 for the sake of verifying the code path for obvious mistakes)
"ceph-deploy":https://github.com/ceph/ceph-qa-suite/tree/master/suites/ceph-deploy (to verify packaging because it uses packages to install ceph)

h3. QE suites

Expected to be successfully run on the release branch before "passing it to the person publishing the release":http://ceph.com/docs/master/dev/development-workflow/#cutting-a-new-stable-release.

h4. dumpling

rados
rbd
rgw
fs
ceph-deploy
upgrade/dumpling

h4. firefly

rados
rbd
rgw
fs
krbd
kcephfs
samba
ceph-deploy
upgrade/firefly
upgrade/dumpling-firefly-x (to giant)
powercycle

h4. giant

rados
rbd
rgw
fs
krbd
kcephfs
knfs
hadoop
samba
rest
multimds
multi-version
upgrade/giant
powercycle