-2F - Testing buildrelease & Teuthology

Jessica Mack, 06/22/2015 05:49 AM

h1. -2F - Testing buildrelease & Teuthology
<pre>
rturk	Anyone want to be in the hangout for this next track?	11:14
sagewk	me!	11:15
joshd	me too	11:15
sagewk	muted..	11:16
saras	what is teuthology	11:18
sjust	saras: our testing framework	11:18
saras	suit	11:18
saras	this is going to sound funny but can you use openvas scan as one of the tests	11:23
saras	not asking you to do it, looking at running it in my use case for teuthology	11:24
*** paravoid_ has joined #ceph-summit2	11:26
Karcaw	what is the largest number of osd's tested regularly?	11:26
*** paravoid is now known as Guest4609	11:27
*** paravoid_ is now known as paravoid	11:27
*** Guest4609 has quit IRC	11:27
saras	can you please explain how teuthology works	11:29
dmick	saras: not familiar with the tool, but check out the teuthology repo	11:29
dmick	it does a lot of execution of external tests	11:30
dmick	as well as its own	11:30
*** michael_dreamhost has joined #ceph-summit2	11:34
saras	what is the min size of a teuthology cluster	11:36
dmick	can run a cluster on one node if you wish, and either very soon or now, one VM	11:36
saras	dmick: good, I'll add trying to get it up and running to my todo list	11:37
dmick	the first hurdle is usually locking.  you don't have to implement locking, but it's on by default in our setup because we have lots of machines and users	11:39
saras	saras: I will be more than happy to tell you where I find pain with setting up teuthology	11:40
*** mikedawson has joined #ceph-summit2	11:41
saras	that sounds great	11:41
mikedawson	what is considered a large size Teuthology ceph cluster? What is a long-running test?	11:41
dmick	most of our experience is 2-5 machines or so in a cluster, and maybe an hour or two of test run (the longer ones tend to be "steady load with failure injection")	11:42
dmick	but Sage has done some recent much-larger setups	11:42
dmick	(burnupi: our cluster has groups of identical machines named after cuttlefish species, so overall it's "sepia", and the individual machine groups are plana, burnupi, senta, mira, vercoi, etc.)	11:43
joshd	a while back we had some many-node tests, but at the time we had too few machines to run many of them, so we scaled them down for more general coverage	11:43
sjust	perhaps to create a "performance" suite with the messenger delays/failure injection, osd thrashing etc. turned off	11:46
saras	network load	11:46
sjust	and then scrape the nightly runs for the summary yaml?	11:46
dmick	sjust: +1	11:46
sjust	the summary yaml probably needs a way to associate the test with prior runs of the same test	11:46
sjust	sha1 of the config.yaml?	11:47
saras	network load for sync agents	11:47
mikedawson	Sensu is pretty good for dynamic infrastructure	11:47
sjust	we should expose such info via admin socket	11:49
elder	Or a perf socket?	11:49
dmick	specifically: ceph --admin-daemon <socket> perf dump	11:50
sjust	I had a patch set at one point which allows streaming output from admin socket for osd op events	11:50
sjust	yep	11:50
dmick	sjust: I was hearing "snap at end of test run for a pile of reducible data", but graphing over the time of the run could be interesting too	11:52
kbader	for network fault injection I've used the 'tc' utilities before	11:52
sjust	dmick: the advantage to grabbing the stream of events is you can get op latency histograms	11:52
kbader	you can inject arbitrary packet loss and latency for an interface	11:52
dmick	sjust: yep, also interesting data	11:53
saras	sounds like a lot of what salt does	11:56
</pre>
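
As noted in the discussion, teuthology can drive a whole cluster on a single node. A minimal job file along those lines might look roughly like the sketch below; the roles/tasks layout is the usual teuthology convention, but the file name, role list, and task options are illustrative and vary between teuthology versions.

<pre>
# Sketch of a minimal single-node teuthology job file (file name is arbitrary).
# The roles/tasks layout follows the usual teuthology convention; exact task
# options differ between versions, so treat this as illustrative only.
cat > one-node.yaml <<'EOF'
roles:
- [mon.a, osd.0, osd.1, osd.2, client.0]
tasks:
- install:
- ceph:
EOF
</pre>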
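
One way to associate a summary yaml with prior runs of the same test, along the lines of the sha1 idea above, is simply to hash the job's config.yaml and use that digest as the test's identifier across nightly runs, for example:

<pre>
# Compute a stable identifier for a job from its config.yaml so that results
# from repeated runs of the same test can be grouped together.
sha1sum config.yaml | awk '{print $1}'
</pre>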
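
The admin-socket command quoted above can be run against any daemon's socket to snapshot its perf counters; the socket path below is only an example of the typical default location.

<pre>
# Dump perf counters from a running daemon over its admin socket.
# /var/run/ceph/ceph-osd.0.asok is an example of the usual default path;
# adjust it for the daemon and cluster layout actually in use.
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
</pre>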
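
For the kind of network fault injection mentioned with the 'tc' utilities, the netem queueing discipline can add latency and packet loss on an interface; the interface name and values below are placeholders.

<pre>
# Add 100ms of delay and 1% packet loss on eth0 (interface and values are examples).
tc qdisc add dev eth0 root netem delay 100ms loss 1%
# Remove the injected faults again once the test run is finished.
tc qdisc del dev eth0 root netem
</pre>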