2F - Testing, Build/Release & Teuthology

rturk    Anyone want to be in the hangout for this next track?    11:14
sagewk    me!    11:15
joshd    me too    11:15
sagewk    muted..    11:16
saras    what is teuthology    11:18
sjust    saras: our testing framework    11:18
saras    suit    11:18
saras    this is going to sound funny, but can you use an openvas scan as one of the tests?    11:23
saras    not asking you to do it, just looking at running it in my use case for teuthology    11:24
Karcaw    what is the largest number of OSDs tested regularly?    11:26
saras    can someone please explain how teuthology works?    11:29
dmick    saras: not familiar with the tool, but check out the teuthology repo    11:29
dmick    it does a lot of execution of external tests    11:30
dmick    as well as its own    11:30
saras    what is the minimum size of a teuthology cluster?    11:36
dmick    you can run a cluster on one node if you wish, and (either very soon or already) in one VM    11:36
saras    dmick: good, I'll add trying to get it up and running to my todo list    11:37
dmick    the first hurdle is usually locking.  you don't have to implement locking, but it's on by default in our setup because we have lots of machines and users    11:39
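For readers following along, a minimal single-node teuthology job file looks roughly like this. This is a sketch based on the public teuthology/ceph-qa-suite examples; exact task names and options depend on the teuthology version in use:

```yaml
# Minimal single-node job sketch (illustrative; option names may differ
# by teuthology version). One node carries a monitor, three OSDs, and a
# client; the tasks install ceph, bring up the cluster, and run a workunit.
roles:
- [mon.a, osd.0, osd.1, osd.2, client.0]
tasks:
- install:
- ceph:
- workunit:
    clients:
      client.0:
        - true.sh
```

In a personal setup without a shared machine pool, the lock server dmick mentions can be left out of the picture; locking mainly matters when many users share one pool of test machines.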
saras    dmick: I will be more than happy to tell you where I find pain with setting up teuthology    11:40
*** mikedawson has joined #ceph-summit2    11:41
saras    that sounds great    11:41
mikedawson    what is considered a large size Teuthology ceph cluster? What is a long-running test?    11:41
dmick    most of our experience is 2-5 machines or so in a cluster, and maybe an hour or two of test run (the longer ones tend to be "steady load with failure injection")    11:42
dmick    but Sage has done some recent much-larger setups    11:42
dmick    (burnupi: our cluster has groups of identical machines named after cuttlefish species, so overall it's "sepia", and the individual machine groups are plana, burnupi, senta, mira, vercoi, etc.)    11:43
joshd    a while back we had some many-node tests, but at the time we had too few machines to run many of them, so we scaled them down for more general coverage    11:43
sjust    perhaps we could create a "performance" suite with the messenger delays/failure injection, OSD thrashing, etc. turned off    11:46
saras    network load    11:46
sjust    and then scrape the nightly runs for the summary yaml?    11:46
dmick    sjust: +1    11:46
sjust    the summary yaml probably needs a way to associate the test with prior runs of the same test    11:46
sjust    sha1 of the config.yaml?    11:47
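sjust's idea of keying a test to prior runs of the same test by hashing its config can be sketched as follows. `config_key` is a hypothetical helper, not part of teuthology:

```python
import hashlib

def config_key(config_yaml_text):
    """Return a stable key for a job config so runs of the same test
    can be associated across nightlies (hypothetical helper).

    Hashes the raw YAML text; normalizing first (e.g. parsing and
    re-dumping with sorted keys) would make the key robust to
    formatting-only changes in the config file.
    """
    return hashlib.sha1(config_yaml_text.encode("utf-8")).hexdigest()

# Two runs with identical configs map to the same key.
print(config_key("tasks:\n- ceph:\n"))
```

A scraper over the nightly summary YAMLs could then group results by this key and track a test's metrics over time.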
saras    network load for sync agents    11:47
mikedawson    Sensu is pretty good for dynamic infrastructure    11:47
sjust    we should expose such info via admin socket    11:49
elder    Or a perf socket?    11:49
dmick    specifically: ceph --admin-daemon <socket> perf dump    11:50
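`perf dump` returns JSON, so post-processing it in a test harness is straightforward. A minimal sketch, assuming the common `sum`/`avgcount` shape of Ceph perf counters (the sample section and counter names here are illustrative; real ones vary by daemon and version):

```python
import json

# Sample shaped like `ceph --admin-daemon <socket> perf dump` output
# (illustrative values and names, not a real dump).
sample = json.loads("""
{
  "osd": {
    "op_latency": {"avgcount": 200, "sum": 25.0}
  }
}
""")

def avg_latency(perf, section, counter):
    """Average latency in seconds from a sum/avgcount-style counter."""
    c = perf[section][counter]
    return c["sum"] / c["avgcount"] if c["avgcount"] else 0.0

print(avg_latency(sample, "osd", "op_latency"))  # 0.125
```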
sjust    I had a patch set at one point which allowed streaming output from the admin socket for osd op events    11:50
sjust    yep    11:50
dmick    sjust: I was hearing "snapshot at end of test run for a pile of reducible data", but graphing over the time of the run could be interesting too    11:52
kbader    for network fault injection I've used the 'tc' utilities before    11:52
sjust    dmick: the advantage to grabbing the stream of events is you can get op latency histograms    11:52
kbader    you can inject arbitrary packetloss and latency for an interface    11:52
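kbader's `tc` approach uses the netem queueing discipline; the commands below are a sketch (they require root, and `eth0` is a placeholder interface name):

```shell
# add 100ms of delay and 1% packet loss on eth0 via netem
tc qdisc add dev eth0 root netem delay 100ms loss 1%

# inspect the current qdisc
tc qdisc show dev eth0

# remove the fault injection when the test is done
tc qdisc del dev eth0 root
```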
dmick    sjust: yep, also interesting data    11:53
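The histogram sjust describes could be built from a streamed list of per-op latencies. A sketch with power-of-two buckets (the event format and bucketing scheme are assumptions, not teuthology's):

```python
import math
from collections import Counter

def latency_histogram(latencies_ms, base=2):
    """Bucket op latencies (milliseconds) into power-of-`base` bins.

    Bucket 0 covers latencies below 1ms; bucket n covers
    [base**(n-1), base**n) ms. Hypothetical post-processing for a
    stream of per-op latency samples.
    """
    hist = Counter()
    for ms in latencies_ms:
        bucket = 0 if ms < 1 else int(math.floor(math.log(ms, base))) + 1
        hist[bucket] += 1
    return dict(hist)

print(latency_histogram([0.4, 1.5, 3.0, 7.9, 120.0]))
```

Unlike a single end-of-run average, the histogram preserves tail behavior, which is usually what failure-injection runs are trying to expose.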
saras    sounds like a lot of what Salt does    11:56