2F - Testing, build/release & Teuthology

Live Pad

The live pad can be found here: [pad]

Summit Snapshot

Building and testing Ceph automatically

Build infrastructure (gitbuilders):

https://github.com/ceph/gitbuilder + https://github.com/ceph/autobuild-ceph on-demand
Ceph gitbuilder status: http://ceph.com/gitbuilder.cgi

Release builds
1. target platforms?
Ubuntu (precise, quantal, soon raring), Debian squeeze, CentOS, SUSE
2. process

Teuthology http://github.com/ceph/teuthology

automated test cluster setup/run/collect output
uses physical or virtual machines
cooperative locking of machine resources to avoid trampling other users (optional)
write YAML configuration files to select machines/roles, Ceph versions to install, tests to run
a few hundred physical machines internal to Inktank to run nightly tests
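A minimal sketch of the two ideas above (cooperative machine locking plus a config that selects machines, roles, a Ceph version and tests). The `MachinePool` class, the hostnames, the `ubuntu@` user and the placeholder SSH key are hypothetical; this is not teuthology's lock server API. The `roles`/`targets`/`tasks` layout follows the general shape of a teuthology job file, but check the teuthology task docs for the exact keys each task accepts.

```python
import json


class MachinePool:
    """Hypothetical in-memory model of cooperative machine locking.

    This is not teuthology's lock server or its API; it only shows the idea
    that a machine locked by one user is skipped by everyone else.
    """

    def __init__(self, hostnames):
        # hostname -> owner string, or None when the machine is free
        self.owners = {host: None for host in hostnames}

    def lock_many(self, count, owner):
        """Lock `count` free machines for `owner`, or lock nothing at all."""
        free = sorted(h for h, o in self.owners.items() if o is None)
        if len(free) < count:
            raise RuntimeError(f"only {len(free)} free machines, need {count}")
        chosen = free[:count]
        for host in chosen:
            self.owners[host] = owner
        return chosen

    def unlock(self, host, owner):
        """Release a machine; only the user who locked it may do so."""
        if self.owners.get(host) != owner:
            raise PermissionError(f"{host} is not locked by {owner}")
        self.owners[host] = None


def build_job(roles, hosts, branch):
    """Build a teuthology-style job: one locked machine per role group."""
    if len(roles) != len(hosts):
        raise ValueError("need exactly one machine per role group")
    return {
        "roles": roles,
        # Assumed layout: user@host mapped to the machine's SSH host key;
        # a placeholder stands in for the real key in this sketch.
        "targets": {f"ubuntu@{host}": "ssh-rsa PLACEHOLDER" for host in hosts},
        "tasks": [
            {"install": {"branch": branch}},
            {"ceph": None},
            {"workunit": {"clients": {"client.0": ["rados/test.sh"]}}},
        ],
    }


if __name__ == "__main__":
    pool = MachinePool(["plana01", "plana02", "plana03"])  # hypothetical hosts
    hosts = pool.lock_many(2, owner="alice@example")
    job = build_job(
        [["mon.a", "mon.c", "osd.0", "osd.1"], ["mon.b", "osd.2", "client.0"]],
        hosts,
        branch="next",
    )
    # JSON of this shape also parses as YAML, so no PyYAML dependency is needed.
    print(json.dumps(job, indent=2))
```

Locking all requested machines or none avoids a half-reserved cluster that blocks other users while the job waits.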

gtest for unit tests (make check):

require tests for new code
refactor existing code to be testable

Test suite http://github.com/ceph/ceph-qa-suite
  • combination/permutation of teuthology tests, cluster configurations
    many different functional/regression tests, with and without failure injection
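The combination/permutation idea can be sketched as a Cartesian product over "facets" (cluster layout, failure injection, workload), merging one config fragment from each. This is a simplified model, assuming fragments are plain dicts whose `tasks` lists concatenate; the real suite builder works from the ceph-qa-suite directory layout, and the facet and fragment names here are hypothetical.

```python
import itertools


def merge(base, extra):
    """Recursively merge config fragments: dicts merge, lists concatenate."""
    out = dict(base)
    for key, value in extra.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = merge(out[key], value)
        elif isinstance(out.get(key), list) and isinstance(value, list):
            out[key] = out[key] + value
        else:
            out[key] = value
    return out


def expand_suite(facets):
    """Yield (description, config) for every combination of fragments.

    `facets` maps a facet name to {fragment_name: config_dict}. Facets are
    combined in insertion order, so put cluster/install fragments first and
    workloads last to keep the task list in a sensible order.
    """
    names = list(facets)
    choices = [sorted(facets[name].items()) for name in names]
    for combo in itertools.product(*choices):
        description = " ".join(
            f"{name}/{fragment}" for name, (fragment, _) in zip(names, combo)
        )
        config = {}
        for _, fragment_config in combo:
            config = merge(config, fragment_config)
        yield description, config


if __name__ == "__main__":
    facets = {  # hypothetical facets; thrashosds stands in for failure injection
        "clusters": {"fixed-2": {"tasks": [{"install": None}, {"ceph": None}]}},
        "thrash": {"none": {}, "osds": {"tasks": [{"thrashosds": None}]}},
        "workloads": {"api": {"tasks": [{"workunit": "api"}]},
                      "snaps": {"tasks": [{"workunit": "snaps"}]}},
    }
    for description, _ in expand_suite(facets):
        print(description)
```

With 1 × 2 × 2 fragments this yields four jobs; each added facet multiplies the job count, which is why the full matrix is usually only run nightly.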

upgrade testing

run tests in a mixed-version environment with slow rolling upgrades
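A rolling upgrade can be modeled as upgrading one daemon at a time and running a workload after each step, so every intermediate step exercises a mixed-version cluster. This is a sketch under assumptions: the version map stands in for "install new packages and restart the daemon", `run_workload` is a hypothetical hook, and the release names are only example strings. The daemon order is an input (for example, monitors before OSDs).

```python
def rolling_upgrade(versions, order, new_version, run_workload):
    """Upgrade daemons one at a time, running a workload after each step.

    versions: dict daemon -> version string, updated in place.
    run_workload: hypothetical hook taking a copy of the version map and
    returning True if the tests passed.
    Returns a list of (daemon_just_upgraded, cluster_is_mixed, passed).
    """
    results = []
    for daemon in order:
        versions[daemon] = new_version  # stand-in for install + restart
        mixed = len(set(versions.values())) > 1
        passed = run_workload(dict(versions))
        results.append((daemon, mixed, passed))
        if not passed:
            break  # stop at the first failure so the bad state can be inspected
    return results


if __name__ == "__main__":
    cluster = {"mon.a": "bobtail", "osd.0": "bobtail", "osd.1": "bobtail"}
    for step in rolling_upgrade(cluster, ["mon.a", "osd.0", "osd.1"],
                                "cuttlefish", lambda v: True):
        print(step)
```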

integration tests:

openstack
cloudstack
chef.py
libvirt (pools and volumes)
qemu

Allocate/create VMs using EC2 or OpenStack APIs?

Improve documentation on how to get it set up and working
- how to set up test machines (http://github.com/ceph/ceph-qa-chef)

Work items:

build out a large cluster test suite
parallel.py and sequential.py tasks
rados.py radosmodel test should infer the list of clients and run them in parallel
task to slurp up/archive perf counters
identifying key metrics to monitor

osd: small/large write performance
mds: metadata ops/sec
...

qemu gitbuilder
build large long-term clusters on burnupi?
samba (and others?) don't register as running daemons and thus can't be restarted by the upgrade task
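The parallel/sequential task items and the radosmodel client inference can be sketched together: collect every `client.N` role from the roles list, then run one subtask per client concurrently. The function names and the `run_model_on` hook are hypothetical; this is not the code of teuthology's parallel.py, sequential.py or rados.py.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_clients(roles):
    """Collect every client.N role from a teuthology-style roles list."""
    return [role for group in roles for role in group if role.startswith("client.")]


def run_sequential(subtasks):
    """Run callables one after another; an exception stops the sequence."""
    return [task() for task in subtasks]


def run_parallel(subtasks):
    """Run callables concurrently and return their results in submission order.

    The executor waits for every subtask to finish; if any raised, the
    exception from the first failing subtask (in submission order) propagates.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = [pool.submit(task) for task in subtasks]
        return [future.result() for future in futures]


def rados_for_all_clients(roles, run_model_on):
    """Run a (hypothetical) radosmodel workload on every inferred client."""
    clients = infer_clients(roles)
    # Bind each client as a default argument so every lambda gets its own.
    return run_parallel([lambda c=c: run_model_on(c) for c in clients])
```

Threads fit here because each subtask mostly waits on remote commands; results are returned in submission order so callers can match them to clients.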

Performance:

need to be able to identify performance regressions
memory, CPU, and network usage data (network usage, please)
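Identifying a regression needs a baseline and a direction for each metric (throughput should go up, memory and CPU should go down). A minimal sketch, assuming per-run metrics are already collected as flat dicts; the metric names and the 10% tolerance are hypothetical, and single-run comparisons are noisy, so a real check would compare against several baseline runs.

```python
def find_regressions(baseline, current, directions, tolerance=0.10):
    """Compare per-metric numbers from two runs.

    directions maps metric -> "higher" (e.g. ops/sec) or "lower" (e.g. memory).
    A metric regresses when it moves the wrong way by more than `tolerance`,
    expressed as a fraction of the baseline value. Metrics missing from either
    run, or with a zero baseline, are skipped.
    Returns {metric: relative_change}.
    """
    regressions = {}
    for metric, better in directions.items():
        if metric not in baseline or metric not in current:
            continue
        old, new = baseline[metric], current[metric]
        if old == 0:
            continue  # relative change is undefined
        change = (new - old) / old
        worse = -change if better == "higher" else change
        if worse > tolerance:
            regressions[metric] = round(change, 4)
    return regressions
```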

perf task?
collectl?
store aggregated data in summary.yaml for each run
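Whatever collects the samples (a perf task, collectl, or something else), the per-run summary only needs a compact reduction. A minimal sketch, assuming samples arrive as `{metric: [values...]}`; the `perf` key and metric names are hypothetical additions to the run summary, not existing summary.yaml fields.

```python
from statistics import mean


def summarize(samples):
    """Reduce {metric: [values...]} to {metric: {"min", "mean", "max"}}.

    Metrics with no samples are dropped rather than reported as zero.
    """
    return {
        metric: {"min": min(values), "mean": mean(values), "max": max(values)}
        for metric, values in samples.items()
        if values
    }


def add_perf_section(summary, samples, key="perf"):
    """Return a copy of a run summary with aggregated samples under `key`."""
    out = dict(summary)
    out[key] = summarize(samples)
    return out
```

Keeping min/mean/max per run keeps summary.yaml small while still giving a data warehouse something to chart across runs.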

time (have raw timer task; need to log results)
identify data warehouse and make something to import into it
build chart.io graphs :)
aggregate/slurp the osd perf counters at end of run?
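Slurping OSD perf counters at the end of a run could mean fetching each daemon's admin-socket `perf dump` JSON and combining them. A sketch, assuming each dump is already parsed into a dict of sections, with plain numeric counters or `{"avgcount": ..., "sum": ...}` pairs for averaged values; the counter names in the test are examples, and summing gauge-style counters across OSDs may not be meaningful for every counter.

```python
def merge_perf_dumps(dumps):
    """Combine parsed `perf dump` documents (one per OSD) into cluster totals.

    Plain numeric counters are summed. Averaged counters, assumed to be
    {"avgcount": n, "sum": s} pairs, have both fields summed so the overall
    average can be recomputed. Other value types are ignored.
    """
    totals = {}
    for dump in dumps:
        for section, counters in dump.items():
            dest = totals.setdefault(section, {})
            for name, value in counters.items():
                if isinstance(value, dict) and "avgcount" in value and "sum" in value:
                    acc = dest.setdefault(name, {"avgcount": 0, "sum": 0})
                    acc["avgcount"] += value["avgcount"]
                    acc["sum"] += value["sum"]
                elif isinstance(value, (int, float)):
                    dest[name] = dest.get(name, 0) + value
    return totals


def average(pair):
    """Overall average of a merged avgcount/sum pair (0.0 if no samples)."""
    return pair["sum"] / pair["avgcount"] if pair["avgcount"] else 0.0
```

Summing `avgcount` and `sum` separately, instead of averaging the per-OSD averages, weights each OSD by how many operations it actually served.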

scribe (facebook)
flume (cloudera)