Ceph stats and monitoring tools
Ceph tracks some state internally for its own purposes, but it also exposes a wealth of other information for consumption by external tools. For both health monitoring and performance monitoring, there is little consensus or shared knowledge about what a full solution is or should look like.
- Kyle Bader (DreamHost)
- Sage Weil (Inktank)
- Josh Durgin
- Dan Mick
- Xiaobing Zhou (xzhou40 (AT) hawk.iit.edu)
There is a collectd plugin, and others have experimented with statsd; both are used in combination with Graphite.
A Nagios plugin is also available.
What about Ganglia?
The collectd plugin has been used successfully by DreamHost, but my efforts to get it upstream have stalled because of a poor choice of JSON library.
The Nagios plugin is available on GitHub but is not part of the Ceph tree. Should it be moved upstream? Should it be documented?
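Whatever happens to the existing plugin, the core of a Nagios check for Ceph is small. The sketch below maps Ceph's overall health status (HEALTH_OK, HEALTH_WARN, HEALTH_ERR) onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is not the GitHub plugin's code; it assumes the status is the first whitespace-separated token of the `ceph health` text output, and the function name `check_ceph_health` is hypothetical.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3
NAGIOS_LABELS = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

# Ceph's overall health status strings and the Nagios state each maps to.
STATUS_MAP = {
    "HEALTH_OK": NAGIOS_OK,
    "HEALTH_WARN": NAGIOS_WARNING,
    "HEALTH_ERR": NAGIOS_CRITICAL,
}


def check_ceph_health(health_output: str) -> Tuple[int, str]:
    """Map `ceph health` text to a Nagios (exit code, message) pair.

    Hypothetical helper; assumes the status is the first token of the output.
    """
    text = health_output.strip()
    if not text:
        return NAGIOS_UNKNOWN, "UNKNOWN - empty output from ceph health"
    status = text.split()[0]
    # Anything unrecognised is reported as UNKNOWN rather than guessed at.
    code = STATUS_MAP.get(status, NAGIOS_UNKNOWN)
    return code, f"{NAGIOS_LABELS[code]} - {text}"
```

A wrapper script would run `ceph health`, pass its stdout to this function, print the message, and call `sys.exit(code)` so Nagios picks up the state from the exit status.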
Graphite looks good (to me) for warehousing the stats. What is the best way to get them from all the daemons into Graphite? Collectd works adequately with a single Graphite server, but collectd's 'proxy' functionality does not work when the meters are dynamically defined. That is the case with Ceph: we add meters all the time, and the plugin autoconfigures itself accordingly.
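One possible approach to dynamically defined meters is to skip per-meter configuration entirely: flatten whatever counters a daemon reports into dotted Graphite paths and write them to Carbon's plaintext protocol (`<path> <value> <timestamp>` per line, TCP port 2003 by default). The sketch below assumes the counters arrive as nested JSON, for example from `ceph daemon osd.0 perf dump`; the function names and the metric prefix are hypothetical, and this bypasses collectd rather than describing how the existing plugin works.

```python
import json
import socket
import time
from typing import Any, Iterator, List, Optional, Tuple


def flatten(obj: Any, prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted.path, value) for every numeric leaf in nested JSON."""
    if isinstance(obj, dict):
        for key in sorted(obj):
            # '.' is Graphite's path separator, so it must not appear inside a key.
            safe = str(key).replace(".", "_").replace(" ", "_")
            path = f"{prefix}.{safe}" if prefix else safe
            yield from flatten(obj[key], path)
    elif isinstance(obj, bool):
        return  # bool is a subclass of int; skip it rather than report 0/1
    elif isinstance(obj, (int, float)):
        yield prefix, obj
    # Strings, lists and nulls are ignored.


def graphite_lines(perf_json: str, prefix: str,
                   timestamp: Optional[int] = None) -> List[str]:
    """Turn a perf-counter JSON document into Carbon plaintext lines."""
    ts = int(time.time()) if timestamp is None else timestamp
    data = json.loads(perf_json)
    return [f"{path} {value} {ts}\n" for path, value in flatten(data, prefix)]


def send_to_carbon(lines: List[str], host: str, port: int = 2003) -> None:
    """Push the lines to a Carbon plaintext listener (hypothetical host)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

Because the paths are derived from the JSON keys at collection time, a counter added in a new Ceph release would appear in Graphite without any configuration change. Spreading load across several Graphite servers would still need something in front of Carbon (such as a relay), which this sketch does not address.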
Discuss possible approaches and available tools, and reach consensus on which tools and integrations should be fully developed and documented.