CDM 02-FEB-2022

  • How should we collect metrics from key daemons?
    Currently all daemons report performance and state to the mgr, and the mgr/prometheus module exposes this data to monitoring and alerting daemons. However, as our demand for more operational data grows, this approach places further load on the mgr. To address this, two strategies have been proposed.
    1. Place an exporter daemon on every node: this daemon becomes the contact point for monitoring stacks and is responsible for gathering state/performance/capacity information from each daemon on that node. This approach isolates the 3rd-party prometheus integration library from core ceph code, eliminating the potential for this layer to impact the stability or security of the main ceph daemons. The main downside is that the daemon would need to actively discover and collect data from all other daemons on the host, which in a containerised context could prove challenging, especially in a kubernetes (rook) environment. In addition, this strategy requires code changes across all daemons to expose the data to be collected.
    2. Embed the exporter http(s) endpoint into the relevant ceph daemons: this approach would extend the current rbd-mirror, cephfs-mirror and radosgw daemons to include a http endpoint based on beast and the prometheus-cpp library (a minimal sketch follows this list). Since the endpoint is embedded in the main daemon, extracting and exposing the data is straightforward, with minimal impact on bare-metal or rook/kubernetes deployments. The downside is that the introduction of 3rd-party code could affect the stability or security of the ceph daemon, and if the strategy encompasses OSDs, prometheus would have potentially thousands of additional endpoints to scrape.
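
To make option 2 concrete, below is a minimal sketch of a daemon embedding a Prometheus endpoint via prometheus-cpp. For brevity it uses the library's bundled Exposer rather than a beast-based frontend, and the listen port, metric name and labels are hypothetical; a real integration would publish the daemon's existing perf counters rather than this placeholder counter.

```cpp
#include <prometheus/counter.h>
#include <prometheus/exposer.h>
#include <prometheus/registry.h>

#include <chrono>
#include <memory>
#include <thread>

int main() {
  // Hypothetical listen address for the embedded metrics endpoint.
  prometheus::Exposer exposer{"127.0.0.1:9900"};

  // Registry holding this daemon's metrics; a real daemon would bridge
  // its perf counters into collectables registered here.
  auto registry = std::make_shared<prometheus::Registry>();

  // Placeholder counter standing in for a real per-daemon perf counter.
  auto& family = prometheus::BuildCounter()
                     .Name("example_daemon_ops_total")
                     .Help("Hypothetical operation counter")
                     .Register(*registry);
  auto& ops = family.Add({{"daemon", "rbd-mirror.a"}});

  // Expose everything in the registry at /metrics for Prometheus to scrape.
  exposer.RegisterCollectable(registry);

  for (;;) {
    ops.Increment();  // stand-in for real daemon work
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}
```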

Aside from the implementation differences, the other factor to consider is the sample size returned to the Prometheus server. For example, mgr/prometheus currently returns perf counters for the whole cluster, which is problematic at scale: on a large cluster of 3,776 OSDs, the mgr/prometheus module attempts to return 850,000 samples (50+MB) to prometheus every 15s! This results in scrape failures and stale data, which impacts monitoring and alerting. Even without the perf counters, it returns 30,000 samples (3MB).
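
As a rough back-of-the-envelope check using the figures above (approximate, and only as accurate as the quoted numbers):

$$\frac{850{,}000\ \text{samples}}{3{,}776\ \text{OSDs}} \approx 225\ \text{samples per OSD per scrape},\qquad \frac{50\ \text{MB}}{850{,}000\ \text{samples}} \approx 60\ \text{bytes per sample}$$

so per-daemon endpoints (under either strategy above) would shrink each individual scrape to on the order of a couple of hundred samples, at the cost of many more scrape targets.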
