Feature #52638

mgr/prometheus: Add all healthchecks to prometheus output and provide a way of viewing history

Added by Paul Cuzner almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
prometheus module
Target version:
% Done:
0%

Source:
Tags:
Backport:
pacific
Reviewed:
Affected Versions:
Pull request ID:

Description

The mgr/prometheus module does not provide a granular view of healthchecks to Prometheus, so some alerts rely on the generic ceph_health_status > 0 expression. That expression does not identify which healthcheck fired, which is a common source of frustration.

This feature provides a metric per encountered healthcheck, so alert rules can be customised to specific events. In addition, because the healthchecks need to be emitted on each scrape, the module needs to persist healthcheck state, which opens the door to providing a healthcheck history. The history would be exposed by a new command, letting the admin see which healthchecks have been encountered within the cluster, their frequency, and their first and last seen timestamps.
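
As a rough illustration of the state tracking described above, the sketch below keeps one record per healthcheck with a count and first/last seen timestamps, and renders one metric line per check on each scrape. It is not the module's implementation: the class names, the metric name ceph_health_detail, the "name" label, and the choice to count a new occurrence only when a check reappears after clearing are all assumptions made for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class HealthCheckRecord:
    """History entry for one healthcheck name (hypothetical structure)."""
    first_seen: float
    last_seen: float
    count: int = 1
    active: bool = True


class HealthHistory:
    """Tracks which healthchecks have been seen, how often, and when."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self.records: Dict[str, HealthCheckRecord] = {}

    def update(self, active_checks: Iterable[str]) -> None:
        """Fold the currently active healthcheck names into the history."""
        now = self._clock()
        active = set(active_checks)
        for name in active:
            rec = self.records.get(name)
            if rec is None:
                self.records[name] = HealthCheckRecord(first_seen=now, last_seen=now)
            else:
                if not rec.active:
                    # Assumption: count occurrences, not individual scrapes.
                    rec.count += 1
                rec.last_seen = now
                rec.active = True
        # Checks that are no longer reported stay in the history as inactive.
        for name, rec in self.records.items():
            if name not in active:
                rec.active = False

    def metric_lines(self) -> List[str]:
        """Render one line per known healthcheck: 1 if active, else 0."""
        return [
            f'ceph_health_detail{{name="{name}"}} {int(rec.active)}'
            for name, rec in sorted(self.records.items())
        ]
```

With per-check metrics like these, an alert rule could match on a single healthcheck name instead of the overall health status, and the same records could back a history listing.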

The feature deliverables should include:
- additional metrics
- updated prometheus rules
- updated docs for the prometheus module


Related issues

Copied to mgr - Backport #53616: pacific: mgr/prometheus: Add all healthchecks to prometheus output and provide a way of viewing history Resolved

History

#1 Updated by Ernesto Puerta over 1 year ago

  • Status changed from New to Pending Backport
  • Pull request ID set to 43293

#2 Updated by Backport Bot over 1 year ago

  • Copied to Backport #53616: pacific: mgr/prometheus: Add all healthchecks to prometheus output and provide a way of viewing history added

#3 Updated by Ernesto Puerta over 1 year ago

  • Status changed from Pending Backport to Resolved
