Bug #52714: mgr/prometheus: Update ceph_healthcheck_* metric value to 1 when triggered - mgr - Ceph


Added by Jinmyeong Lee over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Severity: 3 - minor

Description

We want to add more health check conditions to HEALTH_CHECKS so that the cluster is easier to monitor (https://github.com/ceph/ceph/blob/master/src/pybind/mgr/prometheus/module.py#L115):

HEALTH_CHECKS = [
    alert_metric('SLOW_OPS', 'OSD or Monitor requests taking a long time to process'),
]

The default value of each metric is 0 (https://github.com/ceph/ceph/blob/master/src/pybind/mgr/prometheus/module.py#L558).
When a health warning is triggered, the metric should be set to 1 (or some other value that differs from the default 0), but it is not, because of the code here (https://github.com/ceph/ceph/blob/master/src/pybind/mgr/prometheus/module.py#L540).
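
The intended behaviour can be sketched roughly as follows. This is only an illustration, not the module's actual code: the function healthcheck_gauges and the shape of its active_checks argument are hypothetical, alert_metric is modelled as a namedtuple by assumption, and the OSDMAP_FLAGS entry is added purely for testing. Every tracked check starts at the default 0 and becomes 1 only while that check is raised.

```python
from collections import namedtuple

# Assumption: alert_metric is a simple (name, description) pair.
alert_metric = namedtuple('alert_metric', 'name description')

HEALTH_CHECKS = [
    alert_metric('SLOW_OPS', 'OSD or Monitor requests taking a long time to process'),
    # Extra entry added here only for illustration.
    alert_metric('OSDMAP_FLAGS', 'OSD Flags (just for testing metric)'),
]


def healthcheck_gauges(active_checks):
    """Return {metric_name: value} for every tracked health check.

    active_checks is a hypothetical iterable of the names of the checks
    that are currently raised, e.g. {'OSDMAP_FLAGS'}.
    """
    active = set(active_checks)
    gauges = {}
    for check in HEALTH_CHECKS:
        metric_name = 'ceph_healthcheck_' + check.name.lower()
        # 1.0 while the check is raised, otherwise the default 0.0.
        gauges[metric_name] = 1.0 if check.name in active else 0.0
    return gauges
```

Calling healthcheck_gauges({'OSDMAP_FLAGS'}) would report ceph_healthcheck_osdmap_flags as 1.0 and leave ceph_healthcheck_slow_ops at 0.0.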

I fixed this and tested it on our private cluster.
Here is an example of the ceph_exporter output.

After ceph osd set noscrub

# HELP ceph_healthcheck_osdmap_flags OSD Flags (just for testing metric)
# TYPE ceph_healthcheck_osdmap_flags gauge
ceph_healthcheck_osdmap_flags 1.0

After ceph osd unset noscrub

# HELP ceph_healthcheck_osdmap_flags OSD Flags (just for testing metric)
# TYPE ceph_healthcheck_osdmap_flags gauge
ceph_healthcheck_osdmap_flags 0.0
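
A check like the one above could be automated with a small script. The following is a minimal sketch, assuming the exporter output has already been fetched as text; read_gauge is a hypothetical helper and it only handles metrics without labels, as in the output shown here.

```python
def read_gauge(exposition_text, metric_name):
    """Return the value of an unlabelled metric as a float, or None if absent."""
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == metric_name:
            return float(parts[1])
    return None


sample = """\
# HELP ceph_healthcheck_osdmap_flags OSD Flags (just for testing metric)
# TYPE ceph_healthcheck_osdmap_flags gauge
ceph_healthcheck_osdmap_flags 1.0
"""

flag_value = read_gauge(sample, 'ceph_healthcheck_osdmap_flags')
```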


Updated by Paul Cuzner over 2 years ago

Updated by Sebastian Wagner over 2 years ago