Bug #55029: mgr/prometheus: ceph_mon_metadata is not consistently populating the ceph_version - mgr - Ceph

Added by Paul Cuzner about 2 years ago.

Status: New
Priority: Normal
Assignee:
Category:
Target version: Ceph - v18.0.0
% Done:
Source:
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v14.2.11
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

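Some users rely on the ceph_mon_metadata metric exported by mgr/prometheus to detect a Ceph version mismatch within the cluster that needs to be resolved.

The issue they hit is that the ceph_version label is sometimes not populated, which makes the alert fire erroneously. In the sample below, mon.a and mon.b report the same ceph_version, but mon.c is exported without the ceph_version (and hostname) label. Prometheus treats a missing label as an empty value, so grouping by ceph_version yields two groups even though no real mismatch exists.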

ceph_mon_metadata{ceph_daemon="mon.a", ceph_version="ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)", container="mgr", endpoint="http-metrics", hostname="ip-10-163-144-114.eu-west-1.compute.internal", instance="10.129.4.23:9283", job="rook-ceph-mgr", namespace="openshift-storage", pod="rook-ceph-mgr-a-6cbdc85c66-x97xj", public_addr="172.30.78.135", rank="0", service="rook-ceph-mgr"}

ceph_mon_metadata{ceph_daemon="mon.b", ceph_version="ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)", container="mgr", endpoint="http-metrics", hostname="ip-10-163-144-182.eu-west-1.compute.internal", instance="10.129.4.23:9283", job="rook-ceph-mgr", namespace="openshift-storage", pod="rook-ceph-mgr-a-6cbdc85c66-x97xj", public_addr="172.30.71.229", rank="1", service="rook-ceph-mgr"}

ceph_mon_metadata{ceph_daemon="mon.c", container="mgr", endpoint="http-metrics", instance="10.129.4.23:9283", job="rook-ceph-mgr", namespace="openshift-storage", pod="rook-ceph-mgr-a-6cbdc85c66-x97xj", public_addr="172.30.100.211", rank="2", service="rook-ceph-mgr"}
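To spot affected daemons quickly, the series can be checked for absent labels. The following is only a sketch: the helper names (parse_labels, daemons_missing) are hypothetical, the regex handles simple label pairs like the ones shown above rather than the full exposition format, and the sample lines are abbreviated copies of the three series with most labels dropped.

```python
import re

# Matches one label pair such as ceph_daemon="mon.a" inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(sample_line):
    """Return the label set of one sample line as a dict (hypothetical helper)."""
    start = sample_line.index("{")
    end = sample_line.rindex("}")
    return dict(LABEL_RE.findall(sample_line[start + 1:end]))


def daemons_missing(label, sample_lines):
    """List ceph_daemon values whose series lack `label` or carry it empty."""
    missing = []
    for line in sample_lines:
        labels = parse_labels(line)
        if not labels.get(label):
            missing.append(labels.get("ceph_daemon", "<unknown>"))
    return missing


VERSION = ("ceph version 14.2.11-199.el8cp "
           "(f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)")

# Abbreviated copies of the three series from this report.
SAMPLES = [
    'ceph_mon_metadata{ceph_daemon="mon.a", ceph_version="%s", '
    'hostname="ip-10-163-144-114.eu-west-1.compute.internal", rank="0"}' % VERSION,
    'ceph_mon_metadata{ceph_daemon="mon.b", ceph_version="%s", '
    'hostname="ip-10-163-144-182.eu-west-1.compute.internal", rank="1"}' % VERSION,
    'ceph_mon_metadata{ceph_daemon="mon.c", public_addr="172.30.100.211", rank="2"}',
]

print(daemons_missing("ceph_version", SAMPLES))  # ['mon.c']
```

The same check with "hostname" as the label also flags only mon.c for these samples.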

This is the alert definition:
count(count by(ceph_version) (ceph_mon_metadata{job="rook-ceph-mgr"})) > 1

A workaround is to exclude series with an empty ceph_version label before grouping:
count(count(ceph_mon_metadata{job="rook-ceph-mgr", ceph_version!=""}) by (ceph_version)) > 1
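The difference between the two queries can be illustrated outside Prometheus. This is a rough model, not the PromQL evaluator: it assumes only that a missing label behaves like an empty string when grouping, and the function name distinct_values and the trimmed label sets are made up for the illustration.

```python
def distinct_values(series, label, drop_empty=False):
    """Model `count by(label)`: collect the distinct values of `label`.

    A series without the label contributes the empty string, mirroring how
    a missing label is treated as an empty value when grouping.
    """
    values = {labels.get(label, "") for labels in series}
    if drop_empty:
        # Models the ceph_version!="" matcher used by the workaround.
        values.discard("")
    return values


VERSION = ("ceph version 14.2.11-199.el8cp "
           "(f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)")

# Trimmed label sets for the three monitors in this report.
series = [
    {"ceph_daemon": "mon.a", "ceph_version": VERSION},
    {"ceph_daemon": "mon.b", "ceph_version": VERSION},
    {"ceph_daemon": "mon.c"},  # ceph_version not populated
]

# Original alert: outer count(...) > 1 over the groups.
original_fires = len(distinct_values(series, "ceph_version")) > 1

# Workaround: drop empty ceph_version before grouping.
workaround_fires = len(distinct_values(series, "ceph_version", drop_empty=True)) > 1

print(original_fires, workaround_fires)  # True False
```

Note that the workaround only hides the unpopulated series from the alert; a monitor whose version is never reported is also never compared against the others.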
