Bug #55029

mgr/prometheus: ceph_mon_metadata is not consistently populating the ceph_version

Added by Paul Cuzner 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Some users rely on the ceph_mon_metadata metric, exposed by mgr/prometheus, to detect a version mismatch within the cluster that needs to be resolved.

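The issue they hit is that the ceph_version label is sometimes not populated, which makes the alert fire erroneously. In the sample below, mon.a and mon.b report a ceph_version, but mon.c has neither a ceph_version nor a hostname label.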

ceph_mon_metadata{ceph_daemon="mon.a", ceph_version="ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)", container="mgr", endpoint="http-metrics", hostname="ip-10-163-144-114.eu-west-1.compute.internal", instance="10.129.4.23:9283", job="rook-ceph-mgr", namespace="openshift-storage", pod="rook-ceph-mgr-a-6cbdc85c66-x97xj", public_addr="172.30.78.135", rank="0", service="rook-ceph-mgr"}

ceph_mon_metadata{ceph_daemon="mon.b", ceph_version="ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)", container="mgr", endpoint="http-metrics", hostname="ip-10-163-144-182.eu-west-1.compute.internal", instance="10.129.4.23:9283", job="rook-ceph-mgr", namespace="openshift-storage", pod="rook-ceph-mgr-a-6cbdc85c66-x97xj", public_addr="172.30.71.229", rank="1", service="rook-ceph-mgr"}

ceph_mon_metadata{ceph_daemon="mon.c", container="mgr", endpoint="http-metrics", instance="10.129.4.23:9283", job="rook-ceph-mgr", namespace="openshift-storage", pod="rook-ceph-mgr-a-6cbdc85c66-x97xj", public_addr="172.30.100.211", rank="2", service="rook-ceph-mgr"}
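To find which monitors are affected, series like the three above can be scanned for a missing or empty label. The following is only a rough sketch: the helper names (parse_labels, daemons_missing_label) are hypothetical, the regex assumes label values formatted as in this report, and the sample lines are abbreviated copies of the series shown here.

```python
import re

# Matches one label pair such as ceph_daemon="mon.a" (tolerates escaped quotes).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(series: str) -> dict:
    """Return the label name -> value mapping of one series line."""
    start, end = series.index("{"), series.rindex("}")
    return dict(LABEL_RE.findall(series[start + 1:end]))


def daemons_missing_label(lines, label="ceph_version"):
    """List ceph_daemon values whose series lack a non-empty `label`."""
    missing = []
    for line in lines:
        labels = parse_labels(line)
        if not labels.get(label):
            missing.append(labels.get("ceph_daemon", "<unknown>"))
    return missing


# Abbreviated copies of the three series from this report.
VERSION = ("ceph version 14.2.11-199.el8cp "
           "(f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)")
samples = [
    'ceph_mon_metadata{ceph_daemon="mon.a", ceph_version="%s", rank="0"}' % VERSION,
    'ceph_mon_metadata{ceph_daemon="mon.b", ceph_version="%s", rank="1"}' % VERSION,
    'ceph_mon_metadata{ceph_daemon="mon.c", rank="2"}',
]

print(daemons_missing_label(samples))
```

The same check could be pointed at other labels (for example hostname) by passing a different `label` argument.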

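This is the alert definition. Because the mon.c series has no ceph_version, it is grouped separately under an empty label value, so the outer count is 2 and the alert fires even though all reported versions match: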
count(count by(ceph_version) (ceph_mon_metadata{job="rook-ceph-mgr"})) > 1

A workaround is to exclude series with an empty ceph_version from the query:
count(count(ceph_mon_metadata{job="rook-ceph-mgr", ceph_version!=""}) by (ceph_version)) > 1
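The difference between the two queries can be reproduced outside Prometheus. The sketch below is an illustration, not the mgr/prometheus or Prometheus implementation: count_by and distinct_versions are hypothetical helpers that imitate the count-of-count pattern on the three monitors from this report, treating a missing label as an empty string as PromQL does.

```python
from collections import Counter


def count_by(series, label):
    """Imitate `count by(label)`: a missing label is grouped as ""."""
    return Counter(s.get(label, "") for s in series)


def distinct_versions(series, skip_empty=False):
    """Imitate the outer count: number of distinct ceph_version groups."""
    if skip_empty:
        # Mirrors the ceph_version!="" matcher used by the workaround.
        series = [s for s in series if s.get("ceph_version", "") != ""]
    return len(count_by(series, "ceph_version"))


VERSION = ("ceph version 14.2.11-199.el8cp "
           "(f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)")

# Label sets reduced to the fields that matter for the alert.
mons = [
    {"ceph_daemon": "mon.a", "ceph_version": VERSION},
    {"ceph_daemon": "mon.b", "ceph_version": VERSION},
    {"ceph_daemon": "mon.c"},  # ceph_version not populated
]

original = distinct_versions(mons)                    # 2 groups: VERSION and ""
workaround = distinct_versions(mons, skip_empty=True)  # 1 group: VERSION

print("original alert fires:", original > 1)
print("workaround alert fires:", workaround > 1)
```

With the filter in place, a monitor that reports no version is ignored instead of being counted as a second version, so the workaround hides the missing label but does not fix it.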