Bug #64051

open

mgr/prometheus Missing metrics after update to 18.2.1

Added by Blake Klynsma 4 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: MgrMonitor
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a cluster that we just upgraded to 18.2.1. Since the update, the Prometheus exporter (part of the mgr module) is missing metrics that previously existed.

I have gone through https://docs.ceph.com/en/latest/monitoring/#performance-metrics and found that almost every service metric on that page is now missing from the exporter data. According to our data, these metrics existed in 18.2.0 but are missing from 18.2.1. We are using Docker and the images from Quay.io, specifically image "quay.io/ceph/ceph@sha256:7a7398bfd00202401f3b35731bb091c8a3a706eeac5c0c4cfaba85e6fd18bb3f".

I have attached a metrics.txt file that contains the current metrics, scrubbed of some identifying information. Please let me know if there is any other information I can provide to help.
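For anyone wanting to make the same comparison, here is a minimal sketch of how two scrapes of the exporter could be diffed by metric name. It assumes the standard Prometheus text exposition format; the function names and the sample data are hypothetical and are not taken from the attached metrics.txt.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format (labels and value follow it).
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in one scrape."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(before_text, after_text):
    """Metric names present in the older scrape but absent from the newer one."""
    return sorted(metric_names(before_text) - metric_names(after_text))


# Hypothetical sample data standing in for a pre- and post-upgrade scrape.
before = """# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 0.0
ceph_osd_op_r{ceph_daemon="osd.0"} 1234.0
ceph_osd_op_w{ceph_daemon="osd.0"} 567.0
"""

after = """# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 0.0
"""

print(missing_metrics(before, after))
```

In practice the two inputs would be saved scrapes of the mgr/prometheus endpoint taken before and after the upgrade.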


Files

metrics.txt (92.5 KB) Blake Klynsma, 01/16/2024 04:19 PM

Actions #1

Updated by Avan Thakkar 3 months ago

I believe the issue you're experiencing is similar to the one discussed in this thread: https://tracker.ceph.com/issues/63927. I've posted a comment there outlining potential steps to resolve the issue. Would you mind giving those steps a try and confirming the results? You can find my comment at this link: https://tracker.ceph.com/issues/63927#note-2.

Actions #2

Updated by Blake Klynsma 3 months ago

I have looked into the other thread. It appears our cluster was configured the same way: we were not running the ceph-exporter daemons and were relying on the perf metrics from mgr/prometheus. Thank you. I have enabled the ceph-exporter daemons and reconfigured the Prometheus targets. We are good now. This issue can be closed.

Actions #3

Updated by Jan Horacek 3 months ago

Hit by this too. Thank you for both of the fixes mentioned in the other linked issue.

I think this should be in the release notes, and not just as a note on ceph-exporter. If the automatic upgrade process does not switch this automatically, the release notes could mention post-upgrade tasks like these (opt1: deploy ceph-exporter on each node; opt2: enable perf metrics the old way by configuring the boolean mentioned in the linked issue).

A post-upgrade section is already present in the release notes, so this would just need to be mentioned there.
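
As a rough illustration of the two options, the sketch below only builds and prints the command lines; it does not run them. It assumes a cephadm-managed cluster for opt1, and it assumes that the boolean referred to is the mgr/prometheus option `exclude_perf_counters`. Neither detail is stated in this thread, so please check the linked issue before relying on them. The helper name and dictionary keys are hypothetical.

```python
import shlex

# Assumed commands for the two post-upgrade options discussed above.
# opt1 assumes cephadm orchestration; opt2 assumes the boolean is
# mgr/prometheus/exclude_perf_counters (not named in this thread).
COMMANDS = {
    "opt1-ceph-exporter": ["ceph", "orch", "apply", "ceph-exporter"],
    "opt2-mgr-perf-counters": [
        "ceph", "config", "set", "mgr",
        "mgr/prometheus/exclude_perf_counters", "false",
    ],
}


def post_upgrade_command(option):
    """Return a copy of the argument list for the chosen option."""
    if option not in COMMANDS:
        raise ValueError("unknown option: " + option)
    return list(COMMANDS[option])


# Print the commands for review rather than executing them.
for name in COMMANDS:
    print(name + ": " + shlex.join(post_upgrade_command(name)))
```

With opt1, the Prometheus scrape targets also need to point at the ceph-exporter daemons, as described in comment #2.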
