Bug #64288

open

Inconsistent uses of instance and hostname label in Prometheus metrics

Added by Christian Rohmann 2 months ago. Updated 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
prometheus module
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Although the configuration of an external Prometheus server, including the instance label and the honor_labels option, is explained at https://docs.ceph.com/en/latest/mgr/prometheus/#configuring-prometheus-server, I still believe the label usage is not fully consistent.
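For context, here is a minimal Python sketch of how Prometheus resolves a conflict between a label in the scraped data and a label attached by the scrape target. It is a simplified model of the documented honor_labels behaviour, not Prometheus code; the function name apply_target_labels and the host names are hypothetical.

```python
def apply_target_labels(scraped, target, honor_labels):
    """Model how target labels are merged into a scraped sample's labels.

    Simplified sketch: on a name conflict, honor_labels=True keeps the
    scraped value, while honor_labels=False renames the scraped label to
    exported_<name> and lets the target label win.
    """
    merged = dict(scraped)
    for name, value in target.items():
        if name in scraped:
            if honor_labels:
                continue  # keep the value exposed by the exporter
            merged["exported_" + name] = scraped[name]
        merged[name] = value
    return merged


# Hypothetical values: "node1" is the host named in the metric,
# "mgr1:9283" is the scrape target (the active mgr).
scraped = {"instance": "node1", "ceph_daemon": "osd.0"}
target = {"instance": "mgr1:9283", "job": "ceph"}

without_honor = apply_target_labels(scraped, target, honor_labels=False)
with_honor = apply_target_labels(scraped, target, honor_labels=True)
```

With honor_labels disabled, the host ends up in exported_instance and instance names the mgr. With it enabled, instance keeps the host and no exported_instance label is created.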

1) I ran into some side effects when using the ceph-mixin (https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin), whose queries join on the instance label quite a lot.
Most of the label_replace uses clean up ports or FQDN syntax to normalize the instance value across different exporters. There are still some uses of exported_instance, which should not exist when honor_labels is set, right?
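To illustrate the kind of normalization meant here, the following Python sketch strips a trailing port and any domain suffix from an instance value. The function name normalize_instance and the regular expressions are assumptions for illustration; they are not the expressions used in ceph-mixin.

```python
import re


def normalize_instance(value: str) -> str:
    """Reduce 'host.example.com:9283' to 'host'.

    IPv4 addresses keep all octets; only the port is removed.
    """
    host = re.sub(r":\d+$", "", value)  # drop a trailing :port
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        return host  # IPv4 address: do not cut at the first dot
    return host.split(".", 1)[0]  # keep only the short host name


# Hypothetical instance values as different exporters might report them.
examples = [
    "ceph-node-01.example.com:9283",
    "ceph-node-01:9100",
    "ceph-node-01",
    "10.0.0.5:9283",
]
normalized = [normalize_instance(v) for v in examples]
```

Only after a step like this do the instance values of different exporters match, which is what joins on instance rely on.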

2) When re-enabling the export of perf counters by the mgr (mgr/prometheus/exclude_perf_counters false), which was disabled by default in Reef in favor of the ceph-exporter running locally on all nodes,
the following metrics are exported with an instance label:

ceph_disk_occupation
ceph_disk_occupation_human

The metadata metrics, in contrast, use hostname to indicate which host they relate to:

ceph_mgr_metadata
ceph_mon_metadata
ceph_osd_metadata
ceph_rgw_metadata

Technically, it seems that "hostname" is actually the instance the metric refers to? Why not also use instance here, since honor_labels is already recommended?
If it's preferred to stick with exporting the hostname in the hostname label, does it make sense to label_rewrite them into instance?

In essence, I am suggesting picking one label to convey the host that a metric, or the daemon it belongs to, relates to.
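One way to survey the inconsistency is to scan the exporter output and record, per metric, which of the two host labels it carries. The Python sketch below does this for text in the Prometheus exposition format. The sample lines, label values and the function name host_labels_by_metric are made up for illustration, and the parser is deliberately simplified (it skips samples without labels and assumes no "}" inside label values).

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format; values are made up.
SAMPLE = """\
# HELP ceph_osd_metadata OSD Metadata
ceph_disk_occupation{ceph_daemon="osd.0",device="sdb",instance="node1"} 1.0
ceph_osd_metadata{ceph_daemon="osd.0",hostname="node1"} 1.0
ceph_daemon_health_metrics{ceph_daemon="osd.0",type="SLOW_OPS"} 0.0
"""

LINE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="')
HOST_LABELS = {"instance", "hostname"}


def host_labels_by_metric(text):
    """Map each labelled metric name to the host labels it uses."""
    result = defaultdict(set)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue  # sample without a label set
        label_names = set(LABEL_RE.findall(match.group(2)))
        result[match.group(1)] |= label_names & HOST_LABELS
    return dict(result)


report = host_labels_by_metric(SAMPLE)
for metric, labels in sorted(report.items()):
    print(metric, sorted(labels) or "no host label")
```

Run against the real /metrics output of the mgr module, a report like this would list every metric that deviates from whichever label is chosen.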

I know that with the switch to the ceph-exporter ...

Ceph-exporter: Now the performance metrics for Ceph daemons are exported by ceph-exporter, which deploys on each daemon rather than using prometheus exporter. This will reduce performance bottlenecks.

the instance is likely, once again, the host the metrics are collected from and relate to.
But as long as the mgr can export the performance metrics, I believe further cleanup makes sense.

Actions #1

Updated by Christian Rohmann 2 months ago

There is also the metric ceph_daemon_health_metrics, which contains neither an instance label (to be used via honor_labels) nor a hostname label.
