Bug #64288
open
Inconsistent uses of instance and hostname label in Prometheus metrics
Description
While the configuration of an (external) Prometheus server, with regard to the instance label and the honor_labels option, is explained at https://docs.ceph.com/en/latest/mgr/prometheus/#configuring-prometheus-server, I still believe things are not fully consistent.
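For context, here is a minimal sketch of how I understand honor_labels to resolve label conflicts. It is a simplified model in Python, not Prometheus code; the function name resolve_labels and the host names are made up for illustration, and edge cases such as empty label values are ignored.

```python
def resolve_labels(scraped, target, honor_labels):
    """Merge the labels of a scraped sample with the labels of the scrape target.

    Simplified model: on a conflict, honor_labels=True keeps the scraped value,
    while honor_labels=False renames the scraped label to exported_<name> and
    lets the target label win.
    """
    result = dict(scraped)
    for name, value in target.items():
        if name in scraped:
            if honor_labels:
                continue  # keep the value exported by the mgr module
            result["exported_" + name] = scraped[name]
        result[name] = value
    return result


# Hypothetical sample: a metric exported by the mgr that refers to another host.
scraped = {"__name__": "ceph_disk_occupation", "instance": "ceph-node-01"}
# Hypothetical target labels attached by the Prometheus server.
target = {"instance": "ceph-mgr-01:9283", "job": "ceph"}

without_honor = resolve_labels(scraped, target, honor_labels=False)
with_honor = resolve_labels(scraped, target, honor_labels=True)
```

With honor_labels disabled, the host the metric refers to ends up in exported_instance and instance points at the mgr; with it enabled, instance keeps the value the mgr exported.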
1) I ran into some side effects when using the ceph-mixins (https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin), which join quite a lot on the instance label.
Most of the label_replace use is to clean up any ports or FQDN syntax to normalize the instance value (of different exporters). There are still some uses of exported_instance, which should not exist if honor_labels is enabled, right?
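To make the normalization concrete, the following Python sketch shows the kind of cleanup I mean: strip a port and a domain suffix so that instance values from different exporters can be joined. It is not the expression used in the mixins; normalize_instance and the example host names are hypothetical, and IPv6 addresses are not handled.

```python
import re

# Matches a dotted IPv4 address, which must not be cut at the first dot.
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def normalize_instance(value: str) -> str:
    """Reduce an instance value to a short host name (or a bare IPv4 address)."""
    # Drop a trailing ":port" if present.
    host = value.rsplit(":", 1)[0] if ":" in value else value
    if _IPV4.match(host):
        return host
    # Drop the domain part of an FQDN.
    return host.split(".", 1)[0]


# Hypothetical instance values as different exporters might report them.
examples = [
    "ceph-node-01.example.com:9100",
    "ceph-node-01:9283",
    "ceph-node-01",
    "10.0.0.5:9100",
]
normalized = [normalize_instance(v) for v in examples]
```

The first three values all collapse to the same short host name, which is what a join on instance needs.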
2) When re-enabling the perf_counters (mgr/prometheus/exclude_perf_counters false) to be returned by the mgr (they were disabled by default in Reef in favor of the ceph-exporter running locally on all nodes), the following metrics are exported with an instance label:
ceph_disk_occupation
ceph_disk_occupation_human
The metadata metrics, in contrast, use hostname to indicate which host they relate to:
ceph_mgr_metadata
ceph_mon_metadata
ceph_osd_metadata
ceph_rgw_metadata
Technically, it seems that "hostname" is actually the instance the metric refers to? Why not also use instance here, since honor_labels is already recommended?
If it's preferred to stick with exporting the hostname in the hostname label, does it make sense to relabel them into instance?
In essence I am suggesting to pick one label to convey the host of a metric, the daemon it belongs to.
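As a rough illustration of the idea, this Python sketch copies hostname into instance whenever instance is missing, so every sample carries the host in the same label. The function name with_instance_label and the label sets are hypothetical examples, not output taken from a real cluster.

```python
def with_instance_label(labels):
    """Return a copy of the labels with the host exposed as instance.

    An existing instance label is left untouched; hostname is only used as a
    fallback. Samples that carry neither label are returned unchanged.
    """
    unified = dict(labels)
    if "instance" not in unified and "hostname" in unified:
        unified["instance"] = unified["hostname"]
    return unified


# Hypothetical label sets for the two groups of metrics described above.
metadata_labels = {"ceph_daemon": "osd.0", "hostname": "ceph-node-01"}
occupation_labels = {"ceph_daemon": "osd.0", "instance": "ceph-node-01"}
no_host_labels = {"ceph_daemon": "osd.0"}

unified_metadata = with_instance_label(metadata_labels)
unified_occupation = with_instance_label(occupation_labels)
unified_no_host = with_instance_label(no_host_labels)
```

After this step both groups can be joined on instance, while a sample that has neither label still cannot be attributed to a host.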
I know that with the switch to the ceph-exporter ...
Ceph-exporter: Now the performance metrics for Ceph daemons are exported by ceph-exporter, which deploys on each daemon rather than using prometheus exporter. This will reduce performance bottlenecks.
the instance is likely, once again, the host the metrics are collected from and relate to.
But as long as the mgr can export the performance metrics, I believe further cleanup makes sense.
Updated by Christian Rohmann 3 months ago
There is also the metric ceph_daemon_health_metrics, which contains neither instance (to be used via honor_labels) nor hostname at all.