Bug #64321
mgr/dashboard: dashboards and alerts from ceph-mixins not fully compatible with showMultiCluster=true (multiple Ceph clusters in the same Prometheus instance)
Description
Description of problem
The ceph-mixins allow for dashboards and alerts to be made compatible with metrics of multiple Ceph clusters being stored in the same Prometheus instance. This can be achieved via the settings
clusterLabel: 'cluster', showMultiCluster: true,
inside of https://github.com/ceph/ceph/blob/main/monitoring/ceph-mixin/config.libsonnet and then recompiling the dashboards and alerts.
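As a rough illustration of what these two settings are meant to achieve, the sketch below builds a PromQL selector that carries a cluster matcher only when multi-cluster mode is on. It is written in Python for readability and is not the mixin's jsonnet code: `build_selector` and its parameters are hypothetical names, and the `$cluster` Grafana template variable is assumed to be named after the configured label.

```python
def build_selector(metric, matchers=(), cluster_label="cluster",
                   show_multi_cluster=True):
    """Build a PromQL selector, adding a cluster matcher in multi-cluster mode.

    Hypothetical helper for illustration only; the real mixin is jsonnet.
    """
    parts = []
    if show_multi_cluster:
        # Assumption: the Grafana template variable shares the label's name.
        parts.append('%s=~"$%s"' % (cluster_label, cluster_label))
    parts.extend(matchers)
    if not parts:
        # No matchers at all: return the bare metric name.
        return metric
    return "%s{%s}" % (metric, ", ".join(parts))


if __name__ == "__main__":
    print(build_selector("ceph_osd_up", ['ceph_daemon="osd.0"']))
    print(build_selector("ceph_osd_up", show_multi_cluster=False))
```

A query that is assembled without going through a step like this is the kind that ends up unfiltered in the generated output.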
Environment
- ceph version string: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
- Platform (OS/distro/release): Ubuntu 20.04 (Jammy)
How reproducible
Steps:
- set showMultiCluster to true
- run make generate
- check dashboards_out and prometheus_alerts.yml for whether the cluster label is used consistently, so that individual clusters can be targeted and metrics for multiple clusters being stored in the same Prometheus instance are tolerated
- some tests (make test) also seem to fail when the showMultiCluster option is enabled. Maybe testing with this option is not properly implemented at all?
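The manual check in the third step could be partly automated. The following is a minimal sketch, assuming the PromQL expressions have already been extracted from the generated files as strings. It uses a regex heuristic rather than a real PromQL parser, only looks at metrics prefixed with `ceph_`, and `selectors_missing_label` is a hypothetical helper name.

```python
import re

# Clauses such as on(...) or by(...) list label names, not metrics; drop them.
_LABEL_LIST = re.compile(
    r"\b(?:on|ignoring|by|without|group_left|group_right)\s*\([^)]*\)"
)
# A ceph_* metric name followed by an optional {...} selector.
_SELECTOR = re.compile(r"\b(ceph_\w+)\s*(\{[^}]*\})?")


def selectors_missing_label(expr, label="cluster"):
    """Return ceph_* metrics in expr whose selector lacks a matcher on label.

    Heuristic only: this is not a PromQL parser and may misjudge edge cases.
    """
    cleaned = _LABEL_LIST.sub(" ", expr)
    has_label = re.compile(r"\b%s\s*(=~|!~|!=|=)" % re.escape(label))
    missing = []
    for match in _SELECTOR.finditer(cleaned):
        metric = match.group(1)
        selector = match.group(2) or ""
        if not has_label.search(selector):
            missing.append(metric)
    return missing


if __name__ == "__main__":
    expr = ('ceph_osd_up{cluster=~"$cluster"} '
            '* on (ceph_daemon) group_left (hostname) ceph_osd_metadata')
    print(selectors_missing_label(expr))
```

Anything the function reports would still need a manual look, since some expressions may legitimately aggregate across clusters.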
Actual results
Some queries don't filter on the cluster label, so metrics of multiple clusters are returned. This results in dashboards showing metrics of multiple clusters in the same graphs or, in the case of joins, in label collisions, because the same label and value (e.g. ceph_daemon="osd.0") is present multiple times (once per cluster). For alerts using joins, these collisions cause the alerts to not be evaluated. The cluster name is also not mentioned consistently in the alert description or summary.
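The join collision can be shown with a toy model of label matching. This is a simplified sketch in Python, not Prometheus itself: series are plain dicts of labels, `join_one_to_one` is a hypothetical helper, and the sample series and hostnames are invented for illustration.

```python
def match_key(labels, on):
    """Project a series' labels onto the labels used for matching."""
    return tuple(labels.get(name, "") for name in on)


def join_one_to_one(left, right, on):
    """Pair series whose `on` labels agree; reject ambiguous match groups."""
    index = {}
    for labels in right:
        key = match_key(labels, on)
        if key in index:
            raise ValueError("duplicate series for match group %r" % (key,))
        index[key] = labels
    return [(labels, index[match_key(labels, on)])
            for labels in left if match_key(labels, on) in index]


# Invented sample data: two clusters that each have an osd.0.
osd_up = [
    {"cluster": "a", "ceph_daemon": "osd.0"},
    {"cluster": "b", "ceph_daemon": "osd.0"},
]
osd_metadata = [
    {"cluster": "a", "ceph_daemon": "osd.0", "hostname": "host-a1"},
    {"cluster": "b", "ceph_daemon": "osd.0", "hostname": "host-b1"},
]

if __name__ == "__main__":
    try:
        join_one_to_one(osd_up, osd_metadata, on=["ceph_daemon"])
    except ValueError as err:
        print("join on ceph_daemon only failed:", err)
    pairs = join_one_to_one(osd_up, osd_metadata, on=["ceph_daemon", "cluster"])
    print(len(pairs), "pairs when cluster is part of the match")
```

Matching on ceph_daemon alone is ambiguous as soon as two clusters share a daemon name, while including the cluster label in the match keeps each pairing unique.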
Expected results
After selecting a cluster in the Grafana template, only metrics for that Ceph cluster are shown.
For alerts, I expect them to work with a single Prometheus instance hosting the metrics for multiple Ceph clusters.
Additional info
There also seem to be some inconsistencies in the "style" of dealing with the instance label (vs. hostname).
I raised another bug about that one in general - https://tracker.ceph.com/issues/64288
Updated by Christian Rohmann 3 months ago
I pushed a PR - https://github.com/ceph/ceph/pull/55495
Updated by Aashish Sharma 2 days ago
- Status changed from New to Pending Backport
- Backport set to squid,reef
- Pull request ID set to 55495