mgr/dashboard: Need a method to check that references to Grafana dashboards are correct, or at least exist
Ceph Dashboard embeds Grafana dashboards by specifying uid in <cd-grafana> components.
If a Grafana dashboard is updated and its uid changes, the uid property of the corresponding <cd-grafana> component must be updated as well.
We need a method to ensure these references are correct, or at least exist.
In the short term, a script can be added to check that every <cd-grafana> component refers to an existing Grafana dashboard.
In the long term, we can add some e2e tests to verify correctness of embedded dashboards.
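A minimal sketch of what the short-term check could look like, not the script that was eventually proposed. It assumes the uid appears as a plain `uid="..."` attribute on the `<cd-grafana>` tag and that each Grafana dashboard is a JSON file with a top-level "uid" key; the function names are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumption: the uid is written as a plain attribute, e.g. <cd-grafana uid="abc">.
CD_GRAFANA_UID = re.compile(r'<cd-grafana\b[^>]*?\buid="([^"]+)"', re.DOTALL)


def component_uids(frontend_dir):
    """Map each uid found in a <cd-grafana> tag to a template that uses it."""
    found = {}
    for template in Path(frontend_dir).rglob("*.html"):
        text = template.read_text(encoding="utf-8")
        for uid in CD_GRAFANA_UID.findall(text):
            found[uid] = str(template)
    return found


def dashboard_uids(grafana_dir):
    """Map each top-level "uid" in a dashboard JSON file to the dashboard title."""
    found = {}
    for path in Path(grafana_dir).rglob("*.json"):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(dashboard, dict) and "uid" in dashboard:
            found[dashboard["uid"]] = dashboard.get("title", path.name)
    return found


def unmapped_components(frontend_dir, grafana_dir):
    """Return {uid: template} for components whose uid matches no dashboard."""
    dashboards = dashboard_uids(grafana_dir)
    return {uid: template
            for uid, template in component_uids(frontend_dir).items()
            if uid not in dashboards}
```

A non-empty result from `unmapped_components()` would be the condition on which such a check fails.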
#4 Updated by Kiefer Chang 6 months ago
- Status changed from New to In Progress
Short-term solution: check that every <cd-grafana> component has a mapped Grafana dashboard.
My proposals are below:
Approach 1
A script checks the mappings (https://github.com/bk201/ceph/blob/5cc01a2b91c7eb1950932cf24a69e7aed4a713f3/src/pybind/mgr/dashboard/tools/check_grafana_references.py).
Run the script to detect any mismatch between the <cd-grafana> components and the Grafana dashboards:
    $ cd /ceph/src/pybind/mgr/dashboard
    $ python tools/check_grafana_references.py frontend/src/app ../../../../monitoring/grafana
    Extract <cd-grafana> components and check UIDs
    Found mappings:
    -xyV8KCiz (frontend/src/app/ceph/pool/pool-details/pool-details.component.html:14) -> Ceph Pool Details (../../../../monitoring/grafana/dashboards/pool-detail.json)
    41FrpeUiz (frontend/src/app/ceph/block/rbd-images/rbd-images.component.html:15) -> RBD Overview (../../../../monitoring/grafana/dashboards/rbd-overview.json)
    tbO9LAiZz (frontend/src/app/ceph/cephfs/cephfs-detail/cephfs-detail.component.html:46) -> MDS Performance (../../../../monitoring/grafana/dashboards/cephfs-overview.json)
    WAkugZpiz (frontend/src/app/ceph/rgw/rgw-daemon-list/rgw-daemon-list.component.html:17) -> RGW Overview (../../../../monitoring/grafana/dashboards/radosgw-overview.json)
    x5ARzZtmk (frontend/src/app/ceph/rgw/rgw-daemon-details/rgw-daemon-details.component.html:17) -> RGW Instance Detail (../../../../monitoring/grafana/dashboards/radosgw-detail.json)
    y0KGL0iZz (frontend/src/app/ceph/cluster/hosts/hosts.component.html:30) -> Host Overview (../../../../monitoring/grafana/dashboards/hosts-overview.json)
    rtOg0AiWz (frontend/src/app/ceph/cluster/hosts/host-details/host-details.component.html:4) -> Host Details (../../../../monitoring/grafana/dashboards/host-details.json)
    CrAHE0iZz (frontend/src/app/ceph/cluster/osd/osd-details/osd-details.component.html:47) -> OSD device details (../../../../monitoring/grafana/dashboards/osd-device-details.json)
    lo02I1Aiz (frontend/src/app/ceph/cluster/osd/osd-list/osd-list.component.html:72) -> OSD Overview (../../../../monitoring/grafana/dashboards/osds-overview.json)
    Components that have no mapped Grafana dashboards:
    z99hzWtmh (frontend/src/app/ceph/pool/pool-list/pool-list.component.html:36)
    Checking Grafana dashboards UIDs: ERROR
Approach 2 (WIP, suggested by Kanika)
- The <cd-grafana> component does not refer to the uid directly.
- Add a new input property `grafana-dashboard-name`. The uid of the Grafana dashboard is resolved from a separate mapping file when the component is initialized.
- The mapping file contains an object that maps every dashboard name to its Grafana uid.
- A script parses the mapping file and checks that a Grafana dashboard can be found for each uid.
- All mappings are consolidated in one mapping file.
- If Grafana dashboards are updated, a developer only needs to update this mapping file. There is no need to jump around between all <cd-grafana> components.
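The checking script for this approach could be sketched as follows. The format of the mapping file is not specified above, so this sketch assumes a JSON object of the form {"dashboard name": "uid"}; the file layout and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_mapping(mapping_file):
    """Load the name -> uid mapping (assumed here to be a JSON object)."""
    mapping = json.loads(Path(mapping_file).read_text(encoding="utf-8"))
    if not isinstance(mapping, dict):
        raise ValueError("mapping file must contain a JSON object")
    return mapping


def known_uids(grafana_dir):
    """Collect the top-level "uid" of every dashboard JSON file."""
    uids = set()
    for path in Path(grafana_dir).rglob("*.json"):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(dashboard, dict) and "uid" in dashboard:
            uids.add(dashboard["uid"])
    return uids


def stale_entries(mapping_file, grafana_dir):
    """Return {name: uid} for mapping entries whose uid matches no dashboard."""
    uids = known_uids(grafana_dir)
    return {name: uid
            for name, uid in load_mapping(mapping_file).items()
            if uid not in uids}
```

With a single mapping file, the check no longer has to parse HTML templates; it only compares one object against the dashboard files.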
#5 Updated by Lenz Grimmer 6 months ago
Thanks for your proposal, much appreciated. If your first approach can be integrated with "make check", this would be a good first step in making sure we're not running into the same issue again. The second option is much cleaner and would be my preference - depending on how long it would take to implement these changes, we should consider taking that approach right away.
#10 Updated by Kiefer Chang 4 months ago
Lenz Grimmer wrote:
> A PR to implement approach 1 has been merged now. Should this one be backported to Nautilus, to capture any regressions?
If we always cherry-pick changes from master back to the stable branches, the chance of a regression is low, because the check has already run on master.
I can help with backporting this.
> Do we still plan to implement approach 2?
I'd suggest yes. Should I create a new issue?