Bug #56514

Updated by Ernesto Puerta over 1 year ago

This was an unknown unknown: we didn't expect this to become such a massive endpoint. The thing is that some alerts are triggered per daemon, so an issue affecting OSDs produces one entry per OSD: 8k entries for an 8k-OSD cluster with 1 alert. If each OSD triggers 2 or 3 alerts, that number multiplies by 2 or 3, to 16k or 24k items.
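
The scaling above is plain multiplication. A minimal sketch, assuming every OSD fires the same number of alerts (the function name `alert_entries` is made up for illustration):

```python
def alert_entries(daemons: int, alerts_per_daemon: int) -> int:
    """Entries the endpoint returns if every daemon fires the same number of alerts."""
    return daemons * alerts_per_daemon


# 8k-OSD cluster with 1, 2 and 3 alerts firing per OSD
for per_osd in (1, 2, 3):
    print(per_osd, alert_entries(8000, per_osd))
```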

Target count: 32k alerts.

Strategy: Prometheus Alerts API (https://prometheus.io/docs/prometheus/latest/querying/api/#alerts).
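
For reference, the Prometheus `/api/v1/alerts` endpoint returns a JSON document with the alerts under `data.alerts`, each carrying `labels`, `annotations`, `state`, `activeAt` and `value`. A rough sketch of collapsing per-daemon entries into per-alert counts could look like the following. The sample payload, the alert name `ExampleOSDAlert` and the `ceph_daemon` label values are invented for illustration, and `summarize_alerts` is a hypothetical helper, not existing dashboard code:

```python
import json
from collections import Counter

# Illustrative payload in the shape returned by GET /api/v1/alerts.
# Alert name and label values are made up for this example.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "alerts": [
      {"labels": {"alertname": "ExampleOSDAlert", "ceph_daemon": "osd.0"},
       "annotations": {}, "state": "firing",
       "activeAt": "2022-07-11T10:00:00Z", "value": "1e+00"},
      {"labels": {"alertname": "ExampleOSDAlert", "ceph_daemon": "osd.1"},
       "annotations": {}, "state": "firing",
       "activeAt": "2022-07-11T10:00:00Z", "value": "1e+00"},
      {"labels": {"alertname": "ExampleOSDAlert", "ceph_daemon": "osd.2"},
       "annotations": {}, "state": "pending",
       "activeAt": "2022-07-11T10:05:00Z", "value": "1e+00"}
    ]
  }
}
"""


def summarize_alerts(payload: dict) -> Counter:
    """Count alerts per (alertname, state) instead of listing one row per daemon."""
    if payload.get("status") != "success":
        raise ValueError("unexpected status: %r" % payload.get("status"))
    counts = Counter()
    for alert in payload["data"]["alerts"]:
        name = alert["labels"].get("alertname", "<unnamed>")
        counts[(name, alert["state"])] += 1
    return counts


summary = summarize_alerts(json.loads(SAMPLE_RESPONSE))
for (name, state), count in sorted(summary.items()):
    print(name, state, count)
```

Grouping like this keeps the number of rows proportional to the number of distinct alerts rather than to the number of OSDs, which may or may not be the direction the fix ends up taking.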

From Pawsey:
* "The generic table doesn't work that well across all the use cases. For example, in the monitoring it would make sense to have checkboxes for active/suppressed. As it stands you have an overlay number on the monitoring menu item indicating the number of issues, but when you look at the alerts you see suppressed as well!"
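
The active/suppressed split from the feedback above could be sketched as follows. This assumes each alert entry carries a `status.state` field with values such as `active` or `suppressed`, as in the Alertmanager v2 alert format; whether the dashboard table receives alerts in exactly that shape is an assumption, and `split_by_state` and the sample entries are hypothetical:

```python
from typing import Dict, List, Tuple

Alert = Dict[str, dict]


def split_by_state(alerts: List[Alert]) -> Tuple[List[Alert], List[Alert]]:
    """Separate active alerts from suppressed ones (assumed `status.state` field)."""
    active = [a for a in alerts if a.get("status", {}).get("state") == "active"]
    suppressed = [a for a in alerts if a.get("status", {}).get("state") == "suppressed"]
    return active, suppressed


# Made-up entries for illustration only.
sample_alerts = [
    {"labels": {"alertname": "ExampleAlertA"}, "status": {"state": "active"}},
    {"labels": {"alertname": "ExampleAlertB"}, "status": {"state": "suppressed"}},
    {"labels": {"alertname": "ExampleAlertC"}, "status": {"state": "active"}},
]

active, suppressed = split_by_state(sample_alerts)
badge_count = len(active)  # the menu overlay would count only these
print(badge_count, len(suppressed))
```

With a split like this, the overlay number and the default table view would agree, and a checkbox could toggle whether the suppressed list is shown.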
