mgr/prometheus: offer ability to disable cache
Offer ability to disable the cache
The Prometheus mgr module gathers the data for the whole Ceph cluster. Contrary to how Prometheus exporters are usually designed, we have only one exporter for Ceph-specific data. This increases the time it takes to gather the data, because a single host has to collect all of it.
At some point, collecting the data took so long on large clusters (around 1000 OSDs) that it exceeded Prometheus' scrape interval of 10 or even 15 seconds. Since a highly available environment is likely to have more than one Prometheus instance scraping the Prometheus mgr module, implementing a cache was necessary.
This cache is enabled by default and, up to now, cannot be disabled. While the cache is highly effective at mitigating issues with collecting the data, it is not strictly required for smaller (or faster) deployments, where it is actually better not to use it.
Since the cache was introduced, there has been no way to disable it. To ease debugging, and to allow the cache to be disabled permanently, we need to implement a switch to turn it off.
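As a rough illustration of the requested behaviour, the following is a minimal sketch of a time-based cache with an off switch. It is not the mgr module's actual code: the class `CachedCollector`, the `cache_enabled` flag, the `max_age` parameter and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional


class CachedCollector:
    """Hypothetical wrapper around an expensive metrics collection function."""

    def __init__(self, collect: Callable[[], str], max_age: float,
                 cache_enabled: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._collect = collect             # expensive call that gathers all data
        self._max_age = max_age             # seconds a cached result stays valid
        self.cache_enabled = cache_enabled  # the proposed switch (assumed name)
        self._clock = clock                 # injectable so the sketch is testable
        self._data: Optional[str] = None
        self._collected_at = 0.0

    def scrape(self) -> str:
        # With the cache disabled, every scrape triggers a fresh collection.
        if not self.cache_enabled:
            return self._collect()
        now = self._clock()
        # Refresh when nothing is cached yet or the cached result is too old.
        if self._data is None or now - self._collected_at >= self._max_age:
            self._data = self._collect()
            self._collected_at = now
        return self._data
```

With the flag set, several Prometheus instances scraping within `max_age` seconds share one collection run; with it cleared, each scrape returns freshly collected data, which is what makes debugging easier.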