Bug #45554

closed

mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds

Added by Patrick Seidensal almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Category: prometheus module
Target version: -
% Done: 0%
Source:
Tags: monitoring
Backport: octopus,nautilus,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If gathering the data for a request takes longer than 5 seconds, the cache is considered stale and is not used. This interval is configurable, however.
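The effect can be illustrated with a small timing model. This is only a sketch, not the module's code: it assumes the cache timestamp is taken when gathering starts, which is one way a slow gather could produce a cache that is already stale on completion. The function name `usable_seconds` and its parameters are hypothetical.

```python
def usable_seconds(gather_seconds: float, ttl: float, stamp_at_start: bool) -> float:
    """Return how long a cached result stays usable after gathering finishes.

    Hypothetical model: the entry expires `ttl` seconds after its timestamp.
    `stamp_at_start` selects whether the timestamp is taken when gathering
    starts (assumed problematic case) or when it finishes.
    """
    started_at = 0.0
    finished_at = started_at + gather_seconds
    stamped_at = started_at if stamp_at_start else finished_at
    expires_at = stamped_at + ttl
    # A negative remainder means the entry expired before it was even stored.
    return max(0.0, expires_at - finished_at)


if __name__ == "__main__":
    for gather in (3.0, 7.0):
        print(gather, usable_seconds(gather, ttl=5.0, stamp_at_start=True))
```

Under this assumption, a gather that takes longer than the 5-second interval yields a result that can never be served from the cache, whereas stamping the entry on completion would keep it usable for the full interval.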

This does not appear to be a problem for small clusters. It becomes one once a request takes longer than 5 seconds, and it may get worse when gathering the data takes longer than 10 seconds, because Prometheus requests metrics every 10 seconds (cephadm default configuration). This can lead to a never-ending stream of requests to the Ceph cluster that keeps it busy. The returned data is not even used, because Prometheus (cephadm default configuration) lets the request time out.
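The overlap described above can be estimated with simple arithmetic. The sketch below is an idealized model, assuming every scrape triggers a full gather (the cache never helps), scrapes arrive exactly on the interval, and gather time is constant. The function name `in_flight_gathers` is hypothetical.

```python
import math


def in_flight_gathers(gather_seconds: float, scrape_interval: float) -> int:
    """Estimate how many gathers run concurrently right after a scrape arrives.

    Idealized steady state: a new gather starts every `scrape_interval`
    seconds and each one runs for `gather_seconds`. The count includes the
    gather that was just started.
    """
    if gather_seconds <= 0 or scrape_interval <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(gather_seconds / scrape_interval)


if __name__ == "__main__":
    # 10 s matches the scrape interval mentioned in the description.
    for gather in (7.0, 20.0, 25.0):
        print(gather, in_flight_gathers(gather, scrape_interval=10.0))
```

In this model, any gather time above the scrape interval means at least two gathers overlap at all times, so the cluster never gets an idle period between requests.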

It would certainly make sense to increase the scrape interval in such scenarios. However, the effectively disabled cache will cause even more issues if the metrics are requested in addition to the recurring requests from Prometheus.


Related issues 3 (0 open, 3 closed)

Copied to mgr - Backport #46171: octopus: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds - Resolved - Laura Paduano
Copied to mgr - Backport #46172: nautilus: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds - Resolved - Laura Paduano
Copied to mgr - Backport #46544: luminous: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds - Rejected