Bug #45554: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds - mgr - Ceph

Bug #45554

closed

mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds

Added by Patrick Seidensal almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee: Patrick Seidensal
Category: prometheus module
Target version:
% Done:
Source:
Tags: monitoring
Backport: octopus,nautilus,luminous
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 35572
Crash signature (v1):
Crash signature (v2):

Description

The cache is considered stale, and is therefore not used, if gathering the data takes longer than 5 seconds. This interval is configurable, though.

This is seemingly not a problem for small clusters, but it becomes one once a request takes longer than 5 seconds, and it may get worse once gathering the data takes longer than 10 seconds, because Prometheus requests metrics every 10 seconds (cephadm default configuration). That may lead to a never-ending stream of requests to the Ceph cluster that keeps it busy. The data returned will not even be used; Prometheus will (cephadm default configuration) let the request time out.

It would surely make sense to increase the scrape interval in such scenarios. Still, the effectively disabled cache will cause even more issues if the metrics are requested in addition to the recurring requests from Prometheus.
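
To illustrate the failure mode described above, here is a minimal sketch of one way a cache can become ineffective when gathering takes longer than its maximum age. It is not the mgr/prometheus code: the names `MetricsCache`, `FakeClock`, `simulate` and `stamp_after_collect` are hypothetical, and the assumption that the timestamp is recorded when gathering starts is made purely for illustration. A manually advanced clock stands in for real time so the example runs instantly.

```python
class FakeClock:
    """Manually advanced clock, so the example needs no real sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class MetricsCache:
    """Toy cache (hypothetical, not the actual module implementation)."""

    def __init__(self, collect, max_age, clock, stamp_after_collect):
        self.collect = collect
        self.max_age = max_age
        self.clock = clock
        self.stamp_after_collect = stamp_after_collect
        self.data = None
        self.stamp = None
        self.collections = 0  # how often the expensive gathering ran

    def get(self):
        started = self.clock()
        fresh = self.data is not None and started - self.stamp < self.max_age
        if fresh:
            return self.data
        self.data = self.collect()
        self.collections += 1
        # Assumed difference between the two variants: the cache age is
        # measured either from the start or from the end of the gathering.
        self.stamp = self.clock() if self.stamp_after_collect else started
        return self.data


def simulate(stamp_after_collect, gather_seconds=6.0, max_age=5.0, requests=3):
    """Issue a few requests, one second apart, and count the gatherings."""
    clock = FakeClock()

    def slow_collect():
        clock.advance(gather_seconds)  # gathering itself consumes time
        return "ceph_metrics"

    cache = MetricsCache(slow_collect, max_age, clock, stamp_after_collect)
    for _ in range(requests):
        cache.get()
        clock.advance(1.0)
    return cache.collections


if __name__ == "__main__":
    print(simulate(stamp_after_collect=False))  # 3
    print(simulate(stamp_after_collect=True))   # 1
```

With a 6 second gathering and a 5 second maximum age, the variant that stamps at the start gathers on every one of the three requests, because the cached data is already "too old" the moment it is stored. The variant that stamps after gathering serves the second and third requests from the cache.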

Related issues 3 (0 open — 3 closed)

Updated by Patrick Seidensal almost 4 years ago

Tags set to monitoring

Updated by Patrick Seidensal almost 4 years ago

Status changed from New to In Progress
Assignee set to Patrick Seidensal

Updated by Patrick Seidensal almost 4 years ago

Status changed from In Progress to Fix Under Review
Pull request ID set to 35572

Updated by Kefu Chai almost 4 years ago

Status changed from Fix Under Review to Pending Backport

Updated by Nathan Cutler almost 4 years ago

Copied to Backport #46171: octopus: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds added

Updated by Nathan Cutler almost 4 years ago

Copied to Backport #46172: nautilus: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds added

Updated by Patrick Seidensal almost 4 years ago

Backport changed from octopus,nautilus to octopus,nautilus,luminous

Updated by Laura Paduano almost 4 years ago

Copied to Backport #46544: luminous: mgr/prometheus: cache ineffective when gathering data takes longer than 5 seconds added

Updated by Patrick Seidensal almost 4 years ago

Backport changed from octopus,nautilus,luminous to octopus,nautilus

#10

Updated by Nathan Cutler almost 4 years ago

Backport changed from octopus,nautilus to octopus,nautilus,luminous

#11

Updated by Nathan Cutler almost 4 years ago

@Patrick, when a backport issue enters Rejected state, we should not remove it from the Backports list; otherwise the "backport-create-issue" script complains:

ERROR:root:https://tracker.ceph.com/issues/45554 has more backport issues (luminous,nautilus,octopus) than expected (nautilus,octopus)

#12

Updated by Nathan Cutler almost 4 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
