Bug #53190
counter num_read_kb is going down
Description
Description of problem
Monitoring reported an unreasonably high read-rate value (28.76 TB/s).
This happens because Ceph reported a value for `num_read_kb` that had decreased. Prometheus interprets a decrease in a counter as a counter reset, i.e. it assumes the counter restarted from zero, so the entire new value is counted as an increase on top of the previously collected total, producing the unreasonably high rate reported above.
We've been able to verify that this is not an issue in the mgr/prometheus module but a value that comes from Ceph itself; however, we do not know how to reproduce it.
pg-dump.2021-09-14T18:26:50+01:00 716138503663
pg-dump.2021-09-14T18:27:03+01:00 716138539210
pg-dump.2021-09-14T18:27:16+01:00 716138564623
pg-dump.2021-09-14T18:27:28+01:00 716137750423  <- 1631640448 (epoch)
pg-dump.2021-09-14T18:27:41+01:00 716137808867
pg-dump.2021-09-14T18:27:53+01:00 716137862127
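For illustration, the following minimal Python sketch (an assumed model of Prometheus's counter-reset handling, not its actual source) applies that logic to the first four samples above. The dip at 18:27:28 is counted as if the counter had restarted from zero, yielding a rate in the tens of TiB/s:

```python
# Assumed model of Prometheus counter-reset handling, applied to the
# pg-dump samples above. num_read_kb is taken to be in KiB.
samples = [
    (0.0,  716138503663),  # 18:26:50
    (13.0, 716138539210),  # 18:27:03
    (26.0, 716138564623),  # 18:27:16
    (38.0, 716137750423),  # 18:27:28 <- value decreased
]

def increase(series):
    """Sum of deltas, treating any decrease as a counter reset: the new
    value is then counted as growth from zero."""
    total = 0
    prev = series[0][1]
    for _, value in series[1:]:
        total += value - prev if value >= prev else value
        prev = value
    return total

window_s = samples[-1][0] - samples[0][0]
rate = increase(samples) / window_s    # KiB/s
print(f"~{rate / 1024**3:.1f} TiB/s")  # ~17.6 TiB/s
```

The exact figure depends on Prometheus's query window and extrapolation, but any reset-style handling that counts the full ~700 TiB counter value as fresh growth over a few tens of seconds lands in the same order of magnitude as the reported 28.76 TB/s.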
Environment
ceph version string: Octopus
How reproducible
No reproducer available at this point.
Actual results
The num_read_kb counter decreased between successive pg dumps.
Expected results
The counter only ever increases (monotonically non-decreasing).
Additional info
This is an issue we have observed repeatedly. Unfortunately, we do not know how to reproduce it and currently do not have access to the cluster that produced these values.
Updated by Josh Durgin over 2 years ago
This seems possible for many such counters in a distributed system like Ceph, where these values are not tracked monotonically. Is there a way to report them to Prometheus that accepts decreasing values?
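For context on that question: in Prometheus's data model, the metric type that accepts decreasing values is a gauge; only counters carry the monotonicity assumption that rate() and increase() rely on. The sketch below is hypothetical, using the prometheus_client library rather than the actual mgr/prometheus code, with an illustrative metric name and port:

```python
# Hypothetical sketch using the prometheus_client library (not the actual
# mgr/prometheus module). A Gauge may legally move up or down.
import time
from prometheus_client import Gauge, start_http_server

# Illustrative metric name; exported as a gauge so decreases are legal
# instead of being read as counter resets.
num_read_kb = Gauge('ceph_pg_num_read_kb',
                    'KiB read by PGs, exported as a gauge')

def export(values, interval_s=13):
    """Publish whatever Ceph reports, even when a value goes down."""
    for v in values:
        num_read_kb.set(v)   # set() accepts any value, up or down
        time.sleep(interval_s)

if __name__ == '__main__':
    start_http_server(8000)  # port chosen arbitrarily for this sketch
    export([716138503663, 716138539210, 716138564623, 716137750423])
```

The trade-off is that PromQL's rate() and increase() are documented for counters only; on a gauge one would use deriv() or delta(), which behave differently, so per-second throughput dashboards would need to change. The alternative is to keep the counter monotonic at the source.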