Bug #64119

open

During OSD recovery, performance stats reported by mgr/prometheus are bogus

Added by Paul Cuzner 4 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: prometheus module
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

During an OSD recovery period, the pool stats can show IOPS and throughput numbers that do not reflect the state of the system.

For example, I've seen IOPS at 175,592,917 and throughput at 9.39 TiB/s!

mgr/prometheus is calling the df mgr interface for these stats, which appears to use a different internal function (pg_map.dump_pool_stats_full) from the mgr osd_pool_stats call, which uses pg_map.dump_pool_stats_and_io_rate.
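
To check whether the two code paths really disagree, the numbers from both interfaces can be compared side by side. The following is a minimal sketch, not the module's actual code. It assumes that the df dump exposes cumulative per-pool counters under pools[].stats.rd and pools[].stats.wr, and that the osd_pool_stats dump exposes per-second rates under pool_stats[].client_io_rate. These field names, the helper names and the 10x threshold are assumptions made for illustration.

```python
# Sketch only: compares ops/s derived from two successive `df`-style dumps
# with the client I/O rate found in an `osd_pool_stats`-style dump.
# All field names below are assumptions about the dump layout.

def derived_iops(df_prev, df_curr, interval_s):
    """Per-pool ops/s computed from two cumulative counter samples."""
    prev = {p["name"]: p["stats"] for p in df_prev["pools"]}
    rates = {}
    for pool in df_curr["pools"]:
        before = prev.get(pool["name"])
        if before is None:
            continue  # pool did not exist in the earlier sample
        stats = pool["stats"]
        ops_delta = (stats["rd"] - before["rd"]) + (stats["wr"] - before["wr"])
        rates[pool["name"]] = ops_delta / interval_s
    return rates


def reported_iops(osd_pool_stats):
    """Per-pool ops/s as reported directly; missing fields count as 0."""
    rates = {}
    for pool in osd_pool_stats["pool_stats"]:
        io = pool.get("client_io_rate", {})
        rates[pool["pool_name"]] = (
            io.get("read_op_per_sec", 0) + io.get("write_op_per_sec", 0)
        )
    return rates


def suspicious_pools(derived, reported, factor=10.0):
    """Pools whose derived rate exceeds the reported rate by more than `factor`.

    The reported rate is floored at 1 op/s so that idle pools do not
    make every small derived rate look suspicious.
    """
    return sorted(
        name
        for name, rate in derived.items()
        if rate > factor * max(reported.get(name, 0), 1)
    )
```

Feeding it two df samples taken one scrape interval apart, plus one osd_pool_stats sample from the same window, gives a list of pools where the two interfaces disagree by more than the chosen factor during recovery.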

Whilst this is not a major issue, it does pollute the monitoring data, making the return to a normal I/O rate difficult to see in the dashboard and in Grafana monitoring.


Files

prometheus_pool_default.rgw_log_iops_bogus.jpg (152 KB): Prometheus graph, bogus IOPS for default.rgw.log pool. Prashant D, 01/30/2024 10:30 PM
graphana_prometheus_pool_default.rgw_log_iops_bogus.jpg (158 KB): Grafana dashboard, bogus IOPS for default.rgw.log pool. Prashant D, 01/30/2024 10:34 PM