Bug #23167
mgr: prometheus: ceph_pg metrics reported by prometheus plugin inconsistent with "ceph -s" output
Description
This was observed in a cluster running 12.2.4.
When a host went down, "ceph -s" reported the PG counts shown below; around 28939 PGs were reported as "active+clean". The corresponding numbers from the Prometheus plugin were:
ceph_pg_active 326.0
ceph_pg_clean 30379.0
Note that there was a slight delay between the two data points, so some numbers changed in between. However, Prometheus reports only 326 PGs as active, while the number of clean PGs is close to what "ceph -s" displays. The active count appears to be wrong. When the cluster returned to its normal state, the active and clean counts matched again.
data:
pools: 2 pools, 33280 pgs
objects: 4907 objects, 1265 GB
usage: 3096 GB used, 3926 TB / 3929 TB avail
pgs: 1.388% pgs not active
574/44163 objects degraded (1.300%)
28939 active+clean
3344 active+undersized
512 active+undersized+degraded
376 peering
85 activating
23 active+recovering+degraded
1 activating+degraded
ceph_pg_incomplete 0.0
ceph_pg_degraded 326.0
ceph_pg_forced_backfill 0.0
ceph_pg_stale 0.0
ceph_pg_undersized 326.0
ceph_pg_peering 168.0
ceph_pg_inconsistent 0.0
ceph_pg_forced_recovery 0.0
ceph_pg_creating 0.0
ceph_pg_wait_backfill 0.0
ceph_pg_active 326.0
ceph_pg_deep 0.0
ceph_pg_scrubbing 0.0
ceph_pg_recovering 22.0
ceph_pg_repair 0.0
ceph_pg_down 0.0
ceph_pg_peered 0.0
ceph_pg_backfill 0.0
ceph_pg_clean 30379.0
ceph_pg_remapped 0.0
ceph_pg_backfill_toofull 0.0
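A quick way to see why 326 active PGs is implausible: a PG in a compound state such as "active+clean" should count toward both the "active" and the "clean" totals. The following minimal Python sketch (state counts taken from the "ceph -s" output above) derives the expected per-state totals:

```python
from collections import defaultdict

# PG state breakdown as reported by "ceph -s" above.
pg_states = {
    "active+clean": 28939,
    "active+undersized": 3344,
    "active+undersized+degraded": 512,
    "peering": 376,
    "activating": 85,
    "active+recovering+degraded": 23,
    "activating+degraded": 1,
}

# Split each compound state on '+' and add the PG count to every
# component state's total.
counts = defaultdict(int)
for state, n in pg_states.items():
    for sub in state.split("+"):
        counts[sub] += n

print(counts["active"])  # 32818 -- far from the 326 the plugin reported
print(counts["clean"])   # 28939
```

So even allowing for the delay between the two samples, ceph_pg_active should have been in the low 30000s, not 326.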
Normal state
- - - - - -
data:
pools: 2 pools, 33280 pgs
objects: 5831 objects, 1503 GB
usage: 3451 GB used, 3926 TB / 3929 TB avail
pgs: 33280 active+clean
ceph_pg_incomplete 0.0
ceph_pg_degraded 0.0
ceph_pg_forced_backfill 0.0
ceph_pg_stale 0.0
ceph_pg_undersized 0.0
ceph_pg_peering 0.0
ceph_pg_inconsistent 0.0
ceph_pg_forced_recovery 0.0
ceph_pg_creating 0.0
ceph_pg_wait_backfill 0.0
ceph_pg_active 33280.0
ceph_pg_deep 0.0
ceph_pg_scrubbing 0.0
ceph_pg_recovering 0.0
ceph_pg_repair 0.0
ceph_pg_down 0.0
ceph_pg_peered 0.0
ceph_pg_backfill 0.0
ceph_pg_clean 33280.0
ceph_pg_remapped 0.0
ceph_pg_backfill_toofull 0.0
History
#1 Updated by John Spray about 6 years ago
- Assignee set to Boris Ranto
This is probably the same thing that was fixed in master in this commit:
commit 6cefd4832f59b6196f27769a1ec4934329547da9
Author: Boris Ranto <branto@redhat.com>
Date:   Fri Feb 16 18:45:58 2018 +0100

    mgr/prometheus: Fix pg_* counts

    Currently, the pg_* counts are not computed properly. We split the
    current state by the '+' sign but do not add the pg count to the
    already found pg count. Instead, we overwrite any existing pg count
    with the new count. This patch fixes it by adding all the pg counts
    together for all the states. It also introduces a new pg_total metric
    that shows the total count of PGs.

    Signed-off-by: Boris Ranto <branto@redhat.com>
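The effect of that fix can be illustrated with a minimal sketch (not the actual mgr/prometheus code; the state counts are taken from the report above). Overwriting would also explain why ceph_pg_active, ceph_pg_undersized and ceph_pg_degraded all report the same 326 above: each would have been overwritten by the count of the last compound state containing it.

```python
# A few compound PG states with their counts, as in the report above.
pg_summary = [
    ("active+clean", 28939),
    ("active+undersized", 3344),
    ("active+undersized+degraded", 512),
]

def count_buggy(summary):
    # Pre-fix behaviour: the count for a sub-state is overwritten each
    # time that sub-state appears in another compound state, so the
    # last compound state seen wins.
    counts = {}
    for state, n in summary:
        for sub in state.split("+"):
            counts[sub] = n
    return counts

def count_fixed(summary):
    # Post-fix behaviour: counts are accumulated across all compound
    # states, and a pg total is tracked as well (cf. the new pg_total
    # metric the commit introduces).
    counts = {"total": 0}
    for state, n in summary:
        counts["total"] += n
        for sub in state.split("+"):
            counts[sub] = counts.get(sub, 0) + n
    return counts

print(count_buggy(pg_summary)["active"])  # 512 -- last state wins
print(count_fixed(pg_summary)["active"])  # 32795 -- sum of all three
```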
Boris, please could you look at this and backport if necessary?
#2 Updated by Boris Ranto about 6 years ago
Yes, this tracker covers what I have been seeing with the pg metrics. I have included the commit in this prometheus exporter backport PR:
#3 Updated by John Spray about 6 years ago
- Status changed from New to Fix Under Review
#4 Updated by Yuri Weinstein almost 6 years ago
#5 Updated by Nathan Cutler almost 6 years ago
- Status changed from Fix Under Review to Resolved