Bug #51131
prometheus stats missing since upgrade to octopus 15.2.13
Description
I recently upgraded one of my clusters from nautilus 14.2.21 on Ubuntu to octopus 15.2.13. Since then
I no longer get Prometheus metrics for some of the ceph_pg_* counters. A curl http://mgr:9283/metrics shows this for the missing data (some lines above and below kept for context):
ceph_pg_clean{pool_id="9"} 64.0
# HELP ceph_pg_down PG down per pool
# TYPE ceph_pg_down gauge
# HELP ceph_pg_recovery_unfound PG recovery_unfound per pool
# TYPE ceph_pg_recovery_unfound gauge
# HELP ceph_pg_backfill_unfound PG backfill_unfound per pool
# TYPE ceph_pg_backfill_unfound gauge
# HELP ceph_pg_scrubbing PG scrubbing per pool
# TYPE ceph_pg_scrubbing gauge
# HELP ceph_pg_degraded PG degraded per pool
# TYPE ceph_pg_degraded gauge
# HELP ceph_pg_inconsistent PG inconsistent per pool
# TYPE ceph_pg_inconsistent gauge
# HELP ceph_pg_peering PG peering per pool
# TYPE ceph_pg_peering gauge
# HELP ceph_pg_repair PG repair per pool
# TYPE ceph_pg_repair gauge
# HELP ceph_pg_recovering PG recovering per pool
# TYPE ceph_pg_recovering gauge
# HELP ceph_pg_forced_recovery PG forced_recovery per pool
# TYPE ceph_pg_forced_recovery gauge
# HELP ceph_pg_backfill_wait PG backfill_wait per pool
# TYPE ceph_pg_backfill_wait gauge
# HELP ceph_pg_incomplete PG incomplete per pool
# TYPE ceph_pg_incomplete gauge
# HELP ceph_pg_stale PG stale per pool
# TYPE ceph_pg_stale gauge
# HELP ceph_pg_remapped PG remapped per pool
# TYPE ceph_pg_remapped gauge
# HELP ceph_pg_deep PG deep per pool
# TYPE ceph_pg_deep gauge
# HELP ceph_pg_backfilling PG backfilling per pool
# TYPE ceph_pg_backfilling gauge
# HELP ceph_pg_forced_backfill PG forced_backfill per pool
# TYPE ceph_pg_forced_backfill gauge
# HELP ceph_pg_backfill_toofull PG backfill_toofull per pool
# TYPE ceph_pg_backfill_toofull gauge
# HELP ceph_pg_recovery_wait PG recovery_wait per pool
# TYPE ceph_pg_recovery_wait gauge
# HELP ceph_pg_recovery_toofull PG recovery_toofull per pool
# TYPE ceph_pg_recovery_toofull gauge
# HELP ceph_pg_undersized PG undersized per pool
# TYPE ceph_pg_undersized gauge
# HELP ceph_pg_activating PG activating per pool
# TYPE ceph_pg_activating gauge
# HELP ceph_pg_peered PG peered per pool
# TYPE ceph_pg_peered gauge
# HELP ceph_pg_snaptrim PG snaptrim per pool
# TYPE ceph_pg_snaptrim gauge
# HELP ceph_pg_snaptrim_wait PG snaptrim_wait per pool
# TYPE ceph_pg_snaptrim_wait gauge
# HELP ceph_pg_snaptrim_error PG snaptrim_error per pool
# TYPE ceph_pg_snaptrim_error gauge
# HELP ceph_pg_creating PG creating per pool
# TYPE ceph_pg_creating gauge
# HELP ceph_pg_unknown PG unknown per pool
# TYPE ceph_pg_unknown gauge
# HELP ceph_pg_premerge PG premerge per pool
# TYPE ceph_pg_premerge gauge
# HELP ceph_pg_failed_repair PG failed_repair per pool
# TYPE ceph_pg_failed_repair gauge
# HELP ceph_pg_laggy PG laggy per pool
# TYPE ceph_pg_laggy gauge
# HELP ceph_pg_wait PG wait per pool
# TYPE ceph_pg_wait gauge
# HELP ceph_cluster_total_bytes DF total_bytes
# TYPE ceph_cluster_total_bytes gauge
ceph_cluster_total_bytes 232986825752576.0
# HELP ceph_cluster_total_used_bytes DF total_used_bytes
# TYPE ceph_cluster_total_used_bytes gauge
ceph_cluster_total_used_bytes 9457124081664.0
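The symptom in the scrape above is that the ceph_pg_* families are still declared via HELP/TYPE lines but carry no sample lines at all. As a minimal sketch (not part of Ceph; the helper name is made up), the missing families can be listed mechanically from a scrape:

```python
# Hypothetical helper: scan a Prometheus scrape and report metric families
# that are declared via "# HELP"/"# TYPE" but have no sample lines -- the
# symptom described in this report for the ceph_pg_* gauges.
import re

def families_without_samples(scrape_text):
    declared, sampled = set(), set()
    for line in scrape_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP") or line.startswith("# TYPE"):
            declared.add(line.split()[2])  # third token is the family name
        elif not line.startswith("#"):
            # Sample line: name{labels} value -> cut at '{' or whitespace.
            sampled.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return sorted(declared - sampled)

if __name__ == "__main__":
    # In practice the text would come from urllib.request.urlopen("http://mgr:9283/metrics").
    scrape = """\
# HELP ceph_pg_clean PG clean per pool
# TYPE ceph_pg_clean gauge
ceph_pg_clean{pool_id="9"} 64.0
# HELP ceph_pg_down PG down per pool
# TYPE ceph_pg_down gauge
"""
    print(families_without_samples(scrape))  # -> ['ceph_pg_down']
```

Run against the full scrape, this prints every ceph_pg_* family that the exporter declares but never populates.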
Updated by Loïc Dachary almost 3 years ago
- Target version deleted (v15.2.13)
- Affected Versions v15.2.13 added
Updated by Neha Ojha almost 3 years ago
- Project changed from Ceph to mgr
- Category set to prometheus module
Updated by Peter Razumovsky over 2 years ago
Any progress? Same for me on ceph v15.2.13.
Updated by Neha Ojha over 2 years ago
- Assignee set to Paul Cuzner
Hey Paul, I am assigning this to you, in case you have any ideas on what's going on here.
Updated by Paul Cuzner about 2 years ago
I saw this in pacific too. I think zero values are no longer emitted, e.g. if there isn't any peering going on, pg_peering will not be seen, but as soon as there is, it's present. It doesn't present a problem for alerts AFAIK.
Updated by Peter Razumovsky about 2 years ago
We are still facing this issue on v15.2.13 with our pre-defined alerts:
----------------------------- Captured stdout call -----------------------------
[INFO]: Checking metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0"
[WARNING]: Metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" not found
[INFO]: Checking that CephPGInconsistent alert is firing
(the same output is repeated for two more test calls)
Updated by Peter Razumovsky almost 2 years ago
We are still facing this issue. Any updates?