Bug #51131

open

prometheus stats missing since upgrade to octopus 15.2.13

Added by Marcel Kuiper almost 3 years ago. Updated almost 2 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
prometheus module
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I recently upgraded one of my clusters from Nautilus 14.2.21 on Ubuntu to Octopus 15.2.13. Since then, Prometheus metrics for some ceph_pg_* counters are missing. Running curl http://mgr:9283/metrics shows the following for the missing data (a few lines above and below are kept for context):

ceph_pg_clean{pool_id="5"} 64.0
ceph_pg_clean{pool_id="9"} 64.0
# HELP ceph_pg_down PG down per pool
# TYPE ceph_pg_down gauge
# HELP ceph_pg_recovery_unfound PG recovery_unfound per pool
# TYPE ceph_pg_recovery_unfound gauge
# HELP ceph_pg_backfill_unfound PG backfill_unfound per pool
# TYPE ceph_pg_backfill_unfound gauge
# HELP ceph_pg_scrubbing PG scrubbing per pool
# TYPE ceph_pg_scrubbing gauge
# HELP ceph_pg_degraded PG degraded per pool
# TYPE ceph_pg_degraded gauge
# HELP ceph_pg_inconsistent PG inconsistent per pool
# TYPE ceph_pg_inconsistent gauge
# HELP ceph_pg_peering PG peering per pool
# TYPE ceph_pg_peering gauge
# HELP ceph_pg_repair PG repair per pool
# TYPE ceph_pg_repair gauge
# HELP ceph_pg_recovering PG recovering per pool
# TYPE ceph_pg_recovering gauge
# HELP ceph_pg_forced_recovery PG forced_recovery per pool
# TYPE ceph_pg_forced_recovery gauge
# HELP ceph_pg_backfill_wait PG backfill_wait per pool
# TYPE ceph_pg_backfill_wait gauge
# HELP ceph_pg_incomplete PG incomplete per pool
# TYPE ceph_pg_incomplete gauge
# HELP ceph_pg_stale PG stale per pool
# TYPE ceph_pg_stale gauge
# HELP ceph_pg_remapped PG remapped per pool
# TYPE ceph_pg_remapped gauge
# HELP ceph_pg_deep PG deep per pool
# TYPE ceph_pg_deep gauge
# HELP ceph_pg_backfilling PG backfilling per pool
# TYPE ceph_pg_backfilling gauge
# HELP ceph_pg_forced_backfill PG forced_backfill per pool
# TYPE ceph_pg_forced_backfill gauge
# HELP ceph_pg_backfill_toofull PG backfill_toofull per pool
# TYPE ceph_pg_backfill_toofull gauge
# HELP ceph_pg_recovery_wait PG recovery_wait per pool
# TYPE ceph_pg_recovery_wait gauge
# HELP ceph_pg_recovery_toofull PG recovery_toofull per pool
# TYPE ceph_pg_recovery_toofull gauge
# HELP ceph_pg_undersized PG undersized per pool
# TYPE ceph_pg_undersized gauge
# HELP ceph_pg_activating PG activating per pool
# TYPE ceph_pg_activating gauge
# HELP ceph_pg_peered PG peered per pool
# TYPE ceph_pg_peered gauge
# HELP ceph_pg_snaptrim PG snaptrim per pool
# TYPE ceph_pg_snaptrim gauge
# HELP ceph_pg_snaptrim_wait PG snaptrim_wait per pool
# TYPE ceph_pg_snaptrim_wait gauge
# HELP ceph_pg_snaptrim_error PG snaptrim_error per pool
# TYPE ceph_pg_snaptrim_error gauge
# HELP ceph_pg_creating PG creating per pool
# TYPE ceph_pg_creating gauge
# HELP ceph_pg_unknown PG unknown per pool
# TYPE ceph_pg_unknown gauge
# HELP ceph_pg_premerge PG premerge per pool
# TYPE ceph_pg_premerge gauge
# HELP ceph_pg_failed_repair PG failed_repair per pool
# TYPE ceph_pg_failed_repair gauge
# HELP ceph_pg_laggy PG laggy per pool
# TYPE ceph_pg_laggy gauge
# HELP ceph_pg_wait PG wait per pool
# TYPE ceph_pg_wait gauge
# HELP ceph_cluster_total_bytes DF total_bytes
# TYPE ceph_cluster_total_bytes gauge
ceph_cluster_total_bytes 232986825752576.0
# HELP ceph_cluster_total_used_bytes DF total_used_bytes
# TYPE ceph_cluster_total_used_bytes gauge
ceph_cluster_total_used_bytes 9457124081664.0
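
To list the affected metrics without reading the output by eye, a short script can report every metric that has a TYPE line but no sample line. This is only a sketch: the function name families_without_samples is made up for illustration, and it assumes plain gauges and counters (histogram or summary samples carry suffixes such as _bucket and would be misreported).

```python
import re

# Matches the metric name at the start of a sample line.
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def families_without_samples(exposition_text):
    """Return metric names that have a '# TYPE' line but no sample line."""
    declared = []    # names in the order their '# TYPE' lines appear
    sampled = set()  # names that have at least one sample line
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # Expected shape: '# TYPE <name> <type>'
            if len(parts) >= 4 and parts[1] == "TYPE":
                declared.append(parts[2])
            continue
        match = _NAME_RE.match(line)
        if match:
            sampled.add(match.group(0))
    return [name for name in declared if name not in sampled]


if __name__ == "__main__":
    # Shortened excerpt of the output quoted in this report.
    excerpt = "\n".join([
        "# HELP ceph_pg_clean PG clean per pool",
        "# TYPE ceph_pg_clean gauge",
        'ceph_pg_clean{pool_id="5"} 64.0',
        "# HELP ceph_pg_down PG down per pool",
        "# TYPE ceph_pg_down gauge",
        "# HELP ceph_cluster_total_bytes DF total_bytes",
        "# TYPE ceph_cluster_total_bytes gauge",
        "ceph_cluster_total_bytes 232986825752576.0",
    ])
    print(families_without_samples(excerpt))  # prints ['ceph_pg_down']
```

The text passed in would be the body returned by the curl command above, saved to a file or piped in.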
Actions #1

Updated by Loïc Dachary almost 3 years ago

  • Target version deleted (v15.2.13)
  • Affected Versions v15.2.13 added
Actions #2

Updated by Neha Ojha almost 3 years ago

  • Project changed from Ceph to mgr
  • Category set to prometheus module
Actions #3

Updated by Peter Razumovsky over 2 years ago

Any progress? Same for me on ceph v15.2.13.

Actions #4

Updated by Neha Ojha over 2 years ago

  • Assignee set to Paul Cuzner

Hey Paul, I am assigning this to you in case you have any ideas about what's going on here.

Actions #5

Updated by Paul Cuzner about 2 years ago

I saw this in Pacific too. I think values that are zero are no longer emitted, e.g. if there isn't any peering going on, pg_peering will not be seen, but as soon as there is, it's present. It doesn't present a problem for alerts, AFAIK.
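
If that explanation is right (it has not been confirmed in this thread), a consumer of the raw /metrics output could treat an absent per-pool series as zero. A minimal sketch of that idea follows; the names pg_state_by_pool and pool_ids are hypothetical, and the pattern assumes samples carry only a pool_id label, as in the output quoted in the description.

```python
import re

# Matches lines such as: ceph_pg_clean{pool_id="5"} 64.0
_SAMPLE_RE = re.compile(r'^(\w+)\{pool_id="([^"]+)"\}\s+(\S+)$')


def pg_state_by_pool(exposition_text, metric, pool_ids):
    """Return {pool_id: value} for metric, using 0.0 for pools with no sample."""
    # Start from zero for every known pool, then overwrite with emitted values.
    values = {pool_id: 0.0 for pool_id in pool_ids}
    for raw_line in exposition_text.splitlines():
        match = _SAMPLE_RE.match(raw_line.strip())
        if match and match.group(1) == metric and match.group(2) in values:
            values[match.group(2)] = float(match.group(3))
    return values


if __name__ == "__main__":
    text = 'ceph_pg_clean{pool_id="5"} 64.0\nceph_pg_clean{pool_id="9"} 64.0\n'
    print(pg_state_by_pool(text, "ceph_pg_clean", ["5", "9"]))
    print(pg_state_by_pool(text, "ceph_pg_peering", ["5", "9"]))
```

With the two ceph_pg_clean samples from the description, ceph_pg_clean comes back as 64.0 for pools 5 and 9, while ceph_pg_peering, which has no samples, comes back as 0.0 for both.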

Actions #6

Updated by Peter Razumovsky about 2 years ago

We are still facing this issue on v15.2.13 with our predefined alerts:

----------------------------- Captured stdout call -----------------------------
[INFO]:
Checking metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" 
[WARNING]: Metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" not found
[INFO]: Checking that CephPGInconsistent alert is firing
----------------------------- Captured stdout call -----------------------------
[INFO]:
Checking metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" 
[WARNING]: Metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" not found
[INFO]: Checking that CephPGInconsistent alert is firing
----------------------------- Captured stdout call -----------------------------
[INFO]:
Checking metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" 
[WARNING]: Metric/expression "sum by(rook_cluster, name) (ceph_pg_inconsistent * on(pool_id) group_right() ceph_pool_metadata) <= 0" not found
[INFO]: Checking that CephPGInconsistent alert is firing
Actions #7

Updated by Peter Razumovsky almost 2 years ago

We are still facing this issue. Any updates?
