Bug #55568

open

mgr/dashboard: CephPGImbalance alert inaccuracies

Added by Tatjana Dehler about 2 years ago. Updated 11 months ago.

Status:
Pending Backport
Priority:
Low
Category:
Monitoring
Target version:
-
% Done:

0%

Source:
Tags:
monitoring, alerts backport_processed
Backport:
quincy, pacific, reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem

The CephPGImbalance alert might be inaccurate in some specific environments. It is raised if the number of PGs on an OSD is 30% less than the average number of PGs across all OSDs. The number of PGs on an OSD normally depends on factors such as the number of OSDs and the size or class of the device. In a well-functioning cluster, the Balancer takes care of balancing the PGs.

In some cases, e.g. if the cluster consists of many larger HDDs and a few smaller SSDs, the alert might be raised even though the configuration and balance of the PGs are handled correctly.
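The following is a minimal sketch of the condition as described above, not the actual Prometheus rule shipped with Ceph, which may be expressed differently. The function name `underfilled_osds`, the OSD names, and the PG counts are hypothetical and chosen only to illustrate the mixed HDD/SSD case.

```python
from statistics import mean

THRESHOLD = 0.30  # 30%, taken from the description above


def underfilled_osds(pgs_per_osd, threshold=THRESHOLD):
    """Return the OSDs whose PG count is more than `threshold` below the mean.

    Hypothetical helper: it mirrors the wording of this report, not the
    real alert expression.
    """
    avg = mean(pgs_per_osd.values())
    if avg == 0:
        return []
    return sorted(
        osd for osd, pgs in pgs_per_osd.items()
        if (avg - pgs) / avg > threshold
    )


# Hypothetical cluster: eight large HDDs and two small SSDs, with PGs
# distributed in proportion to capacity (i.e. balanced as intended).
cluster = {f"osd.{i}": 200 for i in range(8)}
cluster.update({"osd.8": 50, "osd.9": 50})

flagged = underfilled_osds(cluster)
print(flagged)
```

With these made-up numbers the average is 170 PGs per OSD, so both small SSDs fall well below the 30% margin and would be flagged, although the distribution matches the device sizes.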

As a workaround, the alert can be silenced.

There are currently two possible solutions:

  • Remove the CephPGImbalance alert
  • Replace the CephPGImbalance alert with, e.g., an alert based on the Balancer score


Related issues 3 (1 open, 2 closed)

Copied to Dashboard - Backport #58300: quincy: mgr/dashboard: CephPGImbalance alert inaccuracies (Resolved, Aashish Sharma)
Copied to Dashboard - Backport #58301: pacific: mgr/dashboard: CephPGImbalance alert inaccuracies (Resolved, Aashish Sharma)
Copied to Dashboard - Backport #59572: reef: mgr/dashboard: CephPGImbalance alert inaccuracies (New, Aashish Sharma)
Actions #1

Updated by Ernesto Puerta almost 2 years ago

  • Status changed from New to Triaged
  • Priority changed from Normal to Low
Actions #2

Updated by Aashish Sharma over 1 year ago

  • Status changed from Triaged to Fix Under Review
  • Backport set to quincy
Actions #3

Updated by Aashish Sharma over 1 year ago

  • Backport changed from quincy to quincy, pacific
Actions #4

Updated by Aashish Sharma over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Pull request ID set to 48525
Actions #5

Updated by Backport Bot over 1 year ago

  • Copied to Backport #58300: quincy: mgr/dashboard: CephPGImbalance alert inaccuracies added
Actions #6

Updated by Backport Bot over 1 year ago

  • Copied to Backport #58301: pacific: mgr/dashboard: CephPGImbalance alert inaccuracies added
Actions #7

Updated by Backport Bot over 1 year ago

  • Tags changed from monitoring, alerts to monitoring, alerts backport_processed
Actions #8

Updated by Nizamudeen A about 1 year ago

  • Backport changed from quincy, pacific to quincy, pacific, reef
Actions #9

Updated by Nizamudeen A about 1 year ago

  • Tags changed from monitoring, alerts backport_processed to monitoring, alerts
Actions #10

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59572: reef: mgr/dashboard: CephPGImbalance alert inaccuracies added
Actions #11

Updated by Backport Bot about 1 year ago

  • Tags changed from monitoring, alerts to monitoring, alerts backport_processed
Actions #12

Updated by Chris Boot 11 months ago

I'm somewhat dismayed that the attached PR was merged to "fix" this issue. If I'm reading it correctly, it simply papers over the problem by hiding the firing alert in the Ceph Dashboard if the balancer is running and happy - but the alert will still be firing in Alertmanager, just invisible.

Anyone who configures Alertmanager to notify them of issues outside Ceph Dashboard will be faced with an alert for a Ceph problem that's apparently invisible via Ceph Dashboard.

This also ignores the fact that Alertmanager will likely be used by more than just Ceph, especially in Kubernetes clusters with Rook, for example.

Please consider reverting this PR rather than backporting to other releases.

Actions #13

Updated by Aashish Sharma 11 months ago

Thanks Chris, I am working on the fix. Will do the needful.
