Bug #55568
mgr/dashboard: CephPGImbalance alert inaccuracies
Description
Description of problem
The CephPGImbalance alert might be inaccurate in some environments. The alert is raised if the number of PGs on an OSD is 30% lower than the average number of PGs across all OSDs. The number of PGs on an OSD normally depends on factors such as the number of OSDs and the size or class of the device. In a well-functioning cluster, the Balancer takes care of balancing the PGs.
In some cases, e.g. if the cluster consists of many larger HDDs and a few smaller SSDs, the alert might be raised even though the configuration and balance of the PGs are handled correctly.
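The mixed HDD/SSD case can be illustrated with a minimal sketch. It assumes the alert simply compares each OSD's PG count with the cluster-wide average and flags anything more than 30% below it, as described above. The function name, OSD names and PG counts are made up for illustration and are not taken from the actual alert rule.

```python
def pg_imbalanced_osds(pg_counts, threshold=0.30):
    """Return the OSDs whose PG count is more than `threshold` below the mean.

    Illustrative approximation of the check described in this report,
    not the real Prometheus rule.
    """
    avg = sum(pg_counts.values()) / len(pg_counts)
    return sorted(
        osd for osd, pgs in pg_counts.items()
        if (avg - pgs) / avg > threshold
    )


# Hypothetical cluster: eight large HDD OSDs and two small SSD OSDs.
# The SSDs hold fewer PGs because they are smaller, not because the
# Balancer failed to do its job.
pg_counts = {f"osd.{i}": 120 for i in range(8)}
pg_counts.update({"osd.8": 30, "osd.9": 30})

flagged = pg_imbalanced_osds(pg_counts)
print(flagged)  # ['osd.8', 'osd.9']
```

With these numbers the average is 102 PGs per OSD, so both SSD OSDs sit about 70% below it and would be flagged, although the distribution is what one would expect for devices of that size.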
As a workaround, the alert can be silenced.
There are currently two possible solutions:
- Just remove the CephPGImbalance alert
- Replace the CephPGImbalance alert with e.g. an alert based on the Balancer score
Environment
- ceph version string:
- Platform (OS/distro/release):
- Cluster details (nodes, monitors, OSDs):
- Did it happen on a stable environment or after a migration/upgrade?:
- Browser used (e.g.: Version 86.0.4240.198 (Official Build) (64-bit)):
How reproducible
Steps:
- ...
Actual results
Please add logs and/or screenshots
Expected results
here
Additional info
here
Related issues
History
#1 Updated by Ernesto Puerta over 1 year ago
- Status changed from New to Triaged
- Priority changed from Normal to Low
#2 Updated by Aashish Sharma 9 months ago
- Status changed from Triaged to Fix Under Review
- Backport set to quincy
#3 Updated by Aashish Sharma 9 months ago
- Backport changed from quincy to quincy, pacific
#4 Updated by Aashish Sharma 9 months ago
- Status changed from Fix Under Review to Pending Backport
- Pull request ID set to 48525
#5 Updated by Backport Bot 9 months ago
- Copied to Backport #58300: quincy: mgr/dashboard: CephPGImbalance alert inaccuracies added
#6 Updated by Backport Bot 9 months ago
- Copied to Backport #58301: pacific: mgr/dashboard: CephPGImbalance alert inaccuracies added
#7 Updated by Backport Bot 9 months ago
- Tags changed from monitoring, alerts to monitoring, alerts backport_processed
#8 Updated by Nizamudeen A 5 months ago
- Backport changed from quincy, pacific to quincy, pacific, reef
#9 Updated by Nizamudeen A 5 months ago
- Tags changed from monitoring, alerts backport_processed to monitoring, alerts
#10 Updated by Backport Bot 5 months ago
- Copied to Backport #59572: reef: mgr/dashboard: CephPGImbalance alert inaccuracies added
#11 Updated by Backport Bot 5 months ago
- Tags changed from monitoring, alerts to monitoring, alerts backport_processed
#12 Updated by Chris Boot 4 months ago
I'm somewhat dismayed that the attached PR was merged to "fix" this issue. If I'm reading it correctly, it simply papers over the problem by hiding the firing alert in the Ceph Dashboard if the balancer is running and happy - but the alert will still be firing in Alertmanager, just invisible.
Anyone who configures Alertmanager to notify them of issues outside Ceph Dashboard will be faced with an alert for a Ceph problem that's apparently invisible via Ceph Dashboard.
This also ignores the fact that Alertmanager will likely be used by more than just Ceph, especially in Kubernetes clusters with Rook, for example.
Please consider reverting this PR rather than backporting to other releases.
#13 Updated by Aashish Sharma 3 months ago
Thanks Chris, I am working on the fix. Will do the needful.