Bug #55568: mgr/dashboard: CephPGImbalance alert inaccuracies - Dashboard - Ceph

Bug #55568

open

mgr/dashboard: CephPGImbalance alert inaccuracies

Added by Tatjana Dehler about 2 years ago. Updated 11 months ago.

Status: Pending Backport
Priority: Low
Assignee: Aashish Sharma
Category: Monitoring
Tags: monitoring, alerts backport_processed
Backport: quincy, pacific, reef
Severity: 3 - minor
Pull request ID: 48525

Description

Description of problem

The CephPGImbalance alert can be inaccurate in some environments. The alert is raised if the number of PGs on an OSD is 30% below the average number of PGs across all OSDs. The number of PGs on an OSD normally depends on factors such as the number of OSDs and the size or class of the device. In a well-functioning cluster, the Balancer takes care of balancing the PGs.

In some cases, e.g. if the cluster consists of many larger HDDs and a few smaller SSDs, the alert might be raised even though the configuration and balance of the PGs are handled correctly.
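To illustrate the false positive, here is a minimal Python sketch of the condition as described in this report (an OSD whose PG count is more than 30% below the cluster-wide average). It is not the actual Prometheus rule; the function name, OSD names, and PG counts are hypothetical and only model a cluster of large HDDs plus a few small SSDs.

```python
from statistics import mean

THRESHOLD = 0.30  # 30% below the average, as described in this report


def osds_below_average(pg_counts, threshold=THRESHOLD):
    """Return OSD names whose PG count is more than `threshold` below the mean."""
    if not pg_counts:
        return []
    avg = mean(pg_counts.values())
    if avg == 0:
        return []
    return sorted(
        osd for osd, pgs in pg_counts.items()
        if (avg - pgs) / avg > threshold
    )


# Hypothetical cluster: eight large HDDs and two small SSDs.
# The small devices correctly hold fewer PGs, yet they sit far below the mean.
pg_counts = {f"osd.{i}": 120 for i in range(8)}
pg_counts.update({"osd.8": 30, "osd.9": 30})

flagged = osds_below_average(pg_counts)
print(flagged)
```

In this made-up example the average is 102 PGs per OSD, so both small OSDs are flagged although their PG counts are proportionate to their size.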

As a workaround the alert can be silenced.
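One way to silence the alert is through the Alertmanager v2 API, which accepts a silence object via POST to /api/v2/silences. The sketch below only builds such a payload and does not send it; the helper name, author, comment text, and duration are assumptions for illustration, and the matcher assumes the alert is labelled alertname=CephPGImbalance.

```python
from datetime import datetime, timedelta, timezone


def build_silence(alertname, hours, author, comment, now=None):
    """Build a silence payload for Alertmanager's POST /api/v2/silences."""
    now = now or datetime.now(timezone.utc)
    return {
        # Match the alert by its name label only (exact match, not a regex).
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False}
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": author,
        "comment": comment,
    }


# Hypothetical values; adjust duration, author and comment as needed.
silence = build_silence(
    "CephPGImbalance",
    hours=24,
    author="ops",
    comment="Mixed HDD/SSD cluster, see Bug #55568",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(silence)
```

A silence expires at endsAt, so this workaround has to be renewed or given a long duration.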

There are currently two possible solutions:

Just remove the CephPGImbalance alert
Replace the CephPGImbalance alert by e.g. an alert based on the Balancer score


Related issues 3 (1 open — 2 closed)

Updated by Ernesto Puerta almost 2 years ago

Status changed from New to Triaged
Priority changed from Normal to Low

Updated by Aashish Sharma over 1 year ago

Status changed from Triaged to Fix Under Review
Backport set to quincy

Updated by Aashish Sharma over 1 year ago

Backport changed from quincy to quincy, pacific

Updated by Aashish Sharma over 1 year ago

Status changed from Fix Under Review to Pending Backport
Pull request ID set to 48525

Updated by Backport Bot over 1 year ago

Copied to Backport #58300: quincy: mgr/dashboard: CephPGImbalance alert inaccuracies added

Updated by Backport Bot over 1 year ago

Copied to Backport #58301: pacific: mgr/dashboard: CephPGImbalance alert inaccuracies added

Updated by Backport Bot over 1 year ago

Tags changed from monitoring, alerts to monitoring, alerts backport_processed

Updated by Nizamudeen A about 1 year ago

Backport changed from quincy, pacific to quincy, pacific, reef

Updated by Nizamudeen A about 1 year ago

Tags changed from monitoring, alerts backport_processed to monitoring, alerts

#10

Updated by Backport Bot about 1 year ago

Copied to Backport #59572: reef: mgr/dashboard: CephPGImbalance alert inaccuracies added

#11

Updated by Backport Bot about 1 year ago

Tags changed from monitoring, alerts to monitoring, alerts backport_processed

#12

Updated by Chris Boot 11 months ago

I'm somewhat dismayed that the attached PR was merged to "fix" this issue. If I'm reading it correctly, it simply papers over the problem by hiding the firing alert in the Ceph Dashboard if the balancer is running and happy - but the alert will still be firing in Alertmanager, just invisible.

Anyone who configures Alertmanager to notify them of issues outside Ceph Dashboard will be faced with an alert for a Ceph problem that's apparently invisible via Ceph Dashboard.

This also ignores the fact that Alertmanager will likely be used by more than just Ceph, especially in Kubernetes clusters with Rook, for example.

Please consider reverting this PR rather than backporting to other releases.

#13

Updated by Aashish Sharma 11 months ago

Thanks Chris, I am working on the fix. Will do the needful.
