Bug #58002

mon_max_pg_per_osd is not checked per OSD

Added by Frank Schilder 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Monitoring/Alerting
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The warning for exceeding mon_max_pg_per_osd seems to be triggered only when the average PG count over all OSDs exceeds mon_max_pg_per_osd. This is confusing and of little help. A number of users reported that during operations, maintenance, or recovery their cluster stopped working because a single OSD hit the hard limit of mon_max_pg_per_osd*osd_max_pg_per_osd_hard_ratio PGs, without any warning having been shown beforehand. This is particularly problematic for pools with comparatively few OSDs but many PGs, for example file system metadata pools.

The warning should be per OSD. As soon as an OSD has more PGs than mon_max_pg_per_osd, a warning should appear and ceph health detail should report the affected OSDs. This would leave a slack interval from mon_max_pg_per_osd to mon_max_pg_per_osd*osd_max_pg_per_osd_hard_ratio in which a warning is shown but the cluster continues to operate. An operator would then have time to buy extra disks instead of ending up with a health-err cluster without advance warning.
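
To illustrate the difference between the two checks, here is a minimal Python sketch. It is not Ceph code: the function name check_pg_limits, the sample PG counts, and the values 250 and 3.0 used for mon_max_pg_per_osd and osd_max_pg_per_osd_hard_ratio are assumptions chosen only for illustration.

```python
from statistics import mean


def check_pg_limits(pg_counts, mon_max_pg_per_osd, hard_ratio):
    """Compare an average-based check with a per-OSD check.

    pg_counts maps an OSD name to its PG count (hypothetical input format).
    Returns (average_warns, over_soft, over_hard).
    """
    hard_limit = mon_max_pg_per_osd * hard_ratio
    # Average-based check: warns only if the mean over all OSDs is too high.
    average_warns = mean(pg_counts.values()) > mon_max_pg_per_osd
    # Per-OSD check: OSDs in the slack interval (warning, still operating).
    over_soft = sorted(
        osd for osd, n in pg_counts.items()
        if mon_max_pg_per_osd < n <= hard_limit
    )
    # Per-OSD check: OSDs beyond the hard limit.
    over_hard = sorted(osd for osd, n in pg_counts.items() if n > hard_limit)
    return average_warns, over_soft, over_hard


# Illustrative counts: six lightly loaded OSDs and two heavily loaded ones.
pg_counts = {f"osd.{i}": 100 for i in range(6)}
pg_counts["osd.6"] = 400
pg_counts["osd.7"] = 800

average_warns, over_soft, over_hard = check_pg_limits(pg_counts, 250, 3.0)
print(average_warns, over_soft, over_hard)
```

With these sample numbers the average is 225, so the average-based check stays silent, while the per-OSD check flags osd.6 in the slack interval and osd.7 beyond the assumed hard limit of 750.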

What users observe is that PGs are stuck peering without any indication as to why, because the average PG count is below the warning threshold while one or more OSDs already exceed the hard limit. The result is a stuck cluster and no hint as to the cause.
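
Until such a warning exists, an operator could look for overloaded OSDs by hand. The following Python sketch assumes that the JSON output of ceph osd df --format json contains a "nodes" list whose entries carry "name" and "pgs" fields; that layout, the helper name pg_counts_from_osd_df, the embedded sample data, and the limit of 250 are assumptions and should be checked against the actual output of the cluster in question.

```python
import json


def pg_counts_from_osd_df(json_text):
    """Extract {osd_name: pg_count} from `ceph osd df --format json` output.

    Assumes a top-level "nodes" list with "name" and "pgs" per entry.
    """
    report = json.loads(json_text)
    return {node["name"]: node["pgs"] for node in report.get("nodes", [])}


# Hypothetical sample standing in for real command output.
sample = (
    '{"nodes": ['
    '{"id": 0, "name": "osd.0", "pgs": 95}, '
    '{"id": 1, "name": "osd.1", "pgs": 812}'
    ']}'
)

counts = pg_counts_from_osd_df(sample)
limit = 250  # assumed value of mon_max_pg_per_osd
over = {name: pgs for name, pgs in counts.items() if pgs > limit}
print(over)
```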

A thread where the hard limit was exceeded is: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AK3ZTRDFLBCA23TOEOJXJNJC3AU264QN
The thread where I observed and reported the missing warning is: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/WST6K5A4UQGGISBFGJEZS4HFL2VVWW32
