Bug #61570: pg_autoscaler warns that a pool has too many pgs when it has the exact right amount
Status:
Resolved
Priority:
Normal
Assignee:
Category:
pg_autoscaler module
Target version:
-
% Done:
100%
Source:
Q/A
Tags:
backport_processed
Backport:
reef, quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Description
This looks like a possible bug in the pg_autoscaler logic: the autoscaler warns that a pool has too many PGs even when the pool has exactly the right amount.
/a/yuriw-2023-05-30_20:25:48-rados-wip-yuri5-testing-2023-05-30-0828-quincy-distro-default-smithi/7290292
2023-05-31T05:16:23.742 DEBUG:teuthology.orchestra.run.smithi148:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v 'but it is still running' | egrep -v 'had wrong client addr' | egrep -v 'had wrong cluster addr' | egrep -v 'must scrub before tier agent can activate' | egrep -v 'failsafe engaged, dropping updates' | egrep -v 'failsafe disengaged, no longer dropping updates' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(SMALLER_PG_NUM\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(CACHE_POOL_NO_HIT_SET\)' | egrep -v '\(CACHE_POOL_NEAR_FULL\)' | egrep -v '\(FS_WITH_FAILED_MDS\)' | egrep -v '\(FS_DEGRADED\)' | egrep -v '\(POOL_BACKFILLFULL\)' | egrep -v '\(POOL_FULL\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(POOL_NEARFULL\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(AUTH_BAD_CAPS\)' | egrep -v '\(FS_INLINE_DATA_DEPRECATED\)' | egrep -v '\(MON_DOWN\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-05-31T05:16:23.813 INFO:teuthology.orchestra.run.smithi148.stdout:2023-05-31T05:02:56.604176+0000 mon.a (mon.0) 2048 : cluster [WRN] Health check failed: 1 pools have too many placement groups (POOL_TOO_MANY_PGS)
2023-05-31T05:16:23.813 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-05-31T05:16:23.814 DEBUG:teuthology.orchestra.run.smithi148:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v 'but it is still running' | egrep -v 'had wrong client addr' | egrep -v 'had wrong cluster addr' | egrep -v 'must scrub before tier agent can activate' | egrep -v 'failsafe engaged, dropping updates' | egrep -v 'failsafe disengaged, no longer dropping updates' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(SMALLER_PG_NUM\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(CACHE_POOL_NO_HIT_SET\)' | egrep -v '\(CACHE_POOL_NEAR_FULL\)' | egrep -v '\(FS_WITH_FAILED_MDS\)' | egrep -v '\(FS_DEGRADED\)' | egrep -v '\(POOL_BACKFILLFULL\)' | egrep -v '\(POOL_FULL\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(POOL_NEARFULL\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(AUTH_BAD_CAPS\)' | egrep -v '\(FS_INLINE_DATA_DEPRECATED\)' | egrep -v '\(MON_DOWN\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-05-31T05:16:23.886 DEBUG:teuthology.orchestra.run.smithi148:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v 'but it is still running' | egrep -v 'had wrong client addr' | egrep -v 'had wrong cluster addr' | egrep -v 'must scrub before tier agent can activate' | egrep -v 'failsafe engaged, dropping updates' | egrep -v 'failsafe disengaged, no longer dropping updates' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(SMALLER_PG_NUM\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(CACHE_POOL_NO_HIT_SET\)' | egrep -v '\(CACHE_POOL_NEAR_FULL\)' | egrep -v '\(FS_WITH_FAILED_MDS\)' | egrep -v '\(FS_DEGRADED\)' | egrep -v '\(POOL_BACKFILLFULL\)' | egrep -v '\(POOL_FULL\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(POOL_NEARFULL\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(AUTH_BAD_CAPS\)' | egrep -v '\(FS_INLINE_DATA_DEPRECATED\)' | egrep -v '\(MON_DOWN\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-05-31T05:16:23.956 DEBUG:teuthology.orchestra.run.smithi148:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v 'but it is still running' | egrep -v 'had wrong client addr' | egrep -v 'had wrong cluster addr' | egrep -v 'must scrub before tier agent can activate' | egrep -v 'failsafe engaged, dropping updates' | egrep -v 'failsafe disengaged, no longer dropping updates' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(SMALLER_PG_NUM\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(CACHE_POOL_NO_HIT_SET\)' | egrep -v '\(CACHE_POOL_NEAR_FULL\)' | egrep -v '\(FS_WITH_FAILED_MDS\)' | egrep -v '\(FS_DEGRADED\)' | egrep -v '\(POOL_BACKFILLFULL\)' | egrep -v '\(POOL_FULL\)' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(POOL_NEARFULL\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(AUTH_BAD_CAPS\)' | egrep -v '\(FS_INLINE_DATA_DEPRECATED\)' | egrep -v '\(MON_DOWN\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-05-31T05:16:24.036 INFO:teuthology.orchestra.run.smithi148.stdout:2023-05-31T05:02:56.604176+0000 mon.a (mon.0) 2048 : cluster [WRN] Health check failed: 1 pools have too many placement groups (POOL_TOO_MANY_PGS)
From the mon log:
2023-05-31T05:02:57.616+0000 7f63836ea700 20 mon.a@0(leader).mgrstat health checks:
{
  "POOL_TOO_MANY_PGS": {
    "severity": "HEALTH_WARN",
    "summary": {
      "message": "1 pools have too many placement groups",
      "count": 1
    },
    "detail": [
      {
        "message": "Pool modewarn has 32 placement groups, should have 32"
      }
    ]
  }
}
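The health detail above is self-contradictory ("has 32 placement groups, should have 32"), which suggests the warning condition fires on equality rather than strict excess. A minimal Python sketch of that suspected comparison bug, with a hypothetical check function and pool fields that are illustrative only and not Ceph's actual pg_autoscaler code:

```python
# Hypothetical illustration of the suspected bug: a health check comparing
# with >= instead of > would flag a pool whose pg_num already equals the
# autoscaler's target, yielding "has 32 placement groups, should have 32".
# Function name and dict keys are assumptions, not Ceph's real API.

def check_too_many_pgs(pools):
    """Return POOL_TOO_MANY_PGS-style messages for pools with excess PGs."""
    warnings = []
    for pool in pools:
        # Buggy form would be: pool["pg_num"] >= pool["target"]
        # The strict '>' below does not warn when pg_num equals the target.
        if pool["pg_num"] > pool["target"]:
            warnings.append(
                f"Pool {pool['name']} has {pool['pg_num']} placement groups, "
                f"should have {pool['target']}"
            )
    return warnings

pools = [{"name": "modewarn", "pg_num": 32, "target": 32}]
print(check_too_many_pgs(pools))  # [] -- no warning at exactly the target
```

With the strict comparison, the pool from the log (32 PGs, target 32) produces no warning; only a pool genuinely above its target would trigger POOL_TOO_MANY_PGS.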