Bug #58821 (open)

pg_autoscaler module is not working since the Pacific upgrade from v16.2.4 to v16.2.9

Added by Prayank Saxena about 1 year ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: ceph cli
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/pacific-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello Team,

We upgraded our clusters from Pacific v16.2.4 to v16.2.9 a few months back. Before the upgrade I was able to get output from "ceph osd pool autoscale-status".
Since upgrading the cluster to v16.2.9, the same command produces no output. Currently we have 16M+ objects in the pool "cephfs_data", which only has 32 PGs. pg_autoscaler should kick off PG creation, but I still do not see it happening.
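For reference, a minimal way to reproduce the check (the pool name is the one from this cluster; the comments note what I see):

# Prints one row per pool on v16.2.4; returns nothing at all on v16.2.9
ceph osd pool autoscale-status

# Confirms the pool is still at 32 PGs despite the object count
ceph osd pool get cephfs_data pg_num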

While "checking ceph progress" i can see it actually reduced the pg number for "cephfs_data" pool from 128 to 32.
[Complete]: Global Recovery Event (3M)
[============================]
[Complete]: PG autoscaler decreasing pool 2 PGs from 128 to 32 (3M)
[============================]
[Complete]: Global Recovery Event (3M)
[============================]
[Complete]: Global Recovery Event (3M)
[============================]
[Complete]: PG autoscaler decreasing pool 3 PGs from 128 to 32 (3M)
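
For completeness, the events above come from the mgr progress module:

# Lists recent and ongoing progress events, including the autoscaler's pg_num changes
ceph progress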

I am attaching details of the cluster running v16.2.9 and a comparison with another cluster on v16.2.4.

Below is the error that I am getting in the mgr logs:

2023-02-22T05:12:58.295+0000 7f3f45196700 0 [pg_autoscaler INFO root] _maybe_adjust
2023-02-22T05:12:58.297+0000 7f3f45196700 0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-20, -1}
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-20, -1}
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler WARNING root] pool 1 contains an overlapping root -1... skipping scaling
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler WARNING root] pool 2 contains an overlapping root -20... skipping scaling
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler WARNING root] pool 3 contains an overlapping root -20... skipping scaling
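
From what I understand, these errors mean the autoscaler sees pools whose CRUSH rules resolve to different roots: -1 is the default root, while -20 looks like a device-class shadow root. A minimal way to inspect where each root comes from, using the standard CLI:

# Show the CRUSH hierarchy including device-class shadow roots (negative IDs such as -20)
ceph osd crush tree --show-shadow

# Show which crush_rule each pool uses
ceph osd pool ls detail

# Dump the rules to see which root/device class each one selects
ceph osd crush rule dump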

Please let me know if I am missing something at the config level, or if there is an existing issue with Pacific v16.2.9.

Regards
Prayank


Files

PG_autoscaler_issue_pacific_v16.2.9 (67.6 KB): Comparison between two ceph clusters (v16.2.9 & v16.2.4). Prayank Saxena, 02/22/2023 06:12 AM
#1

Updated by Prayank Saxena about 1 year ago

I see a ticket already open for the same issue: https://tracker.ceph.com/issues/55611
But can I get a solution on how to fix pg_autoscaler, so that the cluster can automatically scale the PGs for each pool? A sketch of one possible workaround follows below.
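
If the overlapping-roots diagnosis is correct, one possible workaround (a sketch only; the rule and pool names below are examples, not taken from this cluster) is to point every pool at a CRUSH rule that resolves to a single root, e.g. one device class:

# Create a replicated rule pinned to the hdd device class (resolves to one shadow root)
ceph osd crush rule create-replicated replicated_hdd default host hdd

# Move each pool onto that rule so no pool remains on the default root (-1)
ceph osd pool set cephfs_data crush_rule replicated_hdd
ceph osd pool set cephfs_metadata crush_rule replicated_hdd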

#2

Updated by Prayank Saxena about 1 year ago

What will happen if I change the crush rule of pool 1 from the replicated rule (default) to a customised crush rule?
Will that trigger the pg_autoscaler module to create PGs for the "cephfs_data" pool? See the sketch below for how I would test it.
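
In case it helps, a sketch of that test (<pool-1-name> and <custom-rule> are placeholders):

# Check which rule pool 1 currently uses
ceph osd pool ls detail

# Switch pool 1 to the customised rule, then re-check the autoscaler
ceph osd pool set <pool-1-name> crush_rule <custom-rule>
ceph osd pool autoscale-status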

#3

Updated by Ilya Dryomov about 1 year ago

  • Target version changed from v16.2.12 to v16.2.13
#4

Updated by Ilya Dryomov 10 months ago

  • Target version deleted (v16.2.13)
