Bug #58821 (open)

pg_autoscaler module is not working since the Pacific upgrade from v16.2.4 to v16.2.9

Added by Prayank Saxena about 1 year ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: ceph cli
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/pacific-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello Team,

We upgraded our clusters from Pacific v16.2.4 to v16.2.9 a few months back. Before the upgrade I was able to get output from "ceph osd pool autoscale-status".
Since upgrading the cluster to v16.2.9, the same command produces no output. Currently we have 16M+ objects in the pool "cephfs_data", which only has 32 PGs. pg_autoscaler should kick off PG creation, but I still do not see it happening.
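For reference, a minimal way to reproduce the check (the pool name is the one from this cluster; the comments note what I see):

# Prints one row per pool on v16.2.4; returns nothing at all on v16.2.9
ceph osd pool autoscale-status

# Confirms the pool is still at 32 PGs despite the object count
ceph osd pool get cephfs_data pg_num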

While "checking ceph progress" i can see it actually reduced the pg number for "cephfs_data" pool from 128 to 32.
[Complete]: Global Recovery Event (3M)
[============================]
[Complete]: PG autoscaler decreasing pool 2 PGs from 128 to 32 (3M)
[============================]
[Complete]: Global Recovery Event (3M)
[============================]
[Complete]: Global Recovery Event (3M)
[============================]
[Complete]: PG autoscaler decreasing pool 3 PGs from 128 to 32 (3M)
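
For completeness, the events above come from the mgr progress module:

# Lists recent and ongoing progress events, including the autoscaler's pg_num changes
ceph progress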

I am attaching details of the cluster running v16.2.9 and a comparison with another cluster on v16.2.4.

Below is the error that I am getting in the mgr logs:

2023-02-22T05:12:58.295+0000 7f3f45196700 0 [pg_autoscaler INFO root] _maybe_adjust
2023-02-22T05:12:58.297+0000 7f3f45196700 0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-20, -1}
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-20, -1}
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler WARNING root] pool 1 contains an overlapping root -1... skipping scaling
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler WARNING root] pool 2 contains an overlapping root -20... skipping scaling
2023-02-22T05:12:58.298+0000 7f3f45196700 0 [pg_autoscaler WARNING root] pool 3 contains an overlapping root -20... skipping scaling
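
From what I understand, these errors mean the autoscaler sees pools whose CRUSH rules resolve to different roots: -1 is the default root, while -20 looks like a device-class shadow root. A minimal way to inspect where each root comes from, using the standard CLI:

# Show the CRUSH hierarchy including device-class shadow roots (negative IDs such as -20)
ceph osd crush tree --show-shadow

# Show which crush_rule each pool uses
ceph osd pool ls detail

# Dump the rules to see which root/device class each one selects
ceph osd crush rule dump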

Please let me know if I am missing something at the config level, or if there is an existing issue with Pacific v16.2.9.

Regards
Prayank


Files

PG_autoscaler_issue_pacific_v16.2.9 (67.6 KB): Comparison between two ceph clusters (v16.2.9 & v16.2.4). Prayank Saxena, 02/22/2023 06:12 AM
#1

Updated by Prayank Saxena about 1 year ago

I see a ticket already open for the same issue: https://tracker.ceph.com/issues/55611
But can I get a solution on how to fix pg_autoscaler, so that the cluster can automatically scale the PGs for each pool? A sketch of one possible workaround follows below.
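
If the overlapping-roots diagnosis is correct, one possible workaround (a sketch only; the rule and pool names below are examples, not taken from this cluster) is to point every pool at a CRUSH rule that resolves to a single root, e.g. one device class:

# Create a replicated rule pinned to the hdd device class (resolves to one shadow root)
ceph osd crush rule create-replicated replicated_hdd default host hdd

# Move each pool onto that rule so no pool remains on the default root (-1)
ceph osd pool set cephfs_data crush_rule replicated_hdd
ceph osd pool set cephfs_metadata crush_rule replicated_hdd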

#2

Updated by Prayank Saxena about 1 year ago

What will happen if I change the crush rule of pool 1 from the replicated rule (default) to a customised crush rule?
Will that trigger the pg_autoscaler module to create PGs for the "cephfs_data" pool? See the sketch below for how I would test it.
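
In case it helps, a sketch of that test (<pool-1-name> and <custom-rule> are placeholders):

# Check which rule pool 1 currently uses
ceph osd pool ls detail

# Switch pool 1 to the customised rule, then re-check the autoscaler
ceph osd pool set <pool-1-name> crush_rule <custom-rule>
ceph osd pool autoscale-status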

#3

Updated by Ilya Dryomov about 1 year ago

  • Target version changed from v16.2.12 to v16.2.13
#4

Updated by Ilya Dryomov 10 months ago

  • Target version deleted (v16.2.13)
