Bug #58692: Consider setting "bulk" autoscale pool flag when automatically creating a data pool for RGW - rgw - Ceph

Actions

Copy link

Bug #58692

closed

Consider setting "bulk" autoscale pool flag when automatically creating a data pool for RGW

Added by Voja Molani about 1 year ago. Updated 9 months ago.

Status:

Resolved

Priority:

Normal

Assignee:

Yehuda Sadeh

Target version:

% Done:

Source:

Tags:

low-hanging-fruit backport_processed

Backport:

reef

Regression:

Severity:

Reviewed:

Affected Versions:

Ceph - v17.2.5

ceph-qa-suite:

Pull request ID:

51497

Crash signature (v1):

Crash signature (v2):

Description

When a new Ceph cluster has been deployed and RGW is used on it for the first time it seems pools necessary for RGW are automatically created.
The data pool seems to be default.rgw.buckets.data. But this pool is created without setting the "bulk" flag for autoscaler so it will automatically be set to very few PG, only 32:

POOL                         SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr                        1472k                3.0        267.2T  0.0000                                  1.0       1              on         False
.rgw.root                   1327                 3.0        267.2T  0.0000                                  1.0      32              on         False
default.rgw.log             2687k                3.0        267.2T  0.0000                                  1.0      32              on         False
default.rgw.control            0                 3.0        267.2T  0.0000                                  1.0      32              on         False
default.rgw.meta            1331                 3.0        267.2T  0.0000                                  4.0      32              on         False
default.rgw.buckets.index      0                 3.0        267.2T  0.0000                                  4.0      32              on         False
default.rgw.buckets.data   32825M                3.0        267.2T  0.0004                                  1.0      32              on         False

If autoscaler is enabled then the "bulk" flag could be set automatically so that the main data pool gets the PGs it needs, in my opinion this would be a safe assumption to make. After setting the "bulk" flag the PGs are correctly increased to 1024:

default.rgw.buckets.data   33026M                3.0        267.2T  0.0004                                  1.0      32        1024  on         True

Documentation at https://docs.ceph.com/en/latest/radosgw/pools/ mentions that PGs need to be tuned for the pool but by automatically setting the "bulk" flag the automatically created pools would be "mostly" usable without further tuning.

If the automatic setting of "bulk" flag is not possible then perhaps the documentation could mention the command ceph osd pool set default.rgw.buckets.data bulk true to set the "bulk" flag if autoscaler is used.

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Casey Bodley about 1 year ago

thanks Voja,

i think it's worth raising this for discussion on the dev list. rgw needs the ability to reliably create pools that don't exist. if the new pool requires too many PGs, the cluster may refuse to create them, and this has been very confusing to users historically:

https://tracker.ceph.com/issues/22351
https://tracker.ceph.com/issues/23480

in past discussions with the rados team it was suggested to use very small pg counts for this pool creation, and wait for the autoscaler to fix them up in the background. ideally, the orchestrator would create all of these pools during deployment instead of at rgw runtime, where the orchestrator has much more information about available/target PG sizing

Actions

Copy link

Updated by Casey Bodley about 1 year ago

Related to Bug #22351: Couldn't init storage provider (RADOS) added

Actions

Copy link

Updated by Voja Molani about 1 year ago

I see. If I understand the links and the concern correctly, the issue may be that trying to create a pool with high PG count could fail and the using the minimum PG count tries to avoid it.

I do not know how things work internally but I am not proposing increasing the initial PG count of the created pool. I am proposing to create the pool with the same PG count as is done now but enable the autoscaler new "bulk" flag for the pool and then later let autoscaler increase the PG count if it can. Perhaps this would have the same end result - I would not know. On my system the pool was created automatically with 32 PG and after setting the "bulk" flag it increased to 1024. I would expect the autoscaler to not try to increase PGs from the initial amount to beyond what is possible.

I can see how especially changing PG counts (as would be done by autoscaler later) could be controversial since it might mean juggling some data around -- granted, the newly created pool should have little data at this point.

As an alternative I propose to mention setting the "bulk" flag in the documentation for those users that use autoscaler. Currently documentation simply says that "PGs may need to be tuned" in https://docs.ceph.com/en/latest/radosgw/pools/#tuning - with no reference to the now enabled by default autoscaler. Since use of autoscaler seems to be encouraged then it might make sense, especially to new users, to mention the "bulk" flag and/or link the users to autoscaler documentation.
If the documentation addition sounds reasonable I could take a stab at a PR for it.

Actions

Copy link

Updated by Josh Durgin about 1 year ago

The "bulk" flag is intended for exactly this case - to differentiate between e.g. the rgw data pool and metadata pools that are not expected to need the same level of parallelism. The autoscaler operates within the pg/OSD budget, so it can't hit the issues in the past when pools were created with more pgs. RGW could create the pool with 1 pg and the bulk flag, and the autoscaler would set it to an appropriate value for the cluster size. The orchestrator should use the bulk flag as well, then it doesn't have to worry about exact pg counts.

Actions

Copy link