Bug #51998

open

PG autoscaler is wrong when pool is EC with technique=reed_sol_r6_op

Added by Benjamin Mare over 2 years ago. Updated over 2 years ago.


Description

Dear maintainer,

The PG autoscaler computes the wrong RATE for an erasure-coded pool whose profile uses technique=reed_sol_r6_op. It reports a rate of 0.0; because of this the RATIO is 0, and the final NEW PG_NUM is the minimum.

$ ceph osd pool autoscale-status
POOL                     SIZE  TARGET SIZE               RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  
MY_POOL                104.5T                             0.0         2348T  0.0000                                  1.0     512          32  off

When calculating the RATE, the code tries to read the "m" and the "k" of the erasure coding profile. But with "technique=reed_sol_r6_op", the "m" is implied and does not appear in the profile.

$ ceph osd erasure-code-profile get huitetdeux
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
k=8
plugin=jerasure
technique=reed_sol_r6_op
w=8

The "k" is clearly visible, but not the "m". I think the C++ code in "src/osd/OSDMap.cc" is hitting this issue. In the Octopus branch, in the function "OSDMap::pool_raw_used_rate" (line 6158), I think ecp.find("m") returns ecp.end(), and the function returns "0.0" because of this.
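To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the actual Ceph source: the profile is modelled as a plain std::map, and both function names are hypothetical. The first function returns 0.0 whenever "k" or "m" is missing, as described above. The second shows one possible fallback, under the assumption that reed_sol_r6_op (a RAID6 technique) always implies m=2.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Stand-in for an erasure code profile (key/value strings).
using Profile = std::map<std::string, std::string>;

// Behaviour described in the report: if "k" or "m" is missing from the
// profile, the rate is 0.0; otherwise it is (k + m) / k.
float raw_used_rate_reported(const Profile& ecp) {
  auto pk = ecp.find("k");
  auto pm = ecp.find("m");
  if (pk == ecp.end() || pm == ecp.end()) {
    return 0.0f;
  }
  int k = std::atoi(pk->second.c_str());
  int m = std::atoi(pm->second.c_str());
  if (k <= 0) {
    return 0.0f;
  }
  return static_cast<float>(k + m) / k;
}

// Hypothetical variant: when "m" is absent and the technique is
// reed_sol_r6_op, fall back to m = 2.
float raw_used_rate_with_fallback(const Profile& ecp) {
  auto pk = ecp.find("k");
  if (pk == ecp.end()) {
    return 0.0f;
  }
  int k = std::atoi(pk->second.c_str());
  if (k <= 0) {
    return 0.0f;
  }
  int m = 0;
  auto pm = ecp.find("m");
  if (pm != ecp.end()) {
    m = std::atoi(pm->second.c_str());
  } else {
    auto pt = ecp.find("technique");
    if (pt == ecp.end() || pt->second != "reed_sol_r6_op") {
      return 0.0f;
    }
    m = 2;  // ASSUMPTION: RAID6 technique implies two coding chunks
  }
  return static_cast<float>(k + m) / k;
}
```

With the "huitetdeux" profile shown above (k=8, no "m"), the first function gives 0.0 while the fallback variant gives (8 + 2) / 8 = 1.25.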

And, knowing RATE is 0, RATIO = SIZE * RATE / RAW = 0.
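As a quick check of that formula against the figures in the autoscale-status output, here is a small sketch. The function name is hypothetical, and sizes are taken in the same "T" unit as the output.

```cpp
#include <cassert>

// RATIO as given in the report: SIZE * RATE / RAW CAPACITY.
// size and raw_capacity must be expressed in the same unit.
double capacity_ratio(double size, double rate, double raw_capacity) {
  assert(raw_capacity > 0.0);  // avoid dividing by zero
  return size * rate / raw_capacity;
}
```

For SIZE = 104.5 and RAW CAPACITY = 2348, a RATE of 0.0 yields a RATIO of exactly 0. If the RATE were 1.25 (k=8 with an assumed m=2), the RATIO would be roughly 0.056.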

There is an extra step when using an erasure code profile with "technique=reed_sol_r6_op", because it is now impossible to disable the autoscaler module. You need to run an extra "ceph osd pool set $YOUR_POOL pg_autoscale_mode off".

I think the output of an erasure code profile with technique=reed_sol_r6_op should be improved, or I should be able to disable the autoscaler module in Octopus as was possible in Nautilus.

Thanks

Actions #1

Updated by Neha Ojha over 2 years ago

  • Project changed from Ceph to RADOS

I think we should improve the code, and it seems you have already figured out the problem. The reason you cannot disable it in Octopus is that it is an always_on module. We have an RFE to allow disabling the module globally: https://tracker.ceph.com/issues/51213.
