Bug #64186


Auto-scaler suggestions make no sense

Added by Torkil Svensgaard 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A few years ago we were really strapped for space, so we tweaked pg_num for some pools to ensure all PGs were as close to the same size as possible while still observing the power-of-2 rule, to get the most mileage space-wise. We set the auto-scaler to off for the tweaked pools to get rid of the warnings.
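
Per pool, that came down to something like this (the pool name and pg_num below are only an example, not the exact values we used):

"
# pin pg_num so PG sizes stay roughly uniform, rounded to a power of 2
ceph osd pool set rbd_ec_data pg_num 4096
# stop the autoscaler from warning about the pinned value
ceph osd pool set rbd_ec_data pg_autoscale_mode off
"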

We now have a lot more free space, so I flipped the auto-scaler to warn for all pools and set the bulk flag for the pools expected to be data pools (the commands are sketched after the warning output), leading to this:

"
[WRN] POOL_TOO_FEW_PGS: 4 pools have too few placement groups
Pool rbd has 512 placement groups, should have 2048
Pool rbd_internal has 1024 placement groups, should have 2048
Pool cephfs.nvme.data has 32 placement groups, should have 4096
Pool cephfs.ssd.data has 32 placement groups, should have 1024
[WRN] POOL_TOO_MANY_PGS: 4 pools have too many placement groups
Pool libvirt has 256 placement groups, should have 32
Pool cephfs.cephfs.data has 512 placement groups, should have 32
Pool rbd_ec_data has 4096 placement groups, should have 1024
Pool cephfs.hdd.data has 2048 placement groups, should have 1024
"

That's a lot of warnings to ponder:

"
  1. ceph osd pool autoscale-status
    POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE BULK
    libvirt 2567G 3.0 3031T 0.0025 1.0 256 warn False
    .mgr 807.5M 2.0 6520G 0.0002 1.0 1 warn False
    rbd_ec 9168k 3.0 6520G 0.0000 1.0 32 warn False
    nvme 31708G 2.0 209.5T 0.2955 1.0 2048 warn False
    .nfs 36864 3.0 6520G 0.0000 1.0 32 warn False
    cephfs.cephfs.meta 24914M 3.0 6520G 0.0112 4.0 32 warn False
    cephfs.cephfs.data 16384 3.0 6520G 0.0000 1.0 512 warn False
    rbd.ssd.data 798.1G 2.25 6520G 0.2754 1.0 64 warn False
    rbd_ec_data 609.2T 1.5 3031T 0.3014 1.0 4096 warn True
    rbd 68170G 3.0 3031T 0.0659 1.0 512 warn True
    rbd_internal 69553G 3.0 3031T 0.0672 1.0 1024 warn True
    cephfs.nvme.data 0 2.0 209.5T 0.0000 1.0 32 warn True
    cephfs.ssd.data 68609M 2.0 6520G 0.0206 1.0 32 warn True
    cephfs.hdd.data 111.0T 2.25 3031T 0.0824 1.0 2048 warn True
    "
"
  1. ceph df
    --- RAW STORAGE ---
    CLASS SIZE AVAIL USED RAW USED %RAW USED
    hdd 3.0 PiB 1.3 PiB 1.6 PiB 1.6 PiB 54.69
    nvme 210 TiB 146 TiB 63 TiB 63 TiB 30.21
    ssd 6.4 TiB 4.0 TiB 2.4 TiB 2.4 TiB 37.69
    TOTAL 3.2 PiB 1.5 PiB 1.7 PiB 1.7 PiB 53.07

--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
rbd 4 512 80 TiB 21.35M 200 TiB 19.31 278 TiB
libvirt 5 256 3.0 TiB 810.89k 7.5 TiB 0.89 278 TiB
rbd_internal 6 1024 86 TiB 28.22M 204 TiB 19.62 278 TiB
.mgr 8 1 4.3 GiB 1.06k 1.6 GiB 0.07 1.0 TiB
rbd_ec 10 32 55 MiB 25 27 MiB 0 708 GiB
rbd_ec_data 11 4096 683 TiB 180.52M 914 TiB 52.26 556 TiB
nvme 23 2048 46 TiB 25.18M 62 TiB 31.62 67 TiB
.nfs 25 32 4.6 KiB 10 108 KiB 0 708 GiB
cephfs.cephfs.meta 31 32 25 GiB 1.66M 73 GiB 3.32 708 GiB
cephfs.cephfs.data 32 679 489 B 40.41M 48 KiB 0 708 GiB
cephfs.nvme.data 34 32 0 B 0 0 B 0 67 TiB
cephfs.ssd.data 35 32 77 GiB 425.03k 134 GiB 5.94 1.0 TiB
cephfs.hdd.data 37 2048 121 TiB 68.42M 250 TiB 23.03 371 TiB
rbd.ssd.data 38 64 934 GiB 239.94k 1.8 TiB 45.82 944 GiB
"

The weirdest ones:

Pool rbd_ec_data stores 683 TiB in 4096 PGs -> warn: should be 1024
Pool rbd_internal stores 86 TiB in 1024 PGs -> warn: should be 2048
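
A quick back-of-the-envelope with the ceph df numbers (my own arithmetic: stored data divided by the suggested pg_num):

"
rbd_ec_data:   683 TiB / 1024 suggested PGs ≈ 683 GiB per PG  (vs ~171 GiB per PG at the current 4096)
rbd_internal:   86 TiB / 2048 suggested PGs ≈  43 GiB per PG  (vs  ~86 GiB per PG at the current 1024)
"

So the suggested values would leave rbd_ec_data with PGs roughly 16 times larger than rbd_internal's.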

That makes no sense to me based on the amount of data stored. Is this a bug?
