Support #47150
ceph df - big difference between per-class and per-pool usage
Description
Consider the example below:
[root@node101 ~]# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL      USED       RAW USED     %RAW USED
    hdd        82 TiB     27 TiB     55 TiB       55 TiB         66.82
    nvme       22 TiB     11 TiB     11 TiB       11 TiB         49.03
    TOTAL     104 TiB     38 TiB     65 TiB       65 TiB         63.07

POOLS:
    POOL                         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    images                        1      27 TiB       7.31M      53 TiB     75.18       8.8 TiB
    volumes_nvme                  2     1.2 TiB     447.17k     2.3 TiB     56.95       893 GiB
    volumes                       3     525 GiB     137.05k     1.0 TiB      5.49       8.8 TiB
    vms                           5     4.2 TiB       2.67M     8.3 TiB     82.55       893 GiB
    .rgw.root                    11     4.5 KiB           5     940 KiB         0       8.8 TiB
    default.rgw.control          12         0 B           8         0 B         0       8.8 TiB
    default.rgw.meta             13     5.9 KiB          15     1.8 MiB         0       8.8 TiB
    default.rgw.log              14         0 B         217         0 B         0       8.8 TiB
    default.rgw.buckets.index    15     7.9 KiB           5     7.9 KiB         0       8.8 TiB
    default.rgw.buckets.data     16      32 GiB       4.98k      64 GiB      0.35       8.8 TiB
The vms and volumes_nvme pools are built on the nvme storage class.
While RAW STORAGE reports 11 TiB available for nvme, the POOLS section reports only 893 GiB MAX AVAIL.
I understand that the POOLS calculation is done via a complex function, but why is the difference so big?
Updated by Igor Fedotov over 3 years ago
MAX AVAIL in the POOLS section is primarily determined by the OSD with the least amount of free space.
"ceph osd df tree" output might provide some clue...
Updated by Marius Leustean over 3 years ago
Alright, ceph osd df tree class nvme outputs the same capacity as the RAW STORAGE above (11 TiB).
Does that mean I can still place 11 TiB of data in the volumes_nvme pool, even though it reports 893 GiB as MAX AVAIL?
Updated by Igor Fedotov over 3 years ago
Marius Leustean wrote:
> Alright, ceph osd df tree class nvme outputs the same capacity as the RAW STORAGE above (11 TiB).
> Does that mean I can still place 11 TiB of data in the volumes_nvme pool, even though it reports 893 GiB as MAX AVAIL?
No. Per-class AVAIL shows the overall amount of free space across all devices of that class. But it has little correlation with the amount of data a user can actually store, due to various overheads (replication, EC, allocation) and potentially uneven data distribution.
Per-pool MAX AVAIL, on the other hand, is determined by the free space on the fullest OSD backing that pool. I.e. it is a guaranteed(!) amount of data the user can store: Ceph guarantees it has enough space to keep that much user data even in the unlikely case that all of it lands on that OSD.
E.g. suppose a cluster has 3 OSDs with 0.5, 1 and 1.5 GB of free space respectively.
Per-class AVAIL is then 3 GB, while per-pool MAX AVAIL is just 0.5 GB.
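In code, the idea is roughly this (a toy illustration of the example above, not Ceph's actual implementation; the real MAX AVAIL calculation also factors in CRUSH weights, the full ratio and replication/EC overhead):

# Toy illustration: per-class AVAIL sums free space, per-pool MAX AVAIL is
# bounded by the fullest (least-free) OSD backing the pool.
osd_free_gb = [0.5, 1.0, 1.5]           # free space on the OSDs backing the pool

per_class_avail = sum(osd_free_gb)      # 3.0 GB, what the per-class AVAIL adds up to
per_pool_max_avail = min(osd_free_gb)   # 0.5 GB, limited by the fullest OSD

print(f"per-class AVAIL   : {per_class_avail:.1f} GB")
print(f"per-pool MAX AVAIL: {per_pool_max_avail:.1f} GB")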
From "df tree" you can check if free space for your OSDs is distributed unevenly. Which I presume is the case.
I can't say for sure about the root cause of this and how to deal with it. May be reweight OSDs or something.
But I recall that ceph-users mailing list had some relevant discussion on this topic. It's better consult there...