Bug #58505
Bug #58505: Wrong free-space calculation: OSD and PG used bytes disagree
Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I added a new node with OSDs to the cluster and am now adding several disks to each node. After a short rebalancing period, the following problems appeared:
HEALTH_WARN 1 backfillfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 114 pgs backfill_toofull; 12 pool(s) backfillfull
[WRN] OSD_BACKFILLFULL: 1 backfillfull osd(s)
    osd.410 is backfill full
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 114 pgs backfill_toofull
    pg 97.8 is active+remapped+backfill_toofull, acting [257,113,212,166,34,384]
    pg 97.e is active+remapped+backfill_toofull, acting [162,84,289,228,379,50]
    pg 97.17 is active+remapped+backfill_toofull, acting [359,287,211,63,140,56]
    ....skip...
[WRN] POOL_BACKFILLFULL: 12 pool(s) backfillfull
    pool 'pool1' is backfillfull
    pool 'pool2' is backfillfull
    pool 'pool3' is backfillfull
    ... skip ...
At the same time, this OSD is only 37% full:
# ceph osd df | awk '{if(NR==1 || /^410/){print}}'
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL    %USE   VAR   PGS  STATUS
410  hdd    9.09569  1.00000   9.1 TiB  3.4 TiB  3.4 TiB  0 B   9.2 GiB  5.7 TiB  37.37  0.65  37   up
But the total bytes reported for the PGs on this OSD is almost 5 TB:
# ceph pg ls-by-osd osd.410 | awk '{if($6 ~ /[[:digit:]]+/){s+=$6}}END{printf "%'"'"'d\n", s}'
4,990,453,504,665
On older disks, the difference is even greater.
# ceph osd df | awk '{if(NR==1 || /^ 0/){print}}'
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
0   hdd    9.09569  1.00000   9.1 TiB  5.0 TiB  5.0 TiB  2.5 MiB  13 GiB  4.1 TiB  54.71  0.95  34   up

# ceph pg ls-by-osd osd.0 | awk '{if($6 ~ /[[:digit:]]+/){s+=$6}}END{printf "%'"'"'d\n", s}'
21,501,114,045,971
It seems to me that something is calculated incorrectly because we use pools with erasure coding (4+2).
If you divide the sum of bytes across the PGs by 4 (the number of data chunks, k), you get a more or less correct value.
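A quick sanity check of that division, assuming the BYTES column of `ceph pg ls-by-osd` reports the logical (pre-erasure-coding) size of each PG, so a single OSD only holds a 1/k shard of it (k = 4 for a 4+2 profile):

```python
# Assumption: summing the logical BYTES column overcounts per-OSD usage
# by roughly k, because each OSD stores only one of the k data chunks.
k = 4                                  # data chunks in the 4+2 EC profile
pg_bytes_osd0 = 21_501_114_045_971     # sum of BYTES for PGs on osd.0 (from above)

per_osd_estimate = pg_bytes_osd0 / k   # bytes expected to actually sit on osd.0
tib = 2**40
print(f"{per_osd_estimate / tib:.1f} TiB")  # ~4.9 TiB, close to the reported 5.0 TiB RAW USE
```

The same division applied to osd.410 gives a lower figure, consistent with that OSD still being mid-backfill.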
A 10 TB disk cannot hold 21 TB and be only 50% used.