Bug #58505

open

Incorrect calculation of OSD free space and PG used bytes

Added by Andrey Groshev over 1 year ago. Updated over 1 year ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I added new OSD nodes to the cluster and am now adding several disks to each. After a short period of rebalancing, the following problems appeared:

HEALTH_WARN 1 backfillfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 114 pgs backfill_toofull; 12 pool(s) backfillfull
[WRN] OSD_BACKFILLFULL: 1 backfillfull osd(s)
    osd.410 is backfill full
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 114 pgs backfill_toofull
    pg 97.8 is active+remapped+backfill_toofull, acting [257,113,212,166,34,384]
    pg 97.e is active+remapped+backfill_toofull, acting [162,84,289,228,379,50]
    pg 97.17 is active+remapped+backfill_toofull, acting [359,287,211,63,140,56]

....skip...

[WRN] POOL_BACKFILLFULL: 12 pool(s) backfillfull
    pool 'pool1' is backfillfull
    pool 'pool2' is backfillfull
    pool 'pool3' is backfillfull
... skip ...
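
For reference, the thresholds that drive these warnings can be checked with the standard ceph CLI (the upstream defaults are full_ratio 0.95, backfillfull_ratio 0.90, nearfull_ratio 0.85; the values on this cluster were not captured here):

# ceph osd dump | grep ratio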

At the same time, this disk (osd.410) is only 37% used:

# ceph osd df |awk '{if(NR==1 || /^410/){print}}'
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP      META      AVAIL    %USE   VAR   PGS  STATUS
410    hdd  9.09569   1.00000  9.1 TiB  3.4 TiB  3.4 TiB       0 B   9.2 GiB  5.7 TiB  37.37  0.65   37      up

But the total of the BYTES column for the PGs mapped to it is almost 5 TB:

# ceph pg ls-by-osd osd.410|awk '{if($6 ~ /[[:digit:]]+/){s+=$6}}END{printf "%'"'"'d\n", s}'
4,990,453,504,665

On older disks, the difference is even greater.

# ceph osd df |awk '{if(NR==1 || /^  0/){print}}'
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP      META      AVAIL    %USE   VAR   PGS  STATUS
  0    hdd  9.09569   1.00000  9.1 TiB  5.0 TiB  5.0 TiB   2.5 MiB    13 GiB  4.1 TiB  54.71  0.95   34      up

# ceph pg ls-by-osd osd.0|awk '{if($6 ~ /[[:digit:]]+/){s+=$6}}END{printf "%'"'"'d\n", s}'
21,501,114,045,971

It seems to me that something is calculated incorrectly because we use erasure-coded pools (4+2).
If you divide the sum of the PG bytes by 4, the result is more or less correct.
A 10 TB disk cannot hold 21 TB and be only 50% used.
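
A rough sanity check of that claim, assuming all PGs on the OSD belong to the k=4, m=2 EC pools (so each OSD stores roughly 1/4 of a PG's logical bytes): 21,501,114,045,971 B / 4 ≈ 5.4 TB ≈ 4.9 TiB, which is close to the 5.0 TiB RAW USE that ceph osd df reports for osd.0. The same one-liner as above, extended with that division:

# ceph pg ls-by-osd osd.0 | awk '$6 ~ /^[0-9]+$/ {s+=$6} END {printf "%.2f TiB\n", s/4/2^40}'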
