Bug #41577

Erasure-Coded storage in bluestore has larger disk usage than expected

Added by Yan Zhao about 2 years ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The test was done on Ceph 14.2.1.

We tested erasure-coded (EC) storage by writing the same amount of data, 800 GiB, to each pool.

The used space reported by "ceph df" is much larger than expected and varies widely with the block size.

Smaller block sizes waste more space. Does "ceph df" show the actual usage? Shouldn't EC 16+4 use less space than EC 2+1?

Total data written to EC pool: 800 GiB

Profile    128K block size    1M block size    Expected
EC 2+1     1.2 TiB            1.2 TiB          1.2 TiB
EC 4+2     1.2 TiB            1.2 TiB          1.2 TiB
EC 6+2     3.2 TiB            1.3 TiB          1.1 TiB
EC 16+4    4.1 TiB            1.5 TiB          1.0 TiB

Note: with a 4K block size, EC 6+2 shows 15 TiB of used space.
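
The "expected" column follows from the usual k+m overhead, where raw usage is the stored data multiplied by (k+m)/k. The sketch below recomputes it and compares it with the "ceph df" figures from the table. The helper name and the dictionary layout are illustrative only and are not part of any Ceph tooling.

```python
# Expected raw usage for a k+m erasure-coded pool: stored * (k + m) / k.
STORED_GIB = 800  # total data written to each EC pool, from the report

# (k, m) -> used space in TiB reported by "ceph df" for 128K and 1M block sizes
OBSERVED_TIB = {
    (2, 1): (1.2, 1.2),
    (4, 2): (1.2, 1.2),
    (6, 2): (3.2, 1.3),
    (16, 4): (4.1, 1.5),
}


def expected_raw_gib(stored_gib, k, m):
    """Nominal raw usage in GiB, ignoring any allocation or metadata overhead."""
    return stored_gib * (k + m) / k


for (k, m), (used_128k, used_1m) in OBSERVED_TIB.items():
    expected_tib = expected_raw_gib(STORED_GIB, k, m) / 1024
    print(
        f"EC {k}+{m}: expected {expected_tib:.2f} TiB, "
        f"128K is {used_128k / expected_tib:.1f}x, "
        f"1M is {used_1m / expected_tib:.1f}x"
    )
```

By this formula EC 6+2 comes to about 1.04 TiB and EC 16+4 to about 0.98 TiB, so the expected column in the table appears to be rounded.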

Here are some details for the 128K block size on EC 6+2.

# ceph df
RAW STORAGE:
    CLASS  SIZE    AVAIL   USED     RAW USED  %RAW USED
    hdd    17 TiB  14 TiB  3.2 TiB  3.2 TiB   18.41
    TOTAL  17 TiB  14 TiB  3.2 TiB  3.2 TiB   18.41

POOLS:
    POOL                         ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    replicapool                  1   0 B      0        0 B      0      4.3 TiB
    replicated-metadata-pool-62  2   1.7 KiB  202      19 MiB   0      4.3 TiB
    ec-data-pool-62              3   803 GiB  205.40k  3.2 TiB  19.69  9.7 TiB

# rados df
    POOL_NAME                    USED     OBJECTS  CLONES  COPIES   MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD       WR_OPS   WR       USED COMPR  UNDER COMPR
    ec-data-pool-62              3.2 TiB  205401   0       1643208  0                   0        0         22858   97 MiB   6830188  810 GiB  0 B         0 B
    replicapool                  0 B      0        0       0        0                   0        0         0       0 B      0        0 B      0 B         0 B
    replicated-metadata-pool-62  19 MiB   202      0       606      0                   0        0         1795    1.3 MiB  311      306 KiB  0 B         0 B

    total_objects  205603
    total_used     3.2 TiB
    total_avail    14 TiB
    total_space    17 TiB
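
Two quick checks on these figures, as a sketch: for the EC 6+2 pool the COPIES count equals OBJECTS multiplied by k+m, and USED divided by STORED gives the actual space amplification to compare with the nominal (k+m)/k. All numbers are copied from the "ceph df" and "rados df" output in this report; the variable names are illustrative.

```python
K, M = 6, 2  # EC profile of ec-data-pool-62

objects = 205401   # rados df, OBJECTS
copies = 1643208   # rados df, COPIES
stored_gib = 803   # ceph df, STORED
used_tib = 3.2     # ceph df, USED (already rounded by ceph)

shards_per_object = copies / objects
avg_object_mib = stored_gib * 1024 / objects
nominal_amplification = (K + M) / K
actual_amplification = used_tib * 1024 / stored_gib

print(f"shards per object:     {shards_per_object:.0f}")
print(f"average object size:   {avg_object_mib:.2f} MiB")
print(f"nominal amplification: {nominal_amplification:.2f}x")
print(f"actual amplification:  {actual_amplification:.2f}x")
```

The objects average about 4 MiB each, and the pool occupies roughly 4.1 times the stored data instead of the nominal 1.33 times.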

# ceph osd df tree
    ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
    -1 17.43652 - 17 TiB 3.2 TiB 3.2 TiB 2.4 MiB 45 GiB 14 TiB 18.41 1.00 - root default
    -5 2.17957 - 2.2 TiB 410 GiB 405 GiB 448 KiB 5.3 GiB 1.8 TiB 18.39 1.00 - host clcmapp-e-host01
    1 hdd 0.54489 1.00000 558 GiB 94 GiB 93 GiB 96 KiB 1.2 GiB 464 GiB 16.90 0.92 137 up osd.1
    9 hdd 0.54489 1.00000 558 GiB 102 GiB 101 GiB 100 KiB 1.3 GiB 456 GiB 18.29 0.99 149 up osd.9
    17 hdd 0.54489 1.00000 558 GiB 103 GiB 101 GiB 120 KiB 1.3 GiB 455 GiB 18.38 1.00 144 up osd.17
    25 hdd 0.54489 1.00000 558 GiB 112 GiB 110 GiB 132 KiB 1.4 GiB 446 GiB 19.99 1.09 156 up osd.25
    -7 2.17957 - 2.2 TiB 411 GiB 405 GiB 324 KiB 5.7 GiB 1.8 TiB 18.41 1.00 - host clcmapp-e-host02
    0 hdd 0.54489 1.00000 558 GiB 102 GiB 100 GiB 24 KiB 1.8 GiB 456 GiB 18.23 0.99 147 up osd.0
    8 hdd 0.54489 1.00000 558 GiB 103 GiB 101 GiB 120 KiB 1.3 GiB 455 GiB 18.43 1.00 159 up osd.8
    16 hdd 0.54489 1.00000 558 GiB 92 GiB 91 GiB 76 KiB 1.2 GiB 466 GiB 16.49 0.90 143 up osd.16
    24 hdd 0.54489 1.00000 558 GiB 114 GiB 113 GiB 104 KiB 1.4 GiB 444 GiB 20.49 1.11 161 up osd.24
    -9 2.17957 - 2.2 TiB 411 GiB 405 GiB 228 KiB 5.5 GiB 1.8 TiB 18.40 1.00 - host clcmapp-e-host03
    2 hdd 0.54489 1.00000 558 GiB 114 GiB 113 GiB 84 KiB 1.3 GiB 444 GiB 20.45 1.11 162 up osd.2
    10 hdd 0.54489 1.00000 558 GiB 107 GiB 106 GiB 48 KiB 1.5 GiB 451 GiB 19.19 1.04 153 up osd.10
    18 hdd 0.54489 1.00000 558 GiB 98 GiB 97 GiB 48 KiB 1.3 GiB 460 GiB 17.54 0.95 135 up osd.18
    26 hdd 0.54489 1.00000 558 GiB 92 GiB 90 GiB 48 KiB 1.4 GiB 466 GiB 16.42 0.89 128 up osd.26
    -13 2.17957 - 2.2 TiB 411 GiB 405 GiB 328 KiB 5.4 GiB 1.8 TiB 18.39 1.00 - host clcmapp-w-host01
    3 hdd 0.54489 1.00000 558 GiB 97 GiB 96 GiB 120 KiB 1.3 GiB 461 GiB 17.38 0.94 144 up osd.3
    11 hdd 0.54489 1.00000 558 GiB 105 GiB 104 GiB 48 KiB 1.5 GiB 453 GiB 18.90 1.03 149 up osd.11
    19 hdd 0.54489 1.00000 558 GiB 117 GiB 116 GiB 40 KiB 1.4 GiB 441 GiB 21.04 1.14 156 up osd.19
    27 hdd 0.54489 1.00000 558 GiB 91 GiB 90 GiB 120 KiB 1.2 GiB 467 GiB 16.26 0.88 123 up osd.27
    -17 2.17957 - 2.2 TiB 411 GiB 405 GiB 240 KiB 5.9 GiB 1.8 TiB 18.42 1.00 - host clcmapp-w-host02
    7 hdd 0.54489 1.00000 558 GiB 103 GiB 102 GiB 48 KiB 1.5 GiB 455 GiB 18.51 1.01 152 up osd.7
    15 hdd 0.54489 1.00000 558 GiB 105 GiB 104 GiB 48 KiB 1.3 GiB 453 GiB 18.91 1.03 152 up osd.15
    23 hdd 0.54489 1.00000 558 GiB 97 GiB 96 GiB 120 KiB 1.2 GiB 461 GiB 17.34 0.94 133 up osd.23
    31 hdd 0.54489 1.00000 558 GiB 106 GiB 104 GiB 24 KiB 1.9 GiB 452 GiB 18.92 1.03 154 up osd.31
    -15 2.17957 - 2.2 TiB 410 GiB 405 GiB 352 KiB 5.3 GiB 1.8 TiB 18.39 1.00 - host clcmapp-w-host03
    4 hdd 0.54489 1.00000 558 GiB 102 GiB 101 GiB 96 KiB 1.4 GiB 456 GiB 18.35 1.00 152 up osd.4
    12 hdd 0.54489 1.00000 558 GiB 95 GiB 93 GiB 88 KiB 1.2 GiB 463 GiB 16.94 0.92 145 up osd.12
    20 hdd 0.54489 1.00000 558 GiB 100 GiB 99 GiB 72 KiB 1.3 GiB 458 GiB 17.96 0.98 136 up osd.20
    28 hdd 0.54489 1.00000 558 GiB 113 GiB 112 GiB 96 KiB 1.4 GiB 445 GiB 20.31 1.10 155 up osd.28
    -3 2.17957 - 2.2 TiB 411 GiB 405 GiB 264 KiB 5.9 GiB 1.8 TiB 18.42 1.00 - host clcmapp-w-host04
    5 hdd 0.54489 1.00000 558 GiB 99 GiB 98 GiB 48 KiB 1.5 GiB 459 GiB 17.82 0.97 141 up osd.5
    13 hdd 0.54489 1.00000 558 GiB 92 GiB 91 GiB 96 KiB 1.2 GiB 466 GiB 16.56 0.90 137 up osd.13
    21 hdd 0.54489 1.00000 558 GiB 116 GiB 114 GiB 96 KiB 1.3 GiB 442 GiB 20.73 1.13 163 up osd.21
    29 hdd 0.54489 1.00000 558 GiB 104 GiB 102 GiB 24 KiB 1.9 GiB 454 GiB 18.57 1.01 144 up osd.29
    -11 2.17957 - 2.2 TiB 411 GiB 405 GiB 232 KiB 6.2 GiB 1.8 TiB 18.43 1.00 - host clcmapp-w-host05
    6 hdd 0.54489 1.00000 558 GiB 107 GiB 105 GiB 24 KiB 2.0 GiB 451 GiB 19.17 1.04 149 up osd.6
    14 hdd 0.54489 1.00000 558 GiB 104 GiB 103 GiB 60 KiB 1.5 GiB 454 GiB 18.71 1.02 148 up osd.14
    22 hdd 0.54489 1.00000 558 GiB 109 GiB 108 GiB 52 KiB 1.4 GiB 449 GiB 19.56 1.06 162 up osd.22
    30 hdd 0.54489 1.00000 558 GiB 91 GiB 90 GiB 96 KiB 1.2 GiB 467 GiB 16.27 0.88 127 up osd.30
    TOTAL 17 TiB 3.2 TiB 3.2 TiB 2.4 MiB 45 GiB 14 TiB 18.41
    MIN/MAX VAR: 0.88/1.14 STDDEV: 1.35

Log in to one of the OSDs (osd.6) and check its used space:

# mkdir /mnt/foo
# ceph-objectstore-tool --op fuse --data-path /var/lib/rook/ceph-6 --mountpoint /mnt/foo --no-mon-config
# ls
    1.45_head 2.40_head 3.118s2_head 3.14ds4_head 3.16ds4_head 3.1abs3_head 3.1e2s0_head 3.26s0_head 3.50s7_head 3.71s1_head 3.a0s1_head 3.c3s1_head 3.f2s0_head
    1.47_head 2.44_head 3.121s3_head 3.151s2_head 3.176s0_head 3.1acs6_head 3.1e6s1_head 3.28s0_head 3.54s3_head 3.73s7_head 3.a7s4_head 3.ccs4_head 3.f3s6_head
    1.4b_head 2.4b_head 3.122s7_head 3.153s5_head 3.177s3_head 3.1as2_head 3.1e7s3_head 3.2cs1_head 3.55s0_head 3.74s3_head 3.acs7_head 3.cfs6_head 3.f4s7_head
    1.4e_head 2.56_head 3.124s2_head 3.156s6_head 3.17fs6_head 3.1b1s5_head 3.1e8s7_head 3.2fs3_head 3.58s5_head 3.82s6_head 3.as7_head 3.cs2_head 3.fas3_head
    1.54_head 2.7_head 3.128s4_head 3.15ds1_head 3.182s5_head 3.1c0s7_head 3.1ebs2_head 3.35s4_head 3.5bs7_head 3.86s2_head 3.b0s1_head 3.d8s1_head 3.fcs2_head
    1.60_head 3.0s2_head 3.129s2_head 3.15fs0_head 3.184s2_head 3.1c3s2_head 3.1eds1_head 3.36s0_head 3.5ds3_head 3.88s4_head 3.b2s2_head 3.d9s6_head meta
    1.63_head 3.103s4_head 3.12bs7_head 3.160s4_head 3.185s6_head 3.1cds6_head 3.1f5s0_head 3.3bs6_head 3.5es2_head 3.89s5_head 3.b3s1_head 3.dbs1_head type
    1.6_head 3.106s4_head 3.134s0_head 3.161s2_head 3.186s1_head 3.1d2s0_head 3.1f6s5_head 3.40s7_head 3.5fs4_head 3.8ds2_head 3.b8s3_head 3.des6_head
    2.1b_head 3.10bs7_head 3.13es4_head 3.165s1_head 3.18ds4_head 3.1d6s3_head 3.1fbs5_head 3.43s4_head 3.60s4_head 3.8es1_head 3.b9s4_head 3.e1s5_head
    2.29_head 3.10s1_head 3.13fs0_head 3.166s6_head 3.193s7_head 3.1d7s2_head 3.1s3_head 3.44s4_head 3.63s2_head 3.90s2_head 3.bbs4_head 3.e2s7_head
    2.36_head 3.114s6_head 3.143s0_head 3.169s4_head 3.19ds5_head 3.1d9s5_head 3.20s1_head 3.46s4_head 3.64s0_head 3.92s3_head 3.bcs1_head 3.e5s4_head
    2.3b_head 3.117s4_head 3.148s0_head 3.16as3_head 3.1a3s6_head 3.1e0s1_head 3.21s1_head 3.4bs7_head 3.70s7_head 3.9cs2_head 3.c2s2_head 3.f0s4_head
// The OSD storage: 37.8 GB
# du -sb /mnt/foo
    37764300481 /mnt/foo
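
As a rough cross-check, the byte count from du can be compared with the DATA column that "ceph osd df tree" reports for osd.6 (105 GiB). This sketch assumes that du on the FUSE mount counts logical object sizes rather than allocated blocks, which the report does not confirm. The ideal share assumes the 803 GiB stored in the EC 6+2 pool is spread evenly over the 32 OSDs listed in the tree.

```python
GIB = 1024 ** 3

du_bytes = 37764300481   # "du -sb /mnt/foo" on osd.6
osd_data_gib = 105       # DATA column for osd.6 in "ceph osd df tree"

logical_gib = du_bytes / GIB
ratio = osd_data_gib / logical_gib

# Nominal per-OSD share: stored * (k + m) / k, divided over 32 OSDs.
ideal_share_gib = 803 * (6 + 2) / 6 / 32

print(f"logical data on osd.6: {logical_gib:.2f} GiB")
print(f"ideal share per OSD:   {ideal_share_gib:.2f} GiB")
print(f"reported DATA is {ratio:.2f}x the logical size")
```

Under these assumptions the logical size is close to the ideal per-OSD share, while the reported DATA is about three times larger.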

// One data chunk of a 4M object is 4M/6 ~= 0.7M
[root@rook-ceph-osd-6-6c5b5fbc8f-s7p9q all]# ls -l 2#3:3f3aa47b:::rbd_data.2.91f2b13076d6d.000000000000076e:head#
total 86
drwx------. 0 root root      0 Jan  1  1970 attr
-rwx------. 0 root root      9 Jan  1  1970 bitwise_hash
-rwx------. 1 root root 700416 Jan  1  1970 data
drwx------. 0 root root      0 Jan  1  1970 omap
-rwx------. 0 root root      0 Jan  1  1970 omap_header
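
The 700416-byte data file is consistent with a 4 MiB object split across k = 6 data chunks and padded to whole stripes. The sketch below assumes a 4096-byte stripe unit (an assumption; the pool's actual setting is not shown in this report) and uses a hypothetical helper name.

```python
def shard_size(object_bytes, k, stripe_unit=4096):
    """Bytes stored per shard when an object is padded to whole stripes.

    stripe_unit is an assumed value, not read from the cluster.
    """
    stripe_width = k * stripe_unit
    stripes = -(-object_bytes // stripe_width)  # ceiling division
    return stripes * stripe_unit


# 4 MiB object on EC 6+2: 171 stripes of 4096 bytes per shard
print(shard_size(4 * 1024 * 1024, k=6))  # 700416
```

The result matches the size of the data file in the listing, so the logical shard size itself does not account for the extra space.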


Related issues

Related to bluestore - Bug #44213: Erasure coded pool might need much more disk space than expected Resolved
Related to rgw - Feature #41417: rgw: store small object's data part into xattr to avoid disk space wasting Rejected

History

#1 Updated by Yan Zhao about 2 years ago

The issue of small objects using more space seems related to https://tracker.ceph.com/issues/41417

#2 Updated by Jason Dillaman about 1 year ago

  • Project changed from rbd to RADOS

#3 Updated by Igor Fedotov about 1 year ago

  • Related to Bug #44213: Erasure coded pool might need much more disk space than expected added

#4 Updated by Igor Fedotov about 1 year ago

  • Related to Feature #41417: rgw: store small object's data part into xattr to avoid disk space wasting added
