Feature #41417 (Closed): rgw: store small object's data part into xattr to avoid disk space wasting
Description
In the following test, I stored 100,000 objects of 1K/2K/4K/8K/16K/32K/64K each into a newly created Ceph cluster
and recorded the space usage. As the results show, a lot of disk space is wasted: in the 1K case,
roughly 100 MB of object data consumes at least 6.1 GiB of disk space.
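For reference, a minimal sketch of the kind of workload generator that could produce this pattern, assuming S3 access through RGW via boto3 (the endpoint, credentials, bucket name, and key format are placeholders, not taken from the original test scripts):

import boto3

# Hypothetical endpoint and credentials; substitute those of the RGW under test.
s3 = boto3.client(
    's3',
    endpoint_url='http://um14:8000',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

OBJ_SIZE = 1024        # repeat the run with 2K/4K/8K/16K/32K/64K payloads
NUM_OBJS = 100000
payload = b'x' * OBJ_SIZE

s3.create_bucket(Bucket='small-obj-test')
for i in range(NUM_OBJS):
    # Each PUT ends up as (at least) one RADOS object in default.rgw.buckets.data.
    s3.put_object(Bucket='small-obj-test', Key=f'obj-{i:06d}', Body=payload)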
>>>>> 1K:
[root@um14 scripts]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 36 36 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 6.1 GiB 100317 0 100317 0 0 0 1740 1.3 MiB 907203 98 MiB 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 604577 591 MiB 302257 197 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 4770 4.5 MiB 3814 634 KiB 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 94 77 KiB 29 14 KiB 0 B 0 B
defaults.rgw.buckets.data 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
defaults.rgw.buckets.index 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
total_objects 100510
total_used 7.1 GiB
total_avail 551 GiB
total_space 558 GiB
[root@um14 scripts]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 0.54489 1.00000 558 GiB 7.1 GiB 6.1 GiB 21 MiB 1003 MiB 551 GiB 1.28 1.00 432 up
TOTAL 558 GiB 7.1 GiB 6.1 GiB 21 MiB 1003 MiB 551 GiB 1.28
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
...
>>>>> 4K:
[root@um14 scripts]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 384 KiB 6 0 6 0 0 0 30 30 KiB 6 6 KiB 0 B 0 B
default.rgw.buckets.data 6.1 GiB 100001 0 100001 0 0 0 2 1 KiB 900012 391 MiB 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 600049 587 MiB 300006 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 207 0 207 0 0 0 2388 2.1 MiB 1530 2 KiB 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 44 38 KiB 26 13 KiB 0 B 0 B
total_objects 100228
total_used 7.1 GiB
total_avail 551 GiB
total_space 558 GiB
>>>>> 8K:
[root@um14 scripts]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 6.1 GiB 100000 0 100000 0 0 0 0 0 B 900000 781 MiB 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 600030 586 MiB 300001 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 207 0 207 0 0 0 2369 2.1 MiB 1558 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 46 40 KiB 28 14 KiB 0 B 0 B
total_objects 100225
total_used 7.1 GiB
total_avail 551 GiB
total_space 558 GiB
>>>>> 16K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 6.1 GiB 99803 0 99803 0 0 0 0 0 B 898227 1.5 GiB 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 598853 585 MiB 299411 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 46 40 KiB 28 14 KiB 0 B 0 B
total_objects 99996
total_used 7.1 GiB
total_avail 551 GiB
total_space 558 GiB
>>>>> 32K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 6.1 GiB 99700 0 99700 0 0 0 0 0 B 897300 3.0 GiB 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 598229 584 MiB 299099 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 46 40 KiB 28 14 KiB 0 B 0 B
total_objects 99893
total_used 7.1 GiB
total_avail 551 GiB
total_space 558 GiB
>>>>> 64K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 6.1 GiB 99845 0 99845 0 0 0 0 0 B 898605 6.1 GiB 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 599120 585 MiB 299536 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 2241 2.0 MiB 1494 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 87 74 KiB 40 20 KiB 0 B 0 B
total_objects 100038
total_used 7.1 GiB
total_avail 551 GiB
total_space 558 GiB
The waste comes from BlueStore's allocation granularity: each object's data portion occupies at least one
allocation unit (bluestore_min_alloc_size, 64 KiB by default on HDDs at the time), so 100,000 objects consume
about 100,000 x 64 KiB ≈ 6.1 GiB no matter how small they are. To resolve this, we can store a small object's
data in the RADOS object's xattrs; the OSD's key-value database (RocksDB) will then pack these small key-value
pairs into large contiguous blocks, reducing the wasted disk space.
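A minimal sketch of the idea using the Python librados bindings (pool name, object id, xattr name, and the 64 KiB threshold are illustrative; the actual patch does this inside RGW's write path):

import rados

INLINE_LIMIT = 64 * 1024  # illustrative threshold, analogous to rgw_inline_limit_bytes

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('default.rgw.buckets.data')

def put_object(oid, data):
    if len(data) <= INLINE_LIMIT:
        # Keep the data portion empty and attach the payload as an xattr:
        # xattrs live in the OSD's RocksDB, which packs many small key-value
        # pairs into large contiguous blocks instead of allocating a
        # min_alloc_size extent per object.
        ioctx.write_full(oid, b'')
        ioctx.set_xattr(oid, 'user.rgw.inline_data', data)
    else:
        # Large payloads go to the data portion as usual.
        ioctx.write_full(oid, data)

put_object('obj-000001', b'x' * 1024)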
Updated by Honggang Yang over 4 years ago
The following are the test results after setting rgw_inline_limit_bytes to 64K. rados df clearly has a problem accounting for xattr usage (the data pool's USED column reports 0 B), but the total_used figures show that this patch greatly reduces the wasted disk space.
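Independently of the rados df accounting, one can confirm where the data landed by stat-ing an object and listing its xattrs; a small sketch with the Python librados bindings (pool and object names are placeholders):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('default.rgw.buckets.data')

oid = 'obj-000001'                  # placeholder: pick any object from `rados ls`
size, mtime = ioctx.stat(oid)
print('data portion size:', size)   # expected to be (near) zero for inlined objects
for name, value in ioctx.get_xattrs(oid):
    print('xattr', name, 'holds', len(value), 'bytes')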
1K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 100000 0 100000 0 0 0 0 0 B 1000000 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 600026 586 MiB 300001 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 42 36 KiB 24 12 KiB 0 B 0 B
total_objects 100193
total_used 1.0 GiB
total_avail 557 GiB
total_space 558 GiB
2K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 99945 0 99945 0 0 0 0 0 B 999450 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 599693 586 MiB 299834 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 42 36 KiB 24 12 KiB 0 B 0 B
total_objects 100138
total_used 1.0 GiB
total_avail 557 GiB
total_space 558 GiB
4K:
rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 100000 0 100000 0 0 0 0 0 B 1000000 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 600026 586 MiB 300001 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 42 36 KiB 24 12 KiB 0 B 0 B
total_objects 100193
total_used 1.0 GiB
total_avail 557 GiB
total_space 558 GiB
8K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 99813 0 99813 0 0 0 0 0 B 998130 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 598907 585 MiB 299441 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 42 36 KiB 24 12 KiB 0 B 0 B
total_objects 100006
total_used 1.4 GiB
total_avail 557 GiB
total_space 558 GiB
16K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 100000 0 100000 0 0 0 0 0 B 1000000 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 600030 586 MiB 300001 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 44 38 KiB 26 13 KiB 0 B 0 B
total_objects 100193
total_used 2.3 GiB
total_avail 556 GiB
total_space 558 GiB
32K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 99873 0 99873 0 0 0 5 66 KiB 998730 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 599274 586 MiB 299620 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 46 40 KiB 28 14 KiB 0 B 0 B
total_objects 100066
total_used 3.6 GiB
total_avail 554 GiB
total_space 558 GiB
64K:
+ rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 13 13 KiB 4 4 KiB 0 B 0 B
default.rgw.buckets.data 0 B 100000 0 100000 0 0 0 0 0 B 1000000 0 B 0 B 0 B
default.rgw.buckets.index 0 B 1 0 1 0 0 0 600032 586 MiB 300001 195 MiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1383 1.2 MiB 922 0 B 0 B 0 B
default.rgw.meta 256 KiB 5 0 5 0 0 0 46 40 KiB 28 14 KiB 0 B 0 B
total_objects 100193
total_used 6.8 GiB
total_avail 551 GiB
total_space 558 GiB
Updated by Honggang Yang over 4 years ago
The disk space statistics of a newly created cluster when no user data is written are as follows:
[root@um14 ceph]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.rgw.root 256 KiB 4 0 4 0 0 0 0 0 B 4 4 KiB 0 B 0 B
default.rgw.control 0 B 8 0 8 0 0 0 0 0 B 0 0 B 0 B 0 B
default.rgw.log 0 B 175 0 175 0 0 0 1575 1.4 MiB 1050 0 B 0 B 0 B
default.rgw.meta 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
defaults.rgw.buckets.data 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
defaults.rgw.buckets.index 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
total_objects 187
total_used 1.0 GiB
total_avail 557 GiB
total_space 558 GiB
The total_used is already 1 GiB with no user data written; this space may be used by BlueFS for db.slow.
The label information for my bluestore is as follows:
# ceph-bluestore-tool --command show-label --path /var/lib/ceph/osd/ceph-0/
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0/block": {
        "osd_uuid": "16d463da-7a05-4db2-9e14-327b0e6ff81c",
        "size": 599147937792,
        "btime": "2019-08-24T17:13:20.873519+0800",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "6a9f4f54-32fd-4321-ba99-9ce84fa5a3af",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQCt/2Bdc7q0LBAAvQwDC/CNw5+H9pFZqPzfYA==",
        "ready": "ready",
        "require_osd_release": "15",
        "whoami": "0"
    }
}
Updated by Patrick Donnelly over 4 years ago
- Project changed from Ceph to rgw
- Subject changed from store small object's data part into xattr to avoid disk space wasting to rgw: store small object's data part into xattr to avoid disk space wasting
- Status changed from New to Fix Under Review
- Start date deleted (08/24/2019)
- Pull request ID set to 29863
Updated by Honggang Yang over 4 years ago
There is also an attempt to fix this problem at the bluestore layer:
https://github.com/ceph/ceph/pull/30056
Updated by Igor Fedotov almost 4 years ago
- Related to Bug #41577: Erasure-Coded storage in bluestore has larger disk usage than expected added
Updated by Igor Fedotov almost 4 years ago
- Related to Bug #44213: Erasure coded pool might need much more disk space than expected added
Updated by Igor Fedotov almost 4 years ago
- Status changed from Fix Under Review to Rejected
Closing in favor of the fix for https://tracker.ceph.com/issues/44213