Bug #24512: Raw used space leak

Added by Thomas De Maet almost 6 years ago. Updated about 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello

I'm testing a setup of cephfs over an EC pool with 21 data + 3 coding chunks ([EC_]stripe_unit of 16k).
All OSDs are bluestore, each on one HDD along with its WAL & DB.

I'm experiencing unexpected raw space usage (more than 200%, instead of the theoretical 24/21 = 114%).

I've run a lot of copy tests trying to identify the issue.

First, the initial situation (see file "global_stats.txt"): I have 15T used globally, but the two pools hold only 7.8T and 1G respectively.

From what I dug up in the docs + mailing list, space can be lost to:
- WAL+DB: expecting max 1.5GB/osd on 80 osds: 120GB (observed after deletion of all cephfs data: 91GB)
- unfilled stripes of min. "[pool_]obj_stripe": max possible loss 240k files * 1344kB = 315GB

=> so we can expect up to ~435GB of extra usage, but 7207GB are observed!
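
To make the bound explicit, here is a quick Python check of the arithmetic above (same figures as in the list; I round the results to ~315GB and ~435GB in the text):

```python
# Quick check of the "expected extra usage" bound from the list above.
osds = 80
wal_db_per_osd_gb = 1.5                 # expected max WAL+DB footprint per OSD
files = 240_000
obj_stripe_kb = 1344                    # "[pool_]obj_stripe" size per file

wal_db_gb = osds * wal_db_per_osd_gb             # 120 GB
stripe_waste_gb = files * obj_stripe_kb / 1e6    # ~322 GB worst case (one wasted stripe per file)
expected_extra_gb = wal_db_gb + stripe_waste_gb  # ~440 GB

print(f"expected extra <= ~{expected_extra_gb:.0f} GB, observed: 7207 GB")
```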

What I'm expecting from EC:
- unfilled EC stripes of min. 21*[EC_]stripe_unit = 21*16kB = 336kB
=> no loss possible if [pool_]obj_stripe is 1344kB = 4*336kB, as each full pool stripe is stored as 4 full EC stripes
=> remark: with the default 4MB object size, we would get 4096/336 = 12.19 => 13/12.19 * 24/21 = 121.9% (instead of 114.3%, i.e. an increase of ~7%)
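
A quick Python check of these padding figures (nothing here beyond the numbers above):

```python
import math

# EC layout from the description: 21 data + 3 coding chunks, 16 kB stripe_unit.
k, m = 21, 3
ec_stripe_unit_kb = 16
stripe_width_kb = k * ec_stripe_unit_kb        # 336 kB per full EC stripe
parity_overhead = (k + m) / k                  # 24/21 ~= 114.3%

# Custom object size of 1344 kB = exactly 4 full stripes -> no padding loss.
assert 1344 % stripe_width_kb == 0

# Default 4 MB object size: the last EC stripe of each object is only partly filled.
obj_kb = 4096
stripes_used = obj_kb / stripe_width_kb        # 12.19
stripes_allocated = math.ceil(stripes_used)    # 13
padding_factor = stripes_allocated / stripes_used

print(f"expected raw usage with 4MB objects: {padding_factor * parity_overhead:.1%}")  # ~121.9%
```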
What I'm expecting from bluestore:
- some additional DB overhead (keys, index, checksums)... but mostly negligible for large data?
- some block alignment optimizations causing fragmentation?

Second, I ran a bunch of tests with cp, rsync and dd (see copy_tests.txt).

It can be seen that when copying with write sizes of 1M (what cp is doing), the raw usage is more than 200% of the original files. It drops to ~130% when the write size equals the object size, and climbs to ~450% when it is small (128k).
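
As a rough illustration of the alignment/fragmentation guess above (purely my speculation, not something I measured): if bluestore rounded each flushed write up to its HDD allocation unit on every shard (assuming the 64 kB default for min_alloc_size on HDD), raw usage would grow as the write size shrinks. This toy model does not reproduce the exact percentages from copy_tests.txt, but the trend goes the same way:

```python
import math

# Toy model (my assumption, not a measurement): every flushed client write is
# split over the 21 data shards, each shard's piece is rounded up to the
# allocation unit, and the 3 coding shards allocate the same amount.
K, M = 21, 3                  # EC data + coding shards
MIN_ALLOC = 64 * 1024         # assumed bluestore HDD allocation unit (64 kB)

def raw_usage_factor(write_size: int) -> float:
    """Raw bytes consumed per logical byte for sequential writes of write_size."""
    per_data_shard = write_size / K
    allocated = math.ceil(per_data_shard / MIN_ALLOC) * MIN_ALLOC
    return (K + M) * allocated / write_size

for w_kb in (128, 1024, 1344):
    print(f"write size {w_kb:>4} kB -> ~{raw_usage_factor(w_kb * 1024):.0%} raw usage")
```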

I don't know what the origin is (fragmentation?), but EC loses a large part of its purpose here: saving space.

1) Is this the expected behavior? (Apart from the fact that you don't recommend such a design for EC.) What did I miss?

2) If it is due to bluestore fragmentation, would it be possible to design a "defragmenter" in the future to get back the unused space?

Thanks


Files

global_stats.txt (1015 Bytes) - Thomas De Maet, 06/13/2018 01:39 PM
copy_tests.txt (2.13 KB) - Thomas De Maet, 06/13/2018 01:39 PM
osd_0_asok (26.7 KB) - Thomas De Maet, 06/19/2018 12:25 PM
osd_30_asok (26.7 KB) - Thomas De Maet, 06/19/2018 12:25 PM
osd_53_asok (26.4 KB) - Thomas De Maet, 06/19/2018 12:25 PM
osd_77_asok (26.6 KB) - Thomas De Maet, 06/19/2018 12:25 PM
osd_df (8.28 KB) - Thomas De Maet, 06/19/2018 12:25 PM
fs_pool_osd.txt (46.3 KB) - Thomas De Maet, 06/20/2018 07:51 AM