Support #36614

Cluster uses substantially more space after rebalance (erasure codes)

Added by Vitaliy Filippov over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

Hi

After I recreated one OSD and increased the PG count of my erasure-coded (2+1) pool (which was way too low, only 100 PGs for 9 OSDs), the cluster started to eat additional disk space.
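For reference, a minimal sketch of the kind of change involved (the pool name and target PG count below are placeholders, not values from this cluster):

```
# Hypothetical example: raise the PG count of an EC pool.
ceph osd pool set ecpool pg_num 256
ceph osd pool set ecpool pgp_num 256
```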

At first I thought this was caused by the moved PGs consuming extra space during unfinished backfills. I pinned most of the new PGs back to their old OSDs via `pg-upmap`, and that did free some space in the cluster.
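For illustration, pinning a PG back to particular OSDs can be done with `pg-upmap-items`; the PG id and OSD ids here are hypothetical:

```
# Hypothetical example: remap PG 2.1a so the copy that would land on
# osd.7 stays on osd.3 instead (from/to OSD pairs follow the PG id).
ceph osd pg-upmap-items 2.1a 7 3
```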

Then I reduced `osd_max_backfills` to 1 and started removing the upmap pins in small batches, which let Ceph finish the backfills for those PGs.
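A sketch of those two steps (the PG id is again a placeholder):

```
# Throttle recovery to one backfill at a time per OSD.
ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
# Hypothetical example: drop the pin for one PG so it can backfill.
ceph osd rm-pg-upmap-items 2.1a
```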

HOWEVER, used capacity still grows! It drops after each PG finishes moving, but overall it keeps growing.

It grew by 1.3 TB yesterday. Over the same period, clients wrote only ~200 new objects (~800 MB; the pool holds only RBD images).

What is consuming such a large amount of additional space?

// A further question: why do `ceph df` / `rados df` report only 16 TB of actual data written, while 29.8 TB (now 31 TB) of raw disk space is used? Shouldn't it be 16 / 2 × 3 = 24 TB?
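For a k=2, m=1 erasure-coded pool the expected raw usage is stored × (k + m) / k; a quick check of the figures above (a sketch, not cluster output):

```
# Expected raw usage for k=2, m=1 erasure coding:
# raw = stored * (k + m) / k = 16 TB * 3 / 2
echo '16 * (2 + 1) / 2' | bc -l   # = 24 TB expected, vs the ~30-31 TB observed
```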


Files

photo_2018-10-29_14-10-38.jpg (20.8 KB) — Vitaliy Filippov, 10/29/2018 11:11 AM
photo_2018-10-29_14-10-43.jpg (19 KB) — Vitaliy Filippov, 10/29/2018 11:11 AM