Support #22243


Luminous: EC pool using more space than it should

Added by Daniel Neilson over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Tags:
erasure-code

Description

Hello,

I have an erasure-coded pool that is using more space on the OSDs than it should. The EC profile is set to k=6 m=2, but the space usage looks more like a k=2 m=1 ratio. Here is the 'ceph -s' output:

  cluster:
    id:     b6e8901b-d94d-4b4f-b5e1-7eb05848227f
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum dev-pve01
    mgr: dev-pve01(active)
    mds: pool01-1/1/1 up {0=dev-pve01=up:active}
    osd: 10 osds: 10 up, 10 in

  data:
    pools:   2 pools, 160 pgs
    objects: 1402k objects, 2650 GB
    usage:   4034 GB used, 18603 GB / 22638 GB avail
    pgs:     159 active+clean
             1   active+clean+scrubbing+deep

As this output shows, the amount used is 52% higher than the logical data stored, whereas with k=6 m=2 I'd expect overhead closer to 33%. I'm using this for CephFS, so one pool is for data and the other for metadata. Here is the EC profile that I'm using:
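As a back-of-envelope check, the 52% and 33% figures above follow directly from the 'ceph -s' numbers; this is pure arithmetic, no Ceph involved:

```python
# Values copied from the 'ceph -s' output above.
stored_gb = 2650   # logical data stored in the pools
used_gb = 4034     # raw space consumed on the OSDs

k, m = 6, 2
expected_overhead = (k + m) / k - 1          # 2/6, i.e. ~33% for k=6 m=2
observed_overhead = used_gb / stored_gb - 1  # ~52% as reported

print(f"expected: {expected_overhead:.0%}, observed: {observed_overhead:.0%}")
```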

ceph osd erasure-code-profile get k6m2
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Output of pool settings:

ceph osd pool get pool01 erasure_code_profile
erasure_code_profile: k6m2

Is there anything I can dig into or look at that would explain the ~50% overhead and get me closer to the expected ~33%?

Thank you.

Actions #1

Updated by Greg Farnum over 6 years ago

  • Tracker changed from Bug to Support
  • Assignee deleted (Jos Collin)

There are a few things here to look at.
OSDs do use up space for non-object metadata and tracking purposes. It’s not a lot, but it’s not nothing.

Files need to be a certain size before the erasure coding is efficient. Each object has a mostly-constant amount of metadata overhead. I think on bluestore it’s about 1KB? Your average object size is right around 2MB, but that’s split up into 6 pieces, so 333KB each. 1KB on top of that isn’t much, but if you’ve got a lot of smaller objects rather than fewer bigger ones, it can add up.
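The estimate above can be sketched as a tiny model; the 1KB-per-chunk metadata cost and the `ec_overhead` helper are illustrative assumptions, not measured BlueStore constants:

```python
# Rough per-object EC overhead model. Each object is split into k data
# chunks, stored as k+m chunks, and each stored chunk is assumed to carry
# a fixed ~1KB of metadata (Greg's estimate, not a measured figure).
k, m = 6, 2
meta_per_chunk_kb = 1.0

def ec_overhead(object_kb):
    """Raw-to-logical space ratio for a single object of the given size."""
    chunk_kb = object_kb / k                          # data chunk size
    raw_kb = (k + m) * (chunk_kb + meta_per_chunk_kb) # all stored chunks
    return raw_kb / object_kb

print(f"2MB object:  {ec_overhead(2048):.3f}x")  # close to the ideal 8/6
print(f"64KB object: {ec_overhead(64):.3f}x")    # metadata weighs more
```

Under this model the fixed metadata barely matters at 2MB but grows as objects shrink, which is the "it can add up" effect for small files.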

The CephFS metadata is stored replicated, not erasure coded. And I’m not sure if bluestore counts its size when reporting object sizes. No idea how much space that’s taking up. You could query the mds for its log size to see that number, but it’s also got all the directory objects and I don’t think an accounting for them is available yet (it’s being worked on).

Actions #2

Updated by John Spray over 6 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (129)
Actions #3

Updated by Daniel Neilson over 6 years ago

It definitely makes sense that there's additional space taken up by metadata and other bookkeeping; it just seems like a lot of extra space for that. I can see that if the object size isn't optimal for EC, this could be the result.

Just to clarify: I have CRUSH rulesets and buckets so that the 8 disks used for CephFS data (8*3TB) are in one bucket with a pool set to k=6,m=2, and 2 SSDs are in a separate bucket backing the 2x-replicated metadata pool. I probably should have included that in the initial post.
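For reference, a capacity sketch for the layout described above (nominal sizes only, ignoring BlueStore reservations and TB/TiB rounding):

```python
# 8 x 3TB OSDs in the EC bucket; with k=6 m=2 and failure-domain=osd,
# every stripe spans all 8 OSDs, so usable capacity is raw * k/(k+m).
raw_tb = 8 * 3
k, m = 6, 2
usable_tb = raw_tb * k / (k + m)
print(usable_tb)  # 18.0 TB of logical CephFS data capacity
```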
