Documentation #9867 (Closed): PGs per OSD documentation needs clarification
Description
Documentation in question:
http://ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups
- Desired range for total PGs per OSD (including all Pools and Replicas)
- Impact of empty and/or non-active pools' PGs on data distribution and on memory/CPU overhead
- My current understanding is that the total PGs per OSD (including all replicas of all pools) should be in the target range of 100 to 200.
Thus:

  (Pool1_pg_num * Pool1_size) + (Pool2_pg_num * Pool2_size) + ...
  ---------------------------------------------------------------  =~ 100 to 200
                            # of OSDs
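The ratio above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this report; the pool values and the helper name are made up for the example.

```python
def pgs_per_osd(pools, num_osds):
    """Total PGs per OSD: sum of pg_num * size over all pools, divided by
    the number of OSDs (the formula quoted in this report)."""
    total = sum(pg_num * size for pg_num, size in pools)
    return total / num_osds

# Hypothetical example: two pools on a 12-OSD cluster.
pools = [(512, 3), (256, 2)]       # (pg_num, size) per pool
ratio = pgs_per_osd(pools, 12)     # (512*3 + 256*2) / 12
print(round(ratio, 1))             # 170.7, inside the 100-200 target range
```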
- This is to help ensure that the OSD processes' memory and CPU utilization remain at acceptable levels during recovery operations.
- I've also been advised that 500 to 700 total PGs per OSD is generally still OK, but 1000+ total PGs per OSD is considered bad.
- In the last paragraph of the section:
http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups
50k PGs per OSD is used as an example that "would use more resources", but in reality I have seen a cluster with 9k PGs per OSD that was unable to start and reach stable operation. This example gives an artificially high sense of what an acceptable PG-to-OSD ratio is.
- Empty and/or non-active pools should not be considered helpful toward the overall goal of even data distribution. Therefore, clusters with only a few active, data-containing pools alongside a number of non-active and/or empty pools may have poor data distribution and thus 'hot' disks with regard to space utilization.
- Such pools do, however, still cause the CPU and memory overhead noted above.