Documentation #9867: PGs per OSD documentation needs clarification - Ceph - Ceph

Added by Michael Kidd over 9 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: John Wilkins

Description

Documentation in question:
http://ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups

Currently, the PG documentation concentrates on per-pool PGs-per-OSD counts, but neglects to discuss:

- Desired range for total PGs per OSD (including all Pools and Replicas)
- Impact of empty / non-active Pools' PGs on data distribution and/or Memory/CPU overhead

Desired range for total PGs per OSD (including all Pools and Replicas):

My current understanding is that the total PGs per OSD (including all replicas of all pools) should be in the target range of 100 to 200.
Thus:

(Pool1_pg_num * Pool1_size) + (Pool2_pg_num * Pool2_size) + ...
---------------------------------------------------------------  =~ 100 to 200
                   # of OSDs

This range helps keep the OSD process's memory and CPU utilization at acceptable levels during recovery operations.
I've also been advised that 500 to 700 total PGs per OSD is generally still OK, but 1000+ total PGs per OSD is considered bad.
The last paragraph of the section
http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups
uses 50k PGs per OSD as an example of a configuration that would use more resources. In practice, I've worked on a cluster with 9k PGs per OSD that was not able to start and reach stable operation. The 50k example therefore gives an artificially high sense of what PG-to-OSD counts are acceptable.
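
As a rough illustration of the formula above, the following Python sketch computes the total PGs per OSD across all pools and checks it against the 100 to 200 target. The function names, the (pg_num, size) tuple layout, and the example pool values are hypothetical and not taken from Ceph itself; the 100 and 200 bounds are the ones suggested in this ticket.

```python
from typing import Iterable, Tuple


def total_pgs_per_osd(pools: Iterable[Tuple[int, int]], num_osds: int) -> float:
    """Sum pg_num * size over all pools and divide by the number of OSDs.

    Each pool is a hypothetical (pg_num, size) pair, where size is the
    replica count of the pool.
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    return sum(pg_num * size for pg_num, size in pools) / num_osds


def in_target_range(ratio: float, low: float = 100, high: float = 200) -> bool:
    """Check the ratio against the target range proposed in this ticket."""
    return low <= ratio <= high


# Made-up example cluster: three pools on 40 OSDs.
example_pools = [(2048, 3), (512, 3), (128, 2)]
example_ratio = total_pgs_per_osd(example_pools, num_osds=40)
print(f"total PGs per OSD: {example_ratio:.1f}")
print("within 100 to 200 target:", in_target_range(example_ratio))
```

The same function can be applied to the 9k case mentioned above by plugging in the real pg_num and size values of each pool.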

Impact of empty / non-active Pool's PGs on data distribution and/or Memory/CPU overhead

Empty and/or non-active pools do not contribute to the overall goal of even data distribution.
Therefore, clusters with only a few active, data-containing pools and a number of non-active and/or empty pools may have poor data distribution and thus 'hot' disks with regard to space utilization.
The PGs of empty or non-active pools do, however, still cause the CPU and memory overhead noted above.
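
To make this point concrete, here is a small Python sketch that separates the total PGs per OSD (which drives overhead) from the PGs per OSD that belong to data-bearing pools (which drive distribution). The dictionary keys, the has_data flag, and the example numbers are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def pgs_per_osd_breakdown(pools: List[Dict], num_osds: int) -> Tuple[float, float]:
    """Return (total, data_bearing) PG replicas per OSD.

    Each pool is a hypothetical dict with keys 'pg_num', 'size' and
    'has_data' (True if the pool actively holds data).
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    total = sum(p["pg_num"] * p["size"] for p in pools)
    data_bearing = sum(p["pg_num"] * p["size"] for p in pools if p["has_data"])
    return total / num_osds, data_bearing / num_osds


# Made-up cluster: one pool with data and four empty pools on 100 OSDs.
example = [{"pg_num": 1024, "size": 3, "has_data": True}]
example += [{"pg_num": 1024, "size": 3, "has_data": False} for _ in range(4)]

total, data_bearing = pgs_per_osd_breakdown(example, num_osds=100)
print(f"total PGs per OSD (overhead): {total:.2f}")
print(f"data-bearing PGs per OSD (distribution): {data_bearing:.2f}")
```

In this made-up case the total sits inside the 100 to 200 target, yet only about a fifth of those PGs actually help spread data across the OSDs.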

Updated by Tyler Brekke over 9 years ago

Assignee set to John Wilkins

Updated by Michael Kidd over 9 years ago

It should also be noted that the number of PGs per pool should be directly proportional to the share of the cluster's data that each pool is expected to hold.

In other words, pools that will store more data should have more PGs.
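
One way to sketch this proportional split in Python is shown below. It assumes a per-OSD PG target, an expected data share per pool, and a replica count per pool, and then rounds each result to the nearest power of two. The power-of-two rounding follows general Ceph guidance on pg_num rather than anything stated in this ticket, and the pool names, shares, and function names are hypothetical.

```python
from typing import Dict


def nearest_power_of_two(x: float) -> int:
    """Round x to the nearest power of two (minimum 1)."""
    if x < 1:
        return 1
    lower = 1 << (int(x).bit_length() - 1)
    upper = lower * 2
    return lower if (x - lower) < (upper - x) else upper


def proportional_pg_num(
    data_share: Dict[str, float],
    size: Dict[str, int],
    num_osds: int,
    target_per_osd: int = 100,
) -> Dict[str, int]:
    """Split a cluster-wide PG budget across pools by expected data share.

    data_share maps pool name to its expected fraction of all data;
    size maps pool name to its replica count.
    """
    budget = target_per_osd * num_osds  # total PG replicas cluster-wide
    return {
        name: nearest_power_of_two(budget * share / size[name])
        for name, share in data_share.items()
    }


# Made-up example: 100 OSDs, three pools, all with 3 replicas.
shares = {"volumes": 0.80, "images": 0.15, "backups": 0.05}
sizes = {"volumes": 3, "images": 3, "backups": 3}
plan = proportional_pg_num(shares, sizes, num_osds=100)
print(plan)
```

Because of the rounding, the resulting total PGs per OSD can land somewhat above or below the chosen target, so it is worth recomputing the total afterwards.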

Updated by Zac Dover over 4 years ago

Status changed from New to Closed

This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prior to Luminous, or 2) just really old, and untouched for so long that it is unlikely nowadays to represent a live documentation concern.

If you think that the closing of this bug is an error, raise another bug of a similar kind. If you think that the matter requires urgent attention, please let Zac Dover know at zac.dover@gmail.com.
