Documentation #9867

PGs per OSD documentation needs clarification

Added by Michael Kidd over 5 years ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Documentation in question:
http://ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups

Currently, the PG documentation concentrates on per-pool PGs per OSD counts, but neglects to discuss:
  • Desired range for total PGs per OSD (including all Pools and Replicas)
  • Impact of empty / non-active pools' PGs on data distribution and/or Memory/CPU overhead
Desired range for total PGs per OSD (including all Pools and Replicas):
  • My current understanding is that the total PGs per OSD (including all replicas of all pools) should be in the target range of 100 to 200.
    Thus:
    (Pool1_pg_num * Pool1_size) + (Pool2_pg_num * Pool2_size) + ...
    ---------------------------------------------------------------  =~ 100 to 200
                       # of OSDs
    
  • This helps keep the OSD process's memory and CPU utilization at acceptable levels during recovery operations.
  • I've also been advised that 500 to 700 total PGs per OSD is generally still OK, but 1000+ total PGs per OSD is considered bad.
  • In the last paragraph of section:
    http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups
    50k PGs per OSD is used as an example of a count that would use more resources, but in practice I have worked on a cluster with 9k PGs per OSD that could not start and reach stable operation. The example gives an artificially high sense of what PG to OSD counts are acceptable.
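
The ratio described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the example cluster are hypothetical, and the thresholds are the ranges quoted in this ticket (100 to 200 as the target, 1000+ as bad), not official limits.

```python
def total_pgs_per_osd(pools, num_osds):
    """Return (sum of pg_num * size over all pools) / number of OSDs.

    `pools` is an iterable of (pg_num, size) pairs, one per pool.
    Empty and non-active pools must be included, since their PGs
    still cost memory and CPU.
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    return sum(pg_num * size for pg_num, size in pools) / num_osds


def classify(ratio):
    """Label a ratio using the ranges quoted in this ticket (assumed, not official)."""
    if 100 <= ratio <= 200:
        return "within target"
    if ratio >= 1000:
        return "excessive"
    return "outside target"


# Hypothetical cluster: two pools, both with size 3, on 60 OSDs.
pools = [(2048, 3), (512, 3)]
ratio = total_pgs_per_osd(pools, 60)
print(ratio, classify(ratio))
```

For the hypothetical cluster above, (2048 * 3 + 512 * 3) / 60 gives 128 PGs per OSD, which falls inside the 100 to 200 target range.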
Impact of empty / non-active pools' PGs on data distribution and/or Memory/CPU overhead:
  • Empty and / or non-active pools do not contribute to the overall goal of even data distribution.
    Therefore, a cluster with only a few active, data-containing pools and a number of non-active and / or empty pools may have poor data distribution and thus 'hot' disks with regard to space utilization.
  • Their PGs do, however, still cause CPU and memory overhead as noted above.

History

#1 Updated by Tyler Brekke over 5 years ago

  • Assignee set to John Wilkins

#2 Updated by Michael Kidd over 5 years ago

It should also be noted that the number of PGs assigned to each pool should be directly proportional to that pool's share of the overall data.

In other words, pools that will store more data should have more PGs.
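
A rough sketch of that proportional split, assuming a per-OSD target is chosen first and each pool's expected share of the data is known. The function name, the pool names, and the numbers are hypothetical, and the result is not rounded to the sizes one would normally pick for pg_num.

```python
import math


def split_pg_budget(num_osds, target_pgs_per_osd, pools):
    """Divide a cluster-wide PG budget among pools by expected data share.

    `pools` maps a pool name to (size, data_fraction), where the
    fractions across all pools sum to 1. Returns pool name -> pg_num.
    """
    total_fraction = sum(fraction for _, fraction in pools.values())
    if not math.isclose(total_fraction, 1.0):
        raise ValueError("data fractions must sum to 1")
    # Budget counts PG replicas, matching the total-PGs-per-OSD formula.
    budget = num_osds * target_pgs_per_osd
    return {
        name: round(budget * fraction / size)
        for name, (size, fraction) in pools.items()
    }


# Hypothetical cluster: 60 OSDs, a target of 100 PGs per OSD,
# and two size-3 pools holding 80% and 20% of the data.
plan = split_pg_budget(60, 100, {"rbd": (3, 0.8), "backups": (3, 0.2)})
print(plan)
```

Here the budget is 60 * 100 = 6000 PG replicas, so the pool expected to hold 80% of the data gets 1600 PGs and the other gets 400.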

#3 Updated by Zac Dover 3 months ago

  • Status changed from New to Closed

This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prior to Luminous, or 2) just really old, and untouched for so long that it is unlikely nowadays to represent a live documentation concern.

If you think that the closing of this bug is an error, raise another bug of a similar kind. If you think that the matter requires urgent attention, please let Zac Dover know at .
