Feature #42321

Add a new mode to balance pg layout by primary osds

Added by Rixin Luo over 4 years ago. Updated 11 months ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: luminous,mimic,nautilus
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

An upmap optimizer has existed since the Luminous release. It balances PGs across OSDs and can reach a “perfect” distribution in which every OSD holds the same number of PGs. It does not, however, balance primary PGs.
The upmap-by-primary-osd optimizer balances primary PGs and replica PGs in turn. Its implementation is based on upmap, and it behaves like upmap except that it aims for a balanced distribution of both primary PGs and total PGs. The optimizer balances the PG distribution within the same failure domain. Because a PG’s primary OSD handles its read and write operations, an uneven spread of primary PGs results in uneven load: an OSD with more primary PGs becomes the performance bottleneck, especially for reads. We used fio to run a 4M read test on rbd pools and saw about 20%-30% higher bandwidth than with upmap.
We have a Ceph cluster with 3 hosts and 4 OSDs per host. We created a pool with 1024 PGs to test PG balancing.
The output of ceph osd tree looks like this (see ceph_osd_tree.png):

Balancing PGs with the upmap optimizer gives the result below (see pg_balance_use_upmap.png):

Balancing PGs with the upmap-by-primary-osd optimizer gives the result in the picture below (see pg_balance_use_upmap_by_primary_osd.png). The primary PGs are not balanced between hosts: host1 has fewer primary PGs, so osd0, osd1, osd2, and osd3 each have fewer primary PGs.
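
To make the difference between "balanced total PGs" and "balanced primary PGs" concrete, here is a minimal Python sketch. It is not part of the patch and does not reproduce the optimizer. It only counts primaries and totals per OSD and per host for a given mapping. The function names, the toy layout, and the replica size of 3 are all assumptions made for illustration; the 3 hosts with 4 OSDs each follow the test cluster described above.

```python
from collections import Counter

# Assumed topology: 3 hosts x 4 OSDs, osd0-3 on host1, osd4-7 on host2, ...
OSD_TO_HOST = {osd: f"host{osd // 4 + 1}" for osd in range(12)}


def primary_and_total_counts(pg_to_osds):
    """Count primary PGs and total PGs per OSD.

    pg_to_osds maps a PG id to its ordered OSD list; the first
    entry is treated as the primary.
    """
    primary, total = Counter(), Counter()
    for osds in pg_to_osds.values():
        if not osds:
            continue
        primary[osds[0]] += 1
        total.update(osds)
    return primary, total


def max_deviation(counts, osd_ids):
    """Largest absolute distance of any OSD's count from the mean."""
    mean = sum(counts[o] for o in osd_ids) / len(osd_ids)
    return max(abs(counts[o] - mean) for o in osd_ids)


def per_host(counts, osd_to_host):
    """Sum per-OSD counts into per-host counts."""
    by_host = Counter()
    for osd, host in osd_to_host.items():
        by_host[host] += counts[osd]
    return by_host


def toy_layout(rotate):
    """Hypothetical 12-PG layout with one replica per host (size 3 assumed).

    With rotate=False the primary always lands on host1; with
    rotate=True the primary role rotates across the three hosts.
    """
    layout = {}
    for pg in range(12):
        replicas = [host * 4 + (pg + host) % 4 for host in range(3)]
        shift = pg % 3 if rotate else 0
        layout[f"1.{pg:x}"] = replicas[shift:] + replicas[:shift]
    return layout


if __name__ == "__main__":
    for rotate in (False, True):
        primary, total = primary_and_total_counts(toy_layout(rotate))
        print("rotate primaries:", rotate)
        print("  total PG deviation:  ", max_deviation(total, range(12)))
        print("  primary PG deviation:", max_deviation(primary, range(12)))
        print("  primaries per host:  ", dict(per_host(primary, OSD_TO_HOST)))
```

In the non-rotated layout every OSD holds the same total number of PGs, yet all primaries sit on host1, which is the kind of imbalance that a total-PG count alone does not reveal.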

The usage is just like upmap:
osdmaptool osdmap.file --upmap-by-primary-osd out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]
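
For scripting, the command line above could be assembled as in the following Python sketch. The helper name build_osdmaptool_cmd and the example pool name "rbd" are hypothetical. The flags are exactly those in the usage line, and --upmap-by-primary-osd is the option proposed in this ticket, so it is only available in an osdmaptool built with this change.

```python
import shlex


def build_osdmaptool_cmd(osdmap_file, out_file, pool=None,
                         max_count=None, max_deviation=None):
    """Build the argv list for the usage line in this ticket.

    Optional arguments are appended only when given, mirroring the
    bracketed options in the usage line.
    """
    cmd = ["osdmaptool", osdmap_file, "--upmap-by-primary-osd", out_file]
    if pool is not None:
        cmd += ["--upmap-pool", pool]
    if max_count is not None:
        cmd += ["--upmap-max", str(max_count)]
    if max_deviation is not None:
        cmd += ["--upmap-deviation", str(max_deviation)]
    return cmd


if __name__ == "__main__":
    # Hypothetical values; print the command instead of running it.
    cmd = build_osdmaptool_cmd("osdmap.file", "out.txt",
                               pool="rbd", max_count=100, max_deviation=1)
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where the patched osdmaptool is installed.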

ceph_osd_tree.png (18 KB) Rixin Luo, 10/15/2019 08:21 AM

pg_balance_use_upmap_by_primary_osd.png (28 KB) Rixin Luo, 10/15/2019 08:25 AM

pg_balance_use_upmap.png (28.6 KB) Rixin Luo, 10/15/2019 08:25 AM

ceph_osd_df.png (49.4 KB) linhuai deng, 04/28/2023 10:07 AM

History

#1 Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSDMap)
  • Status changed from New to Fix Under Review

#2 Updated by linhuai deng 11 months ago


Hi, rosinL. I have used the "balance pg layout by primary osds" function that you submitted. In a three-node cluster, the OSDs under each server can be well balanced, but there may be big differences between hosts.
