Feature #42321
Add a new mode to balance pg layout by primary osds
Description
An upmap optimizer has existed since the Luminous release. It helps balance PGs across OSDs and can reach a "perfect" distribution in which each OSD holds an equal number of PGs, but it does not balance primary PGs.
The upmap-by-primary-osd optimizer balances primary PGs and replica PGs in turn. Its implementation follows upmap, and it behaves just as upmap does, producing a distribution that is balanced in both primary PGs and total PGs. The optimizer balances the PG distribution within the same failure domain. Because a PG's primary OSD handles the read/write operations, unbalanced primary placement results in unbalanced load: an OSD with more primary PGs becomes a performance bottleneck, especially for read operations. Using fio to run a 4M read test on rbd pools, we see about a 20%-30% bandwidth improvement vs upmap.
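To make the distinction concrete, here is a small illustrative sketch (not Ceph code, and the PG-to-OSD mapping is invented example data): it counts, per OSD, the total PGs held and the PGs served as primary. A plain upmap balancer equalizes the first count; the mode proposed here additionally equalizes the second.

```python
# Illustrative only: compare total-PG balance with primary-PG balance.
# pg_map is hypothetical example data, not real cluster state.
from collections import Counter

# pg_id -> acting set; the first OSD in the list is the primary.
pg_map = {
    0: [0, 4, 8],
    1: [0, 5, 9],
    2: [1, 4, 10],
    3: [0, 6, 11],
    4: [2, 7, 8],
    5: [3, 5, 9],
}

total = Counter()    # PGs (primary or replica) each OSD holds
primary = Counter()  # PGs each OSD serves as primary

for pg, acting in pg_map.items():
    primary[acting[0]] += 1
    for osd in acting:
        total[osd] += 1

# upmap equalizes `total`; upmap-by-primary-osd also equalizes
# `primary`, since the primary OSD serves the reads.
print("total:", dict(total))
print("primary:", dict(primary))
```

In this toy mapping, osd.0 is primary for three of the six PGs even though total PG counts may look even, which is exactly the kind of skew that shows up as a read bottleneck.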
We have a Ceph cluster with 3 hosts and 4 OSDs per host. We create a pool with 1024 PGs for the PG-balancing test.
The ceph osd tree output looks like this (see attached screenshot):
Balancing PGs with the upmap optimizer gives the result below:
Balancing PGs with the upmap-by-primary-osd optimizer gives the result in the picture below. Primary OSDs of the PGs are not balanced between hosts: host1 has fewer primary PGs, so osd0, osd1, osd2, and osd3 hold fewer primary PGs.
The usage is just like upmap:
osdmaptool osdmap.file --upmap-by-primary-osd out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]
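As with upmap, the tool writes its proposed changes to the output file for the operator to apply. For illustration, here is a small sketch of consuming that file; the line format is an assumption modeled on the "ceph osd pg-upmap-items <pgid> <from> <to> ..." commands that the plain upmap mode emits, so check the out.txt your build actually produces before scripting against it.

```python
# Hypothetical sketch: summarize which OSDs gain PGs in out.txt.
# The line format is an ASSUMPTION modeled on upmap's output
# ("ceph osd pg-upmap-items <pgid> <from> <to> ..."); verify it
# against the file your osdmaptool build writes.
from collections import Counter

sample_out = """\
ceph osd pg-upmap-items 1.7 0 3
ceph osd pg-upmap-items 1.12 2 5
ceph osd pg-upmap-items 1.7f 0 3
"""

moves_to = Counter()
for line in sample_out.splitlines():
    parts = line.split()
    if parts[:3] != ["ceph", "osd", "pg-upmap-items"]:
        continue
    # fields after the pgid come in (from_osd, to_osd) pairs
    for _from, to in zip(parts[4::2], parts[5::2]):
        moves_to[int(to)] += 1

print(dict(moves_to))
```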
History
#1 Updated by Greg Farnum over 4 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSDMap)
- Status changed from New to Fix Under Review
#2 Updated by linhuai deng 11 months ago
- File ceph_osd_df.png View added
Hi rosinL, I have used the "balance pg layout by primary osds" function you submitted. In a three-node cluster, the OSDs within each server can be well balanced, but there may be big differences between different hosts.