Feature #13188
use crushtool to simulate real pg distribution for a specific pool

Status: Open
Description
We had re-weighted our cluster to make the PG distribution even when we first built it, but the distribution became uneven again after we recently expanded the cluster. Re-weighting OSDs now could be painful, because achieving an even PG distribution may require re-weighting the OSDs several times, and each round triggers data movement.
To minimize this impact, I enhanced crushtool to simulate the real PG distribution for a specific pool by extracting the PG creation logic into crushtool.
For example, it mainly uses the following options:
--num-rep: the pool's replica count (optional)
--rule: the rule used to create the pool (optional)
--x: the pool's pg_num, which should equal its pgp_num (required)
--pool-id: the id of an existing pool, or the id of a pool to be created later (usually the current max pool id plus one) (required)
<pre>
[ceph@c167 ~]$ crushtool -i compiled_crush_map --test --show-utilization --num-rep 2 --rule 0 --x 1024 --pool-id 2
rule 0 (replicated_ruleset), x = 0..1023, numrep = 2..2
rule 0 (replicated_ruleset) num_rep 2 result size == 2: 1024/1024
  device 0:  stored : 101  expected : 102.4
  device 1:  stored : 102  expected : 102.4
  device 2:  stored : 101  expected : 102.4
  device 3:  stored : 103  expected : 102.4
  device 4:  stored : 105  expected : 102.4
  device 5:  stored : 104  expected : 102.4
  device 6:  stored : 101  expected : 102.4
  device 7:  stored : 101  expected : 102.4
  device 8:  stored : 102  expected : 102.4
  device 9:  stored : 105  expected : 102.4
  device 10: stored : 101  expected : 102.4
  device 11: stored : 103  expected : 102.4
  device 12: stored : 100  expected : 102.4
  device 13: stored : 104  expected : 102.4
  device 14: stored : 104  expected : 102.4
  device 15: stored : 100  expected : 102.4
  device 16: stored : 101  expected : 102.4
  device 17: stored : 102  expected : 102.4
  device 18: stored : 104  expected : 102.4
  device 19: stored : 104  expected : 102.4
</pre>

After getting the distribution, we can calculate the deviation of the PG distribution, re-weight OSDs on this crush map, and test the crush map again until we are satisfied with the distribution. Then we set the final crush map back on the cluster, so the data movement happens only once. If you think it is helpful, please take a look. Thanks.
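As a rough illustration of the "calculate the deviation" step, here is a small Python sketch (not part of the patch) that takes the per-device `stored` counts from the output above and computes the mean, the standard deviation, and the coefficient of variation; one could re-weight, re-run crushtool, and repeat until these numbers are acceptable:

```python
from statistics import mean, pstdev

# "stored" PG counts per OSD, copied from the crushtool --show-utilization output above
stored = [101, 102, 101, 103, 105, 104, 101, 101, 102, 105,
          101, 103, 100, 104, 104, 100, 101, 102, 104, 104]

avg = mean(stored)       # 2048 placements / 20 devices = 102.4, matching "expected"
sigma = pstdev(stored)   # population standard deviation of the per-OSD counts
cv = sigma / avg         # coefficient of variation: lower means a more even distribution

print(f"mean={avg:.1f} stdev={sigma:.3f} cv={cv:.4f}")
```

The same arithmetic could of course be done with awk on the crushtool output; the point is only to reduce the distribution to one number that can be compared across re-weighting iterations.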