Bug #63137
openosd: Improved pg-upmap computing speed
0%
Description
When a Ceph cluster is scaled up, for example by increasing the number of replicas from 3 to 6, adding OSDs and recalculating the upmap for load balancing can take a very long time on some clusters. The logs show that most of the computation time is spent in the OSDMap::try_drop_remap_overfull function when the desired level of balance cannot be achieved.

Within this function, most of the time goes to trying to drop existing remapping pairs. However, when a pair is considered for removal, only the load of the receiving OSD is checked; the load of the migrating OSD is not taken into account. The logs show that in many cases the deviation of the migrating OSD is greater than that of the receiving OSD. Dropping the remapping in that situation cannot improve the cluster's balance and may fail to reduce its standard deviation. We therefore believe that in such cases there is no need to enter this branch and run the subsequent test-change code.
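The proposed check can be illustrated with a minimal, standalone sketch. This is not the code from the pull request and does not use the real OSDMap types: `RemapPair`, `worth_trying_drop`, `filter_drop_candidates`, and the `osd_deviation` map are hypothetical names introduced only for illustration. It assumes each remapping pair records the migrating OSD first and the receiving OSD second, and that a per-OSD deviation from the target PG count is already available.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one remapping pair: the PG was moved from the
// migrating OSD (first) to the receiving OSD (second).
using RemapPair = std::pair<int, int>;

// Decide whether dropping a remapping pair is worth testing.
// Dropping the pair would send the PG back to the migrating OSD, so if that
// OSD already deviates more than the receiving OSD, the drop cannot help and
// the expensive test change can be skipped.
bool worth_trying_drop(const RemapPair& pair,
                       const std::map<int, float>& osd_deviation) {
  auto from_it = osd_deviation.find(pair.first);
  auto to_it = osd_deviation.find(pair.second);
  if (from_it == osd_deviation.end() || to_it == osd_deviation.end()) {
    // Deviation unknown for one side: keep the existing behaviour and try it.
    return true;
  }
  // Skip only when the migrating OSD's deviation is strictly greater,
  // matching the condition described in this report.
  return from_it->second <= to_it->second;
}

// Keep only the pairs for which a drop should still be attempted.
std::vector<RemapPair> filter_drop_candidates(
    const std::vector<RemapPair>& pairs,
    const std::map<int, float>& osd_deviation) {
  std::vector<RemapPair> candidates;
  for (const auto& pair : pairs) {
    if (worth_trying_drop(pair, osd_deviation)) {
      candidates.push_back(pair);
    }
  }
  return candidates;
}
```

In this sketch, a pair whose migrating OSD has deviation 3.0 and whose receiving OSD has deviation 1.0 is filtered out before any test change is computed, while a pair whose migrating OSD is underfull relative to the receiving OSD is still tried.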
Updated by Yuxuan Hu 7 months ago
fix pull request: https://github.com/ceph/ceph/pull/53891
Updated by Laura Flores 3 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 53891