Bug #63137

open

osd: Improved pg-upmap computing speed

Added by Yuxuan Hu 7 months ago. Updated 3 months ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: OSDMap
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When some Ceph clusters are expanded (for example, when the replica count is increased from 3 to 6), adding OSDs and recomputing the pg-upmap entries for load balancing can take a very long time. The logs show that most of this time is spent in OSDMap::try_drop_remap_overfull when the desired level of balance cannot be reached. Within that function, most of the time goes to trial-dropping existing remap pairs. However, when deciding whether to drop a pair, the function considers only the load (deviation) of the receiving OSD and ignores that of the migrating OSD. The logs also show that in many cases the migrating OSD's deviation is greater than the receiving OSD's. Dropping the remap in such a case moves the PG back onto the more heavily deviated OSD, so it cannot improve the cluster's balance and will not reduce the standard deviation of the cluster's load. We therefore believe that in such cases there is no need to enter this branch and run the trial-change code that follows it.


Files

图像2023-10-9 15.06.jpeg (53 KB) log1 Yuxuan Hu, 10/09/2023 07:10 AM
图像2023-10-9 15.07.jpeg (265 KB) log2 Yuxuan Hu, 10/09/2023 07:10 AM
#1

Updated by Yuxuan Hu 7 months ago

#2

Updated by Laura Flores 3 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 53891