mds: MDBalancer using total (all time) request count in load statistics
2 - major
This was pointed out by Xiaoxi Chen.
The get_req_rate() function returns the value of l_mds_request, which is a cumulative counter (the total number of requests since the MDS started), not a rate.
MDBalancer then uses this value in its load calculation, producing absurdly high load values like this:
2017-10-09 10:05:09.128325 7fc899748700 0 mds.0.bal mds.1 mdsload<[0,0 0]/[0,0 0], req 3.11991e+07, hr 0, qlen 0, cpu 0.12> = 3.11991e+07 ~ 15711.7
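To turn a cumulative counter into a rate, the balancer has to divide the change in the counter by the elapsed time between samples. A minimal C++ sketch, assuming the caller samples the counter periodically and passes in a timestamp in seconds; `RequestRateTracker` and `sample()` are hypothetical names, not Ceph's MDBalancer API:

```cpp
#include <cstdint>

// Hypothetical helper (not Ceph code): converts a monotonically increasing
// request counter, such as l_mds_request, into requests per second by
// diffing against the previous sample.
class RequestRateTracker {
 public:
  // now_sec: current time in seconds; total_requests: cumulative counter value.
  // Returns the average rate since the previous call, or 0.0 on the first
  // call (no baseline yet) or if the counter went backwards (e.g. a reset).
  double sample(double now_sec, uint64_t total_requests) {
    double rate = 0.0;
    if (have_last_ && now_sec > last_timestamp_ &&
        total_requests >= last_reqcount_) {
      rate = static_cast<double>(total_requests - last_reqcount_) /
             (now_sec - last_timestamp_);
    }
    last_timestamp_ = now_sec;
    last_reqcount_ = total_requests;
    have_last_ = true;
    return rate;
  }

 private:
  bool have_last_ = false;
  double last_timestamp_ = 0.0;
  uint64_t last_reqcount_ = 0;
};
```

For example, if the counter reads 31199100 and then 31201100 ten seconds later, the second call returns 200 requests/s instead of a value around 3.1e7.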
#1 Updated by Xiaoxi Chen over 5 years ago
Although it is simple to add last_timestamp and last_reqcount so that we can compute an average TPS, TPS may fluctuate a lot, which may result in dirfrags ping-ponging between multiple MDSs.
We probably need a longer (configurable?) averaging window for highly fluctuating values like q_len and req_rate.
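One way to get a longer, configurable average is an exponentially weighted moving average controlled by a half-life. The sketch below is hypothetical (`SmoothedRate`, `update()` and `half_life_sec` are illustrative names, not Ceph code or the fix that was merged), and an exponential average is only one option; a fixed-length sliding window would also work:

```cpp
#include <cmath>

// Hypothetical smoother (not Ceph code): an exponentially weighted moving
// average whose memory is set by a configurable half-life in seconds.
class SmoothedRate {
 public:
  explicit SmoothedRate(double half_life_sec) : half_life_sec_(half_life_sec) {}

  // Feed one instantaneous rate observed at now_sec; returns the smoothed value.
  double update(double now_sec, double instant_rate) {
    if (!initialized_) {
      value_ = instant_rate;  // first sample seeds the average
      initialized_ = true;
    } else {
      double dt = now_sec - last_sec_;
      if (dt < 0.0) dt = 0.0;  // ignore clock going backwards
      // Weight kept on the old value halves every half_life_sec_ seconds.
      double keep = std::exp2(-dt / half_life_sec_);
      value_ = keep * value_ + (1.0 - keep) * instant_rate;
    }
    last_sec_ = now_sec;
    return value_;
  }

  double value() const { return value_; }

 private:
  double half_life_sec_;
  bool initialized_ = false;
  double last_sec_ = 0.0;
  double value_ = 0.0;
};
```

With a 10 s half-life, feeding 100 req/s at t=0 and 300 req/s at t=10 gives 200; a single 5000 req/s spike one second later only lifts the smoothed value to about 521, so a short burst is much less likely to trigger a dirfrag migration.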
#2 Updated by Patrick Donnelly about 5 years ago
- Subject changed from MDBalancer using total (all time) request count in load statistics to mds: MDBalancer using total (all time) request count in load statistics
- Category deleted (
- Assignee set to Zheng Yan
- Target version set to v13.0.0
- Source set to Community (dev)
- Tags set to multimds,balancer
- Severity changed from 3 - minor to 2 - major
- Affected Versions v12.2.5 added
- Component(FS) MDS added
Zheng, please amend the above commit to note that it fixes this issue.
#3 Updated by Zheng Yan about 5 years ago
- Status changed from New to Fix Under Review
#4 Updated by Patrick Donnelly about 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Tags changed from multimds,balancer to balancer
- Labels (FS) multimds added
#5 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #23671: luminous: mds: MDBalancer using total (all time) request count in load statistics added
#6 Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved