Bug #21745: mds: MDBalancer using total (all time) request count in load statistics - CephFS - Ceph

Bug #21745

closed

mds: MDBalancer using total (all time) request count in load statistics

Added by John Spray over 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: High
Assignee: Zheng Yan
Target version: Ceph - v13.0.0
Source: Community (dev)
Tags: balancer
Backport: luminous
Severity: 2 - major
Affected Versions: Ceph - v12.2.5
Component(FS): MDS
Labels (FS): multimds

Description

This was pointed out by Xiaoxi Chen.

The get_req_rate() function returns the value of l_mds_request, which is a cumulative counter (the total number of requests served since the MDS started), not a rate.

That value is then used in MDBalancer's load calculation, producing wildly inflated load values like this:

2017-10-09 10:05:09.128325 7fc899748700  0 mds.0.bal   mds.1 mdsload<[0,0 0]/[0,0 0], req 3.11991e+07, hr 0, qlen 0, cpu 0.12> = 3.11991e+07 ~ 15711.7
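
To see why a cumulative counter cannot stand in for a request rate, here is a toy model with hypothetical numbers (not taken from the log above). It is not Ceph code: two ranks serve the same steady 100 requests/s, but one has been up for a minute and the other for a day.

```python
# Hypothetical model (not Ceph code): a monotonically increasing request
# counter, similar in spirit to l_mds_request, read by a balancer.

def cumulative_requests(rate_per_s: float, uptime_s: float) -> float:
    """Value a cumulative counter reports after uptime_s seconds at a steady rate."""
    return rate_per_s * uptime_s

# Two ranks serving identical traffic (100 req/s) but with different uptimes.
young = cumulative_requests(100.0, 60.0)       # up for one minute
old = cumulative_requests(100.0, 86_400.0)     # up for one day

print(f"mds.a counter-as-load: {young:.0f}")   # 6000
print(f"mds.b counter-as-load: {old:.0f}")     # 8640000
```

The counter-derived "load" differs by a factor of 1440 even though both ranks are equally busy. A balancer reading these numbers would treat the long-running rank as massively overloaded.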

Related issues: 1 (0 open, 1 closed)

Updated by Xiaoxi Chen over 6 years ago

Although it is simple to add last_timestamp and last_reqcount so that we can compute an average TPS, TPS may fluctuate a lot, which may result in dirfrags ping-ponging between multiple MDSs.
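
The delta approach described in this comment can be sketched as follows. This is an illustrative sketch, not MDBalancer code. The field names last_timestamp and last_reqcount come from the comment; the class name ReqRateSampler and the counter-reset handling are assumptions.

```python
import time
from typing import Optional


class ReqRateSampler:
    """Hypothetical helper: turns a cumulative request counter into an
    average rate (requests/second) over the interval between two samples."""

    def __init__(self) -> None:
        self.last_timestamp: Optional[float] = None
        self.last_reqcount: Optional[int] = None

    def sample(self, reqcount: int, now: Optional[float] = None) -> float:
        """Return the average rate since the previous sample (0.0 on the first call)."""
        if now is None:
            now = time.monotonic()
        rate = 0.0
        if self.last_timestamp is not None and self.last_reqcount is not None \
                and now > self.last_timestamp:
            delta = reqcount - self.last_reqcount
            if delta < 0:
                # Assumption: the counter was reset (e.g. daemon restart),
                # so count everything since the reset.
                delta = reqcount
            rate = delta / (now - self.last_timestamp)
        self.last_timestamp = now
        self.last_reqcount = reqcount
        return rate
```

Each call averages only over the most recent interval. A burst between two samples therefore shows up in full, which is the fluctuation the comment warns about.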

We probably need a longer (configurable?) averaging window for highly fluctuating values such as q_len and req_rate.
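
One common way to get a longer, configurable average is an exponentially decaying average with a tunable half-life. The sketch below is a swapped-in illustration of that technique, not the fix that was merged. The class name DecayingRate and the parameter halflife_s are hypothetical.

```python
from typing import Optional


class DecayingRate:
    """Hypothetical smoother: exponentially weighted moving average whose
    older samples lose half their weight every `halflife_s` seconds.
    A larger half-life gives a longer, steadier average."""

    def __init__(self, halflife_s: float) -> None:
        if halflife_s <= 0:
            raise ValueError("halflife_s must be positive")
        self.halflife_s = halflife_s
        self.value: Optional[float] = None
        self.last_t: Optional[float] = None

    def update(self, sample: float, t: float) -> float:
        """Fold a new sample (e.g. req_rate or q_len) taken at time t into the average."""
        if self.value is None or self.last_t is None:
            self.value = sample
        else:
            dt = max(0.0, t - self.last_t)
            # Fraction of weight the old average keeps after dt seconds.
            keep = 0.5 ** (dt / self.halflife_s)
            self.value = keep * self.value + (1.0 - keep) * sample
        self.last_t = t
        return self.value
```

A short half-life tracks the raw signal closely. A long one damps spikes, making the balancer less likely to move dirfrags back and forth in response to momentary load swings.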

Updated by Patrick Donnelly about 6 years ago

Subject changed from MDBalancer using total (all time) request count in load statistics to mds: MDBalancer using total (all time) request count in load statistics
Category deleted (90)
Assignee set to Zheng Yan
Target version set to v13.0.0
Source set to Community (dev)
Tags set to multimds,balancer
Severity changed from 3 - minor to 2 - major
Affected Versions v12.2.5 added
Component(FS) MDS added