Bug #48309

a few scrubs or remapped PGs blocks the upmap balancer

Added by Dan van der Ster over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: Yes
Severity: 2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In balancer/module.py we have:

            left = max_optimizations
            ...
            available = left - (num_pg - num_pg_active_clean)
            did = plan.osdmap.calc_pg_upmaps(inc, max_deviation, available, [pool])

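The intention is to avoid balancing PGs which are scrubbing, but `available` is computed by subtracting the number of PGs that are not "active+clean" from the remaining optimization budget. This means that if you have 10 or more PGs not in "active+clean" state (even out of thousands) with the default limit of 10 optimizations, there will be no balancing; instead you will get: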

2020-11-20T11:28:28.966+0100 7fcee27a0700 -1 calc_pg_upmaps abort due to max <= 0

For example, in our cluster right now:

             8493 active+clean
             20   active+remapped+backfilling

and with `mgr/balancer/upmap_max_optimizations = 20` no more balancing is triggered, because `available` = 20 - 20 = 0 (max = 0).
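The arithmetic behind this can be reproduced with a minimal Python sketch. The function name `available_before_fix` is hypothetical and only mirrors the expression quoted from module.py; the total PG count assumes the two states listed above are the only ones present in the cluster.

```python
def available_before_fix(left, num_pg, num_pg_active_clean):
    """Optimization budget as computed by the original expression.

    Hypothetical helper: it only mirrors
    `left - (num_pg - num_pg_active_clean)` from balancer/module.py.
    """
    return left - (num_pg - num_pg_active_clean)


# Numbers from the cluster status above (assuming no other PG states).
num_pg_active_clean = 8493
num_pg = num_pg_active_clean + 20  # 20 PGs are active+remapped+backfilling

# With upmap_max_optimizations = 20 the budget drops to zero.
budget = available_before_fix(20, num_pg, num_pg_active_clean)
print(budget)

# With a limit of 10 the budget goes negative.
budget_limit_10 = available_before_fix(10, num_pg, num_pg_active_clean)
print(budget_limit_10)
```

A budget of zero or less is what leads to the "abort due to max <= 0" message, regardless of how many PGs are healthy.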

This behaviour was introduced in 86444dbfe5d8478530182116507d034d9e180f5e. It wasn't backported to nautilus, which is why we didn't notice until now.

I will send a PR with this fix, which I think was the intention here:

diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index 5fffe01dcb..dd9fac46ec 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -1023,7 +1023,7 @@ class Module(MgrModule):
                     if s['state_name'] == 'active+clean':
                         num_pg_active_clean += s['count']
                         break
-            available = left - (num_pg - num_pg_active_clean)
+            available = min(left, num_pg_active_clean)
             did = plan.osdmap.calc_pg_upmaps(inc, max_deviation, available, [pool])
             total_did += did
             left -= did
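To illustrate the effect of the patched line, here is a small Python sketch of the new expression on its own. The function name `available_after_fix` is hypothetical; only the `min(left, num_pg_active_clean)` expression comes from the diff, and the sketch does not call the real `calc_pg_upmaps`.

```python
def available_after_fix(left, num_pg_active_clean):
    """Optimization budget as computed by the patched expression.

    Hypothetical helper: it only mirrors
    `min(left, num_pg_active_clean)` from the diff.
    """
    return min(left, num_pg_active_clean)


# Cluster from the report: 8493 active+clean PGs, limit of 20.
# The non-clean PGs no longer reduce the budget.
budget_large_cluster = available_after_fix(20, 8493)
print(budget_large_cluster)

# The budget is still capped when very few PGs are active+clean.
budget_few_clean = available_after_fix(20, 5)
print(budget_few_clean)

# With no active+clean PGs at all, the budget is zero.
budget_none_clean = available_after_fix(20, 0)
print(budget_none_clean)
```

With this form, the budget only shrinks below `left` when there are fewer active+clean PGs than remaining optimizations, so a handful of backfilling or scrubbing PGs in a large cluster no longer stops the balancer.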


Related issues

Copied to mgr - Backport #48399: octopus: a few scrubs or remapped PGs blocks the upmap balancer Resolved

History

#1 Updated by Dan van der Ster over 2 years ago

  • Pull request ID set to 38206

#2 Updated by Kefu Chai over 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan van der Ster

#3 Updated by Kefu Chai over 2 years ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Konstantin Shalygin over 2 years ago

  • Copied to Backport #48399: octopus: a few scrubs or remapped PGs blocks the upmap balancer added

#5 Updated by Nathan Cutler about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
