Bug #48309

closed

a few scrubs or remapped PGs blocks the upmap balancer

Added by Dan van der Ster over 3 years ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Category:
balancer module
Target version:
-
% Done:
100%

Source:
Community (dev)
Tags:
Backport:
octopus
Regression:
Yes
Severity:
2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In the balancer/module.py we have

            left = max_optimizations
            ...
            available = left - (num_pg - num_pg_active_clean)
            did = plan.osdmap.calc_pg_upmaps(inc, max_deviation, available, [pool])

The intention is not to balance PGs which are scrubbing, but this calculation of `available` means that as soon as the number of PGs not in "active+clean" state reaches `upmap_max_optimizations` (10 by default), even out of thousands, `available` drops to zero or below and there will be no balancing; instead you will get:

2020-11-20T11:28:28.966+0100 7fcee27a0700 -1 calc_pg_upmaps abort due to max <= 0

For example, in our cluster right now:

             8493 active+clean
             20   active+remapped+backfilling

and with `mgr/balancer/upmap_max_optimizations = 20` no balancing is triggered any more, because `available` (passed to calc_pg_upmaps as max) is 20 - 20 = 0.
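To make the arithmetic concrete, here is a minimal sketch that plugs the numbers from the status output above into the current formula. The helper name `old_available` is made up for illustration; only the expression inside it comes from module.py.

```python
def old_available(left, num_pg, num_pg_active_clean):
    """Current budget formula: subtract every PG that is not active+clean.

    Hypothetical helper; the expression mirrors the line quoted from module.py.
    """
    return left - (num_pg - num_pg_active_clean)


# Figures taken from the cluster status above.
num_pg_active_clean = 8493
num_pg = 8493 + 20   # active+clean plus active+remapped+backfilling
left = 20            # mgr/balancer/upmap_max_optimizations

available = old_available(left, num_pg, num_pg_active_clean)
print(available)  # 0, which is what calc_pg_upmaps rejects with "max <= 0"
```

With the default of 10 optimizations the same cluster state would give -10, so the abort happens in that case too.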

This behaviour was introduced in 86444dbfe5d8478530182116507d034d9e180f5e. It wasn't backported to nautilus, which is why we didn't notice until now.

I will send a PR with this fix, which I think was the intention here:

diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index 5fffe01dcb..dd9fac46ec 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -1023,7 +1023,7 @@ class Module(MgrModule):
                     if s['state_name'] == 'active+clean':
                         num_pg_active_clean += s['count']
                         break
-            available = left - (num_pg - num_pg_active_clean)
+            available = min(left, num_pg_active_clean)
             did = plan.osdmap.calc_pg_upmaps(inc, max_deviation, available, [pool])
             total_did += did
             left -= did
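As a rough side-by-side check of the two formulas, the following sketch evaluates both for a few values of `left`. The function names and the `pgs_by_state` list are assumptions for illustration (the list only imitates the `state_name`/`count` entries the loop in the diff iterates over); this is not the module code itself.

```python
def count_active_clean(pgs_by_state):
    """Sum the PGs whose state is exactly 'active+clean'."""
    total = 0
    for s in pgs_by_state:
        if s['state_name'] == 'active+clean':
            total += s['count']
    return total


def available_before(left, num_pg, num_pg_active_clean):
    # Formula on the removed ('-') line of the diff.
    return left - (num_pg - num_pg_active_clean)


def available_after(left, num_pg_active_clean):
    # Formula on the added ('+') line of the diff.
    return min(left, num_pg_active_clean)


# Illustrative input shaped like the cluster state in the description.
pgs_by_state = [
    {'state_name': 'active+clean', 'count': 8493},
    {'state_name': 'active+remapped+backfilling', 'count': 20},
]
num_pg = sum(s['count'] for s in pgs_by_state)
clean = count_active_clean(pgs_by_state)

for left in (10, 20, 50):
    print(left,
          available_before(left, num_pg, clean),
          available_after(left, clean))
# prints:
# 10 -10 10
# 20 0 20
# 50 30 50
```

With the proposed formula the budget only shrinks when fewer PGs are active+clean than optimizations remain, so a handful of backfilling PGs no longer zeroes it out.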


Related issues 1 (0 open, 1 closed)

Copied to mgr - Backport #48399: octopus: a few scrubs or remapped PGs blocks the upmap balancer (Resolved, Konstantin Shalygin)
#1

Updated by Dan van der Ster over 3 years ago

  • Pull request ID set to 38206
#2

Updated by Kefu Chai over 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan van der Ster
#3

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
#4

Updated by Konstantin Shalygin over 3 years ago

  • Copied to Backport #48399: octopus: a few scrubs or remapped PGs blocks the upmap balancer added
#5

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#6

Updated by Konstantin Shalygin 8 months ago

  • Category set to balancer module
  • % Done changed from 0 to 100
  • Source set to Community (dev)