Bug #36301

open

mgr/balancer: KeyError during balancer eval if pool migrating between roots

Added by Dan van der Ster over 5 years ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: balancer module
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If we run `ceph balancer eval` while a pool is migrating data between CRUSH roots, it fails with the following error:

# ceph balancer eval
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 321, in handle_command
    return (0, self.evaluate(ms, pools, verbose=verbose), '')
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 620, in evaluate
    pe = self.calc_eval(ms, pools)
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 507, in calc_eval
    pgs_by_osd[osd] += 1
KeyError: (1056,)
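A `KeyError` on `pgs_by_osd[osd] += 1` means the OSD id was never added to the dictionary before being incremented. Presumably the counters are pre-seeded only for OSDs under the CRUSH root(s) the pool currently maps to, so an OSD outside them (here 1056) has no entry. The following is a minimal sketch of that assumed mechanism; `count_pgs`, `root_osds` and `up_sets` are hypothetical names for illustration, not the balancer's actual code:

```python
# Hypothetical, simplified model of the per-pool counting loop in
# calc_eval(); this is NOT the actual balancer module code.
ITEM_NONE = 0x7FFFFFFF  # mirrors CRUSHMap.ITEM_NONE ("no OSD" slot)


def count_pgs(root_osds, up_sets):
    """Count PG instances per OSD, pre-seeding only OSDs under the pool's root."""
    pgs_by_osd = {osd: 0 for osd in root_osds}
    for up in up_sets.values():
        for osd in up:
            if osd == ITEM_NONE:
                continue
            # Raises KeyError for any OSD that was not pre-seeded.
            pgs_by_osd[osd] += 1
    return pgs_by_osd


# Assumed scenario: the pool's root holds OSDs 0-2, but one PG's up set
# contains OSD 1056, which lies outside that root.
up_sets = {"1.0": [0, 1, 2], "1.1": [1, 2, 1056]}

try:
    count_pgs([0, 1, 2], up_sets)
except KeyError as e:
    print("KeyError:", e.args[0])  # prints "KeyError: 1056"
```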

A fix for this would be:

diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index ca090516c9..faaa5b448e 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -525,7 +525,11 @@ class Module(MgrModule):
                 for osd in [int(osd) for osd in up]:
                     if osd == CRUSHMap.ITEM_NONE:
                         continue
-                    pgs_by_osd[osd] += 1
+                    try:
+                        pgs_by_osd[osd] += 1
+                    except KeyError:
+                        # this can occur if the cluster is migrating pgs between roots
+                        pgs_by_osd[osd] = 1
                     objects_by_osd[osd] += ms.pg_stat[pgid]['num_objects']
                     bytes_by_osd[osd] += ms.pg_stat[pgid]['num_bytes']
                     # pick a root to associate this pg instance with.

But I don't know if this is sufficient.
Any suggestions on how to test a change to an mgr module.py on a live cluster?
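One possible gap in the patch above: `objects_by_osd` and `bytes_by_osd` are incremented on the very next lines with the same `[osd] +=` pattern, so if they are seeded from the same OSD set as `pgs_by_osd` (an assumption here), they would raise the same `KeyError` right after the guarded line. The sketch below extends a hypothetical, simplified model of the loop (`count_usage` and the `pg_stat` layout are illustrative, not the balancer's API) so that all three counters default to zero via `dict.get`:

```python
# Hypothetical, simplified model of the per-pool counting loop in
# calc_eval(); this is NOT the actual balancer module code.
ITEM_NONE = 0x7FFFFFFF  # mirrors CRUSHMap.ITEM_NONE ("no OSD" slot)


def count_usage(root_osds, up_sets, pg_stat):
    """Tally PGs, objects and bytes per OSD without failing on unseeded OSDs."""
    pgs_by_osd = {osd: 0 for osd in root_osds}
    objects_by_osd = {osd: 0 for osd in root_osds}
    bytes_by_osd = {osd: 0 for osd in root_osds}
    for pgid, up in up_sets.items():
        stat = pg_stat[pgid]
        for osd in up:
            if osd == ITEM_NONE:
                continue
            # Default every counter, not only pgs_by_osd: all three dicts
            # are seeded from the same OSD set, so all three can miss.
            pgs_by_osd[osd] = pgs_by_osd.get(osd, 0) + 1
            objects_by_osd[osd] = objects_by_osd.get(osd, 0) + stat["num_objects"]
            bytes_by_osd[osd] = bytes_by_osd.get(osd, 0) + stat["num_bytes"]
    return pgs_by_osd, objects_by_osd, bytes_by_osd


pg_stat = {"1.0": {"num_objects": 10, "num_bytes": 4096},
           "1.1": {"num_objects": 5, "num_bytes": 2048}}
up_sets = {"1.0": [0, 1, 2], "1.1": [1, 2, 1056]}
pgs, objs, nbytes = count_usage([0, 1, 2], up_sets, pg_stat)
print(pgs)                        # {0: 1, 1: 2, 2: 2, 1056: 1}
print(objs[1056], nbytes[1056])   # 5 2048
```

Defaulting counts the out-of-root OSD in the per-pool totals. An alternative is to `continue` past such OSDs, as the loop already does for `ITEM_NONE`; which behaviour yields a meaningful score while a migration is in progress is not settled by this sketch.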
