Bug #36301 (open)

mgr/balancer: KeyError during balancer eval if pool migrating between roots

Added by Dan van der Ster over 5 years ago. Updated 7 months ago.

Status: New
Priority: Normal
Assignee: -
Category: balancer module
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If we try to run `ceph balancer eval` while a pool is migrating data between roots, this error will occur:

# ceph balancer eval
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 321, in handle_command
    return (0, self.evaluate(ms, pools, verbose=verbose), '')
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 620, in evaluate
    pe = self.calc_eval(ms, pools)
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 507, in calc_eval
    pgs_by_osd[osd] += 1
KeyError: (1056,)

A fix for this would be:

diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index ca090516c9..faaa5b448e 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -525,7 +525,11 @@ class Module(MgrModule):
                 for osd in [int(osd) for osd in up]:
                     if osd == CRUSHMap.ITEM_NONE:
                         continue
-                    pgs_by_osd[osd] += 1
+                    try:
+                        pgs_by_osd[osd] += 1
+                    except KeyError:
+                        # this can occur if the cluster is migrating pgs between roots
+                        pgs_by_osd[osd] = 1
                     objects_by_osd[osd] += ms.pg_stat[pgid]['num_objects']
                     bytes_by_osd[osd] += ms.pg_stat[pgid]['num_bytes']
                     # pick a root to associate this pg instance with.

but I don't know if this is sufficient.
Any suggestions on how to test an mgr module.py change on a live cluster?

#1 - Updated by Dan van der Ster over 5 years ago

Obviously objects_by_osd and bytes_by_osd will need a similar try/except.
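
A tidier sketch of the same idea (standalone Python with made-up sample data, not the real MappingState) would be to make all three per-OSD counters defaultdicts, so an OSD that shows up in a PG's up set but not in the expected root is counted implicitly:

from collections import defaultdict

# Stand-ins for the per-root accumulators in calc_eval(); defaultdict(int)
# returns 0 for unseen OSDs instead of raising KeyError.
pgs_by_osd = defaultdict(int)
objects_by_osd = defaultdict(int)
bytes_by_osd = defaultdict(int)

# Illustrative pg_stat in the shape the failing loop reads from ms.pg_stat.
pg_stat = {
    '1.0': {'up': [1056, 12], 'num_objects': 100, 'num_bytes': 4096},
    '1.1': {'up': [12, 7], 'num_objects': 50, 'num_bytes': 2048},
}

for pgid, stat in pg_stat.items():
    for osd in stat['up']:
        # osd 1056 may belong to another root mid-migration; it is simply
        # counted here instead of crashing the eval.
        pgs_by_osd[osd] += 1
        objects_by_osd[osd] += stat['num_objects']
        bytes_by_osd[osd] += stat['num_bytes']

print(dict(pgs_by_osd))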

Moving on, `ceph balancer optimize` (with upmap) crashes like this:

2018-10-03 16:19:01.437216 7f6e34949700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.8/rpm/el7/BUILD/ceph-12.2.8/src/osd/OSDMap.cc: In function 'int OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set<long int>&, OSDMap::Incremental*)' thread 7f6e34949700 time 2018-10-03 16:19:01.435308
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.8/rpm/el7/BUILD/ceph-12.2.8/src/osd/OSDMap.cc: 4102: FAILED assert(target > 0)

The comment on this code indicates this isn't going to work:

      // make sure osd is still there (belongs to this crush-tree)
      assert(osd_weight.count(osd));
      float target = osd_weight[osd] * pgs_per_weight;
      assert(target > 0);

And to be clear, this occurs when we have pgs active+remapped+backfilling (moving from room=A to room=B).

Should we just fail more gracefully when PGs aren't where they are expected?
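
For instance, one way to fail more gracefully, sketched here as standalone Python with hypothetical names rather than a patch to OSDMap::calc_pg_upmaps(), would be to validate the weights up front and return an error instead of asserting:

def calc_targets(osd_weight, pgs_per_weight, osds):
    # Mirror of the failing C++ logic: instead of assert(target > 0),
    # report which OSDs fall outside this crush tree and bail out cleanly.
    missing = [osd for osd in osds if osd_weight.get(osd, 0.0) <= 0.0]
    if missing:
        return None, ('cannot optimize: OSDs %s have no weight in this '
                      'crush tree (pool migrating between roots?)' % missing)
    return {osd: osd_weight[osd] * pgs_per_weight for osd in osds}, ''

# Example: OSD 1056 sits in the destination root, so it has no weight here.
targets, err = calc_targets({12: 1.0, 7: 1.0}, 3.5, [12, 7, 1056])
print(err or targets)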

#2 - Updated by Laura Flores 7 months ago

  • Tags set to low-hanging-fruit