Bug #49576
closed
mgr/balancer: KeyError messages in balancer module
Description
We've hit a problem with the balancer on two of our clusters.
ceph health suddenly reports:
MGR_MODULE_ERROR Module 'balancer' has failed: (40,)
The manager log then shows the following:
2019-11-01 14:57:44.112 7f497f642700 -1 balancer.serve:
2019-11-01 14:57:44.112 7f497f642700 -1 Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 425, in serve
    r, detail = self.optimize(plan)
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 693, in optimize
    return self.do_crush_compat(plan)
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 839, in do_crush_compat
    weight = best_ws[osd]
KeyError: (40,)
We're using 13.2.6 on CentOS 7. We don't have this problem on multiple other clusters running the same version.
If I can provide further details, please let me know.
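For anyone triaging this: the failing line in the traceback is a plain dictionary lookup, `weight = best_ws[osd]`, so the error means the key being looked up had no entry in `best_ws`. The sketch below is a minimal illustration in plain Python, not the balancer module's code. The dictionary contents, the `osds` list, and the `lookup_weights` helper are all hypothetical. It shows that looking up a missing integer key 40 raises an exception whose `args` is `(40,)`, which is one possible reading of the `(40,)` in the health message. The report alone does not establish whether the key was `40` or the tuple `(40,)`.

```python
# Hypothetical stand-in data; none of these values come from the report.
best_ws = {0: 1.0, 1: 0.95}   # weights keyed by OSD id
osds = [0, 1, 40]             # OSD 40 has no entry in best_ws


def lookup_weights(weights, osd_ids):
    """Return (found, missing): weights for known ids, and ids with no entry."""
    found, missing = {}, []
    for osd in osd_ids:
        try:
            found[osd] = weights[osd]
        except KeyError:
            # Record the id instead of letting the first miss abort the loop.
            missing.append(osd)
    return found, missing


# A direct lookup of the missing key raises KeyError; keep its args for inspection.
try:
    best_ws[40]
except KeyError as exc:
    captured_args = exc.args

found, missing = lookup_weights(best_ws, osds)
print("exception args:", captured_args)
print("ids without a weight:", missing)
```

Collecting the missing ids rather than failing on the first one would show whether OSD 40 is the only id absent from the weight set on the affected clusters. Whether skipping such ids is an appropriate fix for the module is not something this report settles.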