problem with balancer module (mimic)
We've hit a problem with the balancer module on two of our clusters.
ceph health suddenly reports:
MGR_MODULE_ERROR Module 'balancer' has failed: (40,)
The manager log then shows the following:
2019-11-01 14:57:44.112 7f497f642700 -1 balancer.serve:
2019-11-01 14:57:44.112 7f497f642700 -1 Traceback (most recent call last):
File "/usr/lib64/ceph/mgr/balancer/module.py", line 425, in serve
r, detail = self.optimize(plan)
File "/usr/lib64/ceph/mgr/balancer/module.py", line 693, in optimize
File "/usr/lib64/ceph/mgr/balancer/module.py", line 839, in do_crush_compat
weight = best_ws[osd]
We're running 13.2.6 on CentOS 7. We don't see this problem on several other clusters running the same version.
If I can provide further details, please let me know.
- Subject changed from problem with balancer module to problem with balancer module (mimic)
- Status changed from New to Need More Info
Can you attach your osdmap and/or crush map? It's not clear to me why there would be a tuple instead of a name here. If you can run 'ceph osd getmap -o map' and attach the resulting file to this ticket, that would be great. Thanks!
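On the "(40,)" in the health message: one possible reading, offered as an assumption and not a confirmed diagnosis, is that it is the args tuple of a KeyError raised by 'best_ws[osd]' when osd is 40 and that id is missing from best_ws. The sketch below uses a hypothetical best_ws dict and a hypothetical helper lookup_weight (neither is the balancer's real data or code) to show how a missing integer key yields exactly that tuple.

```python
# Hypothetical stand-in for the balancer's best_ws mapping (osd id -> weight).
# The ids and weights are invented for illustration only.
best_ws = {38: 1.0, 39: 0.95}


def lookup_weight(weights, osd):
    """Return (weight, None) on success, or (None, exc.args) if the id is missing."""
    try:
        return weights[osd], None
    except KeyError as exc:
        # For a KeyError, args holds the missing key as a one-element tuple.
        return None, exc.args


weight, err_args = lookup_weight(best_ws, 40)
print(err_args)  # (40,)
```

If that reading holds, the osdmap and crush map requested above should show whether osd.40 is present in one place and absent from the weights the balancer is iterating over.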