Bug #42721

problem with balancer module (mimic)

Added by Nikola Ciprich 9 days ago. Updated 5 days ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 11/10/2019
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

We've hit a problem with the balancer on two of our clusters.
ceph health suddenly reports:
MGR_MODULE_ERROR Module 'balancer' has failed: (40,)

The manager log then shows the following:

2019-11-01 14:57:44.112 7f497f642700 -1 balancer.serve:
2019-11-01 14:57:44.112 7f497f642700 -1 Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 425, in serve
    r, detail = self.optimize(plan)
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 693, in optimize
    return self.do_crush_compat(plan)
  File "/usr/lib64/ceph/mgr/balancer/module.py", line 839, in do_crush_compat
    weight = best_ws[osd]
KeyError: (40,)

We're using 13.2.6 on CentOS 7. We don't have this problem on multiple other clusters running the same version.

If I can provide further details, please let me know.

map.gz (3.88 KB) Nikola Ciprich, 11/14/2019 07:49 PM

History

#1 Updated by Greg Farnum 8 days ago

  • Project changed from Ceph to mgr
  • Category deleted (common)

#2 Updated by Sage Weil 5 days ago

  • Subject changed from problem with balancer module to problem with balancer module (mimic)
  • Status changed from New to Need More Info

Can you attach your osdmap and/or crush map? It's not clear to me why there would be a tuple instead of a name here. If you can run 'ceph osd getmap -o map' and then attach the resulting file to this ticket, that would be great. Thanks!
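
To illustrate why the traceback points to a tuple key, here is a minimal Python sketch. It is not the balancer's code: the dictionary contents and the helper lookup_error are hypothetical, and only the name best_ws is borrowed from the traceback. It shows that a failed dictionary lookup prints the repr of the missing key, so a missing integer 40 and a missing one-element tuple (40,) produce different messages.

```python
# Hypothetical weights keyed by integer OSD id; the values are invented
# for illustration and do not come from the reported cluster.
best_ws = {0: 1.0, 1: 0.95}


def lookup_error(weights, key):
    """Return the text shown after 'KeyError: ' when key is missing, else None."""
    try:
        weights[key]
    except KeyError as exc:
        # str() of a KeyError raised by a dict lookup is the repr of the key.
        return str(exc)
    return None


int_msg = lookup_error(best_ws, 40)       # missing integer key
tuple_msg = lookup_error(best_ws, (40,))  # missing one-element tuple key

print(int_msg)
print(tuple_msg)
```

Under this reading, the "(40,)" in the log suggests the lookup was attempted with a tuple rather than a plain id, which is the oddity the attached map may help explain.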

#3 Updated by Nikola Ciprich 5 days ago

Hi Greg, sure! Attached is the map. BR. nik
