<p><strong>Ceph mgr - Bug #42721: mgr/balancer: KeyError messages in balancer module</strong> (<a href="https://tracker.ceph.com/issues/42721">https://tracker.ceph.com/issues/42721</a>)</p>
<p><strong>Greg Farnum</strong> (gfarnum@redhat.com), <a href="https://tracker.ceph.com/issues/42721?journal_id=151241">2019-11-11T22:25:30Z</a></p>
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>mgr</i></li><li><strong>Category</strong> deleted (<del><i>common</i></del>)</li></ul>
<p><strong>Sage Weil</strong> (sage@newdream.net), <a href="https://tracker.ceph.com/issues/42721?journal_id=151593">2019-11-14T15:26:16Z</a></p>
<ul><li><strong>Subject</strong> changed from <i>problem with balancer module</i> to <i>problem with balancer module (mimic)</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li></ul><p>Can you attach your osdmap and/or crush map? It's not clear to me why there would be a tuple instead of a name here. If you can run 'ceph osd getmap -o map' and attach the resulting file to this ticket, that would be great. Thanks!</p>
<p><strong>Nikola Ciprich</strong> (nikola.ciprich@linuxbox.cz), <a href="https://tracker.ceph.com/issues/42721?journal_id=151701">2019-11-14T19:49:41Z</a></p>
<ul><li><strong>File</strong> <a href="/attachments/download/4584/map.gz">map.gz</a> added</li></ul><p>Hi Greg, sure! Attached is the map.<br />BR<br />nik</p>
<p><strong>Lenz Grimmer</strong>, <a href="https://tracker.ceph.com/issues/42721?journal_id=153856">2019-12-09T11:07:29Z</a></p>
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-10 priority-4 priority-default closed" href="/issues/43181">Bug #43181</a>: Module 'balancer' has failed: (104,) - with Unhandled Exception</i> added</li></ul>
<p><strong>Lenz Grimmer</strong>, <a href="https://tracker.ceph.com/issues/42721?journal_id=153858">2019-12-09T11:09:06Z</a></p>
<ul><li><strong>Affected Versions</strong> <i>v14.2.4</i> added</li></ul><p>This seems to affect Nautilus as well - see <a class="issue tracker-1 status-10 priority-4 priority-default closed" title="Bug: Module 'balancer' has failed: (104,) - with Unhandled Exception (Duplicate)" href="https://tracker.ceph.com/issues/43181">#43181</a> for a similar report:<br /><pre>
2019-12-06 20:18:01.031 xxxxxxxxxxxx -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'balancer' while running on mgr.xxxxx: (104,)
2019-12-06 20:18:01.031 xxxxxxxxxxxx -1 balancer.serve:
2019-12-06 20:18:01.031 xxxxxxxxxxxx -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/balancer/module.py", line 624, in serve
    r, detail = self.optimize(plan)
  File "/usr/share/ceph/mgr/balancer/module.py", line 891, in optimize
    return self.do_crush_compat(plan)
  File "/usr/share/ceph/mgr/balancer/module.py", line 1053, in do_crush_compat
    weight = best_ws[osd]
KeyError: (104,)
</pre></p>
<p><strong>Lenz Grimmer</strong>, <a href="https://tracker.ceph.com/issues/42721?journal_id=154132">2019-12-11T17:10:38Z</a></p>
<ul><li><strong>Subject</strong> changed from <i>problem with balancer module (mimic)</i> to <i>mgr: KeyError messages in balancer module</i></li><li><strong>Severity</strong> changed from <i>3 - minor</i> to <i>2 - major</i></li></ul>
<p><strong>Lenz Grimmer</strong>, <a href="https://tracker.ceph.com/issues/42721?journal_id=154133">2019-12-11T17:11:17Z</a></p>
<ul><li><strong>Subject</strong> changed from <i>mgr: KeyError messages in balancer module</i> to <i>mgr/balancer: KeyError messages in balancer module</i></li><li><strong>Category</strong> set to <i>balancer module</i></li><li><strong>Backport</strong> set to <i>nautilus, mimic</i></li></ul>
<p><strong>Nikola Ciprich</strong> (nikola.ciprich@linuxbox.cz), <a href="https://tracker.ceph.com/issues/42721?journal_id=154771">2019-12-22T18:31:07Z</a></p>
<p>Hi, I just noticed this ticket is still in the Need More Info state. I've provided the requested map; is there anything else I can provide to help?</p>
<p><strong>Lenz Grimmer</strong>, <a href="https://tracker.ceph.com/issues/42721?journal_id=157572">2020-01-31T09:28:25Z</a></p>
<ul><li><strong>Status</strong> changed from <i>Need More Info</i> to <i>New</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul>
<p><strong>Sage Weil</strong> (sage@newdream.net), <a href="https://tracker.ceph.com/issues/42721?journal_id=161188">2020-03-17T19:15:53Z</a></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Sage Weil</i></li></ul><p>Finally figured this out!</p>
<p>The problem is in calc_eval():</p>
<ul>
<li>target_by_root does not include osd X because the crush weight is 0, but it has a weight-set weight > 0</li>
<li>we initialize pgs_by_osd to 0 for each OSD in target_by_root osds</li>
<li>in the loop over pm (pg_up_by_pool) we encounter a pg that maps to osd X</li>
<li>pgs_by_osd[X] throws the KeyError because X isn't there</li>
</ul>
<p><strong>Sage Weil</strong> (sage@newdream.net), <a href="https://tracker.ceph.com/issues/42721?journal_id=161191">2020-03-17T20:04:57Z</a></p>
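<p>A minimal Python sketch of the calc_eval() failure mode from the analysis in the previous comment. Everything in it is hypothetical and simplified. CRUSH_WEIGHTS, WEIGHT_SET and PG_UP are hand-written stand-ins, not the balancer module's real data structures, and the PG-to-OSD mapping is made up rather than computed by CRUSH. The sketch shows how a counter dictionary that is seeded only with OSDs of non-zero CRUSH weight raises KeyError as soon as a PG maps to an OSD that has CRUSH weight 0 but a non-zero weight-set weight. It also shows one possible defensive guard. That guard is only an illustration and is not necessarily what PR 34014 changed.</p>

```python
# Hypothetical sketch of the calc_eval() KeyError described in this ticket.
# All data below is illustrative; none of it is read from a real cluster.
from typing import Dict, List, Set, Tuple

# osd.3 has CRUSH weight 0 but a weight-set weight > 0, so in this
# made-up mapping PGs still land on it.
CRUSH_WEIGHTS: Dict[int, float] = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0}
WEIGHT_SET: Dict[int, float] = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.5}
PG_UP: Dict[str, List[int]] = {"1.0": [0, 1], "1.1": [1, 3], "1.2": [2, 0]}

# Analogue of target_by_root: only OSDs with a non-zero CRUSH weight.
TARGET_OSDS: Set[int] = {osd for osd, w in CRUSH_WEIGHTS.items() if w > 0}

# OSDs that can still receive PGs via the weight-set but are missing from
# the target set; in this toy example these are the KeyError candidates.
AT_RISK: Set[int] = {osd for osd, w in WEIGHT_SET.items() if w > 0} - TARGET_OSDS


def count_pgs_by_osd_buggy(target_osds: Set[int],
                           pg_up: Dict[str, List[int]]) -> Dict[int, int]:
    """Seed counters from target_osds only, then count PG replicas per OSD."""
    pgs_by_osd = {osd: 0 for osd in target_osds}
    for up in pg_up.values():
        for osd in up:
            pgs_by_osd[osd] += 1  # KeyError when osd is not in target_osds
    return pgs_by_osd


def count_pgs_by_osd_guarded(target_osds: Set[int],
                             pg_up: Dict[str, List[int]]
                             ) -> Tuple[Dict[int, int], Set[int]]:
    """Same count, but skip and report OSDs outside the target set."""
    pgs_by_osd = {osd: 0 for osd in target_osds}
    skipped: Set[int] = set()
    for up in pg_up.values():
        for osd in up:
            if osd not in pgs_by_osd:
                skipped.add(osd)
                continue
            pgs_by_osd[osd] += 1
    return pgs_by_osd, skipped


if __name__ == "__main__":
    try:
        count_pgs_by_osd_buggy(TARGET_OSDS, PG_UP)
    except KeyError as exc:
        print("buggy version raised KeyError:", exc)
    counts, skipped = count_pgs_by_osd_guarded(TARGET_OSDS, PG_UP)
    print(counts, "skipped:", sorted(skipped))
```

<p>Skipping the OSD keeps the evaluation running, but it also silently leaves that OSD's PGs out of the score. A real fix might instead add such OSDs to the target set. Which of the two is right depends on how the balancer should treat weight-set-only OSDs.</p>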
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li><li><strong>Pull request ID</strong> set to <i>34014</i></li></ul>
<p><strong>Sage Weil</strong> (sage@newdream.net), <a href="https://tracker.ceph.com/issues/42721?journal_id=161284">2020-03-18T22:43:05Z</a></p>
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul>
<p><strong>Konstantin Shalygin</strong> (k0ste@k0ste.ru), <a href="https://tracker.ceph.com/issues/42721?journal_id=161290">2020-03-19T02:55:07Z</a></p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/44674">Backport #44674</a>: nautilus: mgr/balancer: KeyError messages in balancer module</i> added</li></ul>
<p><strong>Konstantin Shalygin</strong> (k0ste@k0ste.ru), <a href="https://tracker.ceph.com/issues/42721?journal_id=161292">2020-03-19T02:55:15Z</a></p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-6 priority-4 priority-default closed" href="/issues/44675">Backport #44675</a>: mimic: mgr/balancer: KeyError messages in balancer module</i> added</li></ul>
<p><strong>Nikola Ciprich</strong> (nikola.ciprich@linuxbox.cz), <a href="https://tracker.ceph.com/issues/42721?journal_id=168155">2020-06-15T11:28:51Z</a></p>
<p>Hi, just wanted to report that I've hit the same problem on 14.2.8 with the fix applied. I haven't studied the code much further, but maybe there's a similar problem elsewhere in the code:</p>
<pre>
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: 2020-06-14 21:29:34.173 7fa227f9f700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'balancer' while running on mgr.vfnjazv1
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: 2020-06-14 21:29:34.173 7fa227f9f700 -1 balancer.serve:
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: 2020-06-14 21:29:34.173 7fa227f9f700 -1 Traceback (most recent call last):
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: File "/usr/share/ceph/mgr/balancer/module.py", line 654, in serve
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: r, detail = self.optimize(plan)
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: File "/usr/share/ceph/mgr/balancer/module.py", line 924, in optimize
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: return self.do_crush_compat(plan)
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: File "/usr/share/ceph/mgr/balancer/module.py", line 1085, in do_crush_compat
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: weight = best_ws[osd]
Jun 14 21:29:34 vfnjazv1a ceph-mgr[2027947]: KeyError: (76,)
</pre>
<p><strong>Nikola Ciprich</strong> (nikola.ciprich@linuxbox.cz), <a href="https://tracker.ceph.com/issues/42721?journal_id=172333">2020-08-05T13:41:50Z</a></p>
<p>Got it. This other problem was caused by empty buckets:</p>
<pre>
-28 0 host vfnjazv1-ssd-test
-25 0 root ssdtest
-26 0 host vfnjazv1a-ssd-test
-27 0 host vfnjazv1b-ssd-test
-29 0 host vfnjazv1c-ssd-test
</pre>
<p>When I removed them, the error went away, so I guess this also needs fixing.</p>
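<p>The following small Python sketch spots the kind of buckets Nikola removed. It parses the abbreviated four-column rows quoted above (ID, weight, type, name) and lists buckets, which have negative IDs, whose weight is 0. The helper name zero_weight_buckets is hypothetical. Real 'ceph osd tree' output has extra columns such as CLASS, STATUS and REWEIGHT, so the parsing would need adjusting, or the JSON output of 'ceph osd tree -f json' could be used instead. A weight of 0 only suggests that a bucket is empty, because a bucket holding OSDs with CRUSH weight 0 matches too. Check the contents before removing anything.</p>

```python
# Hypothetical helper: find CRUSH buckets with weight 0 in an abbreviated
# "ID WEIGHT TYPE NAME" listing like the one quoted in this ticket.
from typing import List, Tuple


def zero_weight_buckets(listing: str) -> List[Tuple[int, str, str]]:
    """Return (id, type, name) for every bucket row whose weight is 0."""
    found: List[Tuple[int, str, str]] = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and rows in any other layout
        item_id, weight, bucket_type, name = fields
        try:
            numeric_id, numeric_weight = int(item_id), float(weight)
        except ValueError:
            continue  # e.g. a header row such as "ID WEIGHT TYPE NAME"
        # CRUSH buckets have negative IDs; OSDs have IDs >= 0.
        if numeric_id < 0 and numeric_weight == 0.0:
            found.append((numeric_id, bucket_type, name))
    return found


# The rows quoted in the comment above.
LISTING = """\
-28 0 host vfnjazv1-ssd-test
-25 0 root ssdtest
-26 0 host vfnjazv1a-ssd-test
-27 0 host vfnjazv1b-ssd-test
-29 0 host vfnjazv1c-ssd-test
"""

if __name__ == "__main__":
    for item_id, bucket_type, name in zero_weight_buckets(LISTING):
        print(f"{item_id} {bucket_type} {name}: weight 0, check whether it is empty")
```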
<p>BR<br />nik</p>
<p><strong>Nathan Cutler</strong> (ncutler@suse.cz), <a href="https://tracker.ceph.com/issues/42721?journal_id=183716">2021-01-27T20:04:14Z</a></p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul><p>While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".</p>
<p><strong>David Zafman</strong> (dzafman@redhat.com), <a href="https://tracker.ceph.com/issues/42721?journal_id=186343">2021-03-02T22:30:37Z</a></p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-1 status-3 priority-5 priority-high3 closed" href="/issues/49576">Bug #49576</a>: mgr/balancer: KeyError messages in balancer module</i> added</li></ul>