https://tracker.ceph.com/ 2019-02-04T11:18:31Z Ceph
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=128653 2019-02-04T11:18:31Z Kefu Chai tchaikov@gmail.com
<ul></ul><p>Looking at the Python backtrace, <code>pg_str</code> was a tuple with a single element, but it should have been a string. That's weird.</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=129521 2019-02-16T15:51:43Z Sage Weil sage@newdream.net
<ul></ul><p>similar?</p>
<pre>
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr:2019-02-16 13:40:30.625 7f74ead29700 -1 mgr notify Traceback (most recent call last):
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 399, in notify
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: self._osdmap_changed(old_osdmap, self._latest_osdmap)
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 381, in _osdmap_changed
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id)
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 356, in _osd_out
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: ev.pg_update(self.get("pg_dump"), self.log)
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 155, in pg_update
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr:KeyError: ('1.a',)
</pre><br />/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601223
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=129534 2019-02-18T07:13:53Z Kefu Chai tchaikov@gmail.com
<ul><li><strong>Assignee</strong> set to <i>Kefu Chai</i></li></ul>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=129556 2019-02-18T13:17:47Z Kefu Chai tchaikov@gmail.com
<ul></ul><p>Rerunning with more verbose logging at:</p>
<p>- <a class="external" href="http://pulpito.ceph.com/kchai-2019-02-18_13:16:54-rados:thrash-wip-38157-debugging-distro-basic-mira/">http://pulpito.ceph.com/kchai-2019-02-18_13:16:54-rados:thrash-wip-38157-debugging-distro-basic-mira/</a><br />- <a class="external" href="http://pulpito.ceph.com/kchai-2019-02-19_07:07:40-rados:thrash-wip-38157-debugging-distro-basic-mira/">http://pulpito.ceph.com/kchai-2019-02-19_07:07:40-rados:thrash-wip-38157-debugging-distro-basic-mira/</a></p>
<p>no luck =(</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=129726 2019-02-20T13:05:42Z Sage Weil sage@newdream.net
<ul></ul><p>/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=129870 2019-02-22T13:00:26Z Sage Weil sage@newdream.net
<ul></ul><p>/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622638</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=129999 2019-02-24T15:23:33Z Sage Weil sage@newdream.net
<ul></ul><p>/a/sage-2019-02-23_23:02:18-rados-wip-sage2-testing-2019-02-23-1354-distro-basic-smithi/3631867</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=130051 2019-02-25T12:04:10Z Sebastian Wagner
<ul></ul><p>I wasn't really able to reproduce this locally in my vstart cluster.</p>
<p><del>Also, I don't really get how this is supposed to work, because in <code>progress.module.Module#_osd_out</code>, <code>affected_pgs</code> is something like</del></p>
<pre>
affected_pgs = [PgId(pool_id, ps) for ps in range(0, pool['pg_num']) if ...]
PgRecoveryEvent(which_pgs=affected_pgs)
</pre>
<p><del>where <code>ps</code> will never be a character, but in the Traceback, it clearly asks for a <code>pg_str</code> containing characters.</del></p>
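<p>For reference, the letter in <code>1.a</code> is consistent with the placement seed being rendered in hexadecimal, which is how Ceph prints PG ids. A minimal sketch, assuming a hypothetical <code>PgId</code> namedtuple that formats itself as pool id, a dot, then the seed in hex (the real class in the progress module may differ):</p>

```python
from collections import namedtuple


class PgId(namedtuple("PgId", ["pool_id", "ps"])):
    """Hypothetical stand-in for the progress module's PG id type."""

    def __str__(self):
        # Ceph prints PG ids as <pool>.<seed>, with the seed in hex.
        return "{0}.{1:x}".format(self.pool_id, self.ps)


# Integer seeds 10 and 22 render with hex digits, so letters can appear.
affected_pgs = [PgId(1, ps) for ps in (10, 22)]
pg_strs = [str(pg) for pg in affected_pgs]
print(pg_strs)
```

<p>So an integer <code>ps</code> can still produce a <code>pg_str</code> that contains a character.</p>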
<p>Anyway, found it.</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=130209 2019-02-27T16:03:08Z Sage Weil sage@newdream.net
<ul></ul><p>/a/rdias-2019-02-26_22:35:27-rados-wip-rdias2-testing-distro-basic-smithi/3642422</p>
<p>Theory: rados_api_tests triggers this.</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=130432 2019-03-01T14:40:49Z Sebastian Wagner
<ul></ul><p>No luck with finding the cause using type annotations and mypy: <a class="external" href="https://github.com/sebastian-philipp/ceph/commit/a79a0846c3aee84e6edac5d30d55224ebd663233">https://github.com/sebastian-philipp/ceph/commit/a79a0846c3aee84e6edac5d30d55224ebd663233</a></p>
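<p>To illustrate the kind of mistake the annotations were meant to catch: a sketch, with made-up data and a hypothetical helper name, of how a one-element tuple used as a key produces the same <code>KeyError: ('1.a',)</code> seen in the earlier traceback, and where a <code>str</code> annotation would let mypy reject the call. It does not claim this is what actually happened in the mgr.</p>

```python
from typing import Any, Dict

# Hypothetical, heavily trimmed stand-in for the PG stats keyed by PG id string.
pg_to_state: Dict[str, Dict[str, Any]] = {
    "1.a": {"stat_sum": {"num_bytes_recovered": 4096}},
}


def bytes_recovered(pg_str: str) -> int:
    """Look up recovered bytes for one PG; the annotation documents the key type."""
    return pg_to_state[pg_str]["stat_sum"]["num_bytes_recovered"]


ok = bytes_recovered("1.a")

# A stray trailing comma turns the string into a one-element tuple.
bad_key = ("1.a",)
try:
    bytes_recovered(bad_key)  # type: ignore[arg-type]  # mypy would reject this call
    message = ""
except KeyError as exc:
    message = "KeyError: {0}".format(exc)

print(ok, message)
```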
<p>I don't think this is caused by Python code.</p>
<p>I'm not sure it makes sense to create a PR for this branch.</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=132965 2019-03-28T08:17:11Z Kefu Chai tchaikov@gmail.com
<ul><li><strong>Assignee</strong> deleted (<del><i>Kefu Chai</i></del>)</li></ul><p>Unassigning myself from this ticket; I don't have a clue.</p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=133963 2019-04-10T14:51:10Z Sage Weil sage@newdream.net
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/27494">https://github.com/ceph/ceph/pull/27494</a></p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=134424 2019-04-12T12:00:31Z Sage Weil sage@newdream.net
<ul></ul><pre>
2019-04-11 16:52:56.429 7f635385d700 10 mgr.server operator() pool 1 pg_num_target 18 pg_num 23 -> 22 (merging 1.16 and 1.6)
...
2019-04-11 16:52:59.309 7f6360c27700 20 check_osd_map removing merged 1.16
2019-04-11 16:52:59.309 7f6360c27700 20 deleted pool 218
...
2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] got KeyError, see http://tracker.ceph.com/issues/38157
2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] pg_to_state as string: .... (does not include 1.16) ....
2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] pg_str "'1.16'"
2019-04-11 16:52:59.441 7f6355060700 -1 mgr notify progress.notify:
2019-04-11 16:52:59.441 7f6355060700 -1 mgr notify Traceback (most recent call last):
File "/usr/share/ceph/mgr/progress/module.py", line 408, in notify
self._osdmap_changed(old_osdmap, self._latest_osdmap)
File "/usr/share/ceph/mgr/progress/module.py", line 390, in _osdmap_changed
self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id)
File "/usr/share/ceph/mgr/progress/module.py", line 365, in _osd_out
ev.pg_update(self.get("pg_dump"), self.log)
File "/usr/share/ceph/mgr/progress/module.py", line 156, in pg_update
pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
KeyError: '1.16'
</pre>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=134439 2019-04-12T12:08:16Z Sage Weil sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/27546">https://github.com/ceph/ceph/pull/27546</a></p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=134749 2019-04-16T13:32:52Z Sage Weil sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>nautilus</i></li></ul><p>nautilus backport: <a class="external" href="https://github.com/ceph/ceph/pull/27608">https://github.com/ceph/ceph/pull/27608</a></p>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=134824 2019-04-17T06:08:56Z Nathan Cutler ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/39344">Backport #39344</a>: nautilus: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']</i> added</li></ul>
mgr - Bug #38157: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] https://tracker.ceph.com/issues/38157?journal_id=136583 2019-05-10T21:14:39Z Nathan Cutler ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>
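<p>The 2019-04-11 log above shows PG 1.16 being removed by a PG merge, after which it was absent from <code>pg_to_state</code> when the progress event looked it up. One defensive approach is to skip PGs that are no longer in the dump instead of indexing unconditionally. The sketch below uses a hypothetical helper and made-up data; it illustrates the idea and is not necessarily what the linked pull requests do.</p>

```python
def sum_bytes_recovered(pg_to_state, tracked_pgs):
    """Sum recovered bytes over tracked PGs, tolerating PGs that vanished.

    Hypothetical helper: returns the total and the list of PG ids that were
    missing from pg_to_state (for example after a PG merge).
    """
    total = 0
    missing = []
    for pg_str in tracked_pgs:
        state = pg_to_state.get(pg_str)
        if state is None:
            # The PG is gone from the dump; skip it instead of raising KeyError.
            missing.append(pg_str)
            continue
        total += state["stat_sum"]["num_bytes_recovered"]
    return total, missing


# Made-up dump in which 1.16 has already been merged away.
dump = {"1.6": {"stat_sum": {"num_bytes_recovered": 100}}}
total, missing = sum_bytes_recovered(dump, ["1.6", "1.16"])
print(total, missing)
```

<p>Returning the missing ids lets the caller decide whether to drop them from the event or treat them as complete.</p>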