Bug #38157
closedprogress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
0%
Description
i've noticed this a few times
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr:2019-02-03 04:16:50.837 7f23348fa700 -1 mgr notify Traceback (most recent call last): 2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 363, in notify 2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: self._osdmap_changed(old_osdmap, self._latest_osdmap) 2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 345, in _osdmap_changed 2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id) 2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 320, in _osd_out 2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: ev.pg_update(self.get("pg_dump"), self.log) 2019-02-03T04:16:50.848 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 147, in pg_update 2019-02-03T04:16:50.848 INFO:tasks.ceph.mgr.x.smithi072.stderr: pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] 2019-02-03T04:16:50.848 INFO:tasks.ceph.mgr.x.smithi072.stderr:KeyError: ('1.43',)
/a/sage-2019-02-03_00:19:05-rados-wip-sage-testing-2019-02-02-1454-distro-basic-smithi/3542892
Updated by Kefu Chai about 5 years ago
by looking at the python backtrace, pg_str
was a tuple with only a single element in it, but it should have been a string. that's weird..
Updated by Sage Weil about 5 years ago
similar?
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr:2019-02-16 13:40:30.625 7f74ead29700 -1 mgr notify Traceback (most recent call last): 2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 399, in notify 2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: self._osdmap_changed(old_osdmap, self._latest_osdmap) 2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 381, in _osdmap_changed 2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id) 2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 356, in _osd_out 2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: ev.pg_update(self.get("pg_dump"), self.log) 2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 155, in pg_update 2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] 2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr:KeyError: ('1.a',)
/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601223
Updated by Kefu Chai about 5 years ago
Updated by Sage Weil about 5 years ago
/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261
Updated by Sage Weil about 5 years ago
/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622638
Updated by Sage Weil about 5 years ago
/a/sage-2019-02-23_23:02:18-rados-wip-sage2-testing-2019-02-23-1354-distro-basic-smithi/3631867
Updated by Sebastian Wagner about 5 years ago
I wasn't really able to reproduce this locally in my vstart cluster.
Also, don't really get how this is supposed to work, because in progress.module.Module#_osd_out
, affected_pgs
is something like
affected_pgs = [PgId(poll_id, ps) for ps in range(0, pool['pg_num']) if ...] PgRecoveryEvent(which_pgs=affected_pgs)
where ps
will never be a character, but in the Traceback, it clearly asks for a pg_str
containing characters.
Anyway. found it.
Updated by Sage Weil about 5 years ago
/a/rdias-2019-02-26_22:35:27-rados-wip-rdias2-testing-distro-basic-smithi/3642422
theory: rados_api_tests triggers this
Updated by Sebastian Wagner about 5 years ago
No luck with finding the cause using type annotations and mypy: https://github.com/sebastian-philipp/ceph/commit/a79a0846c3aee84e6edac5d30d55224ebd663233
I don't think this is caused by Python code.
I'm not sure it makes sense to create a PR for this branch.
Updated by Kefu Chai about 5 years ago
- Assignee deleted (
Kefu Chai)
reassigning from this ticket. i don't have a clue.
Updated by Sage Weil about 5 years ago
2019-04-11 16:52:56.429 7f635385d700 10 mgr.server operator() pool 1 pg_num_target 18 pg_num 23 -> 22 (merging 1.16 and 1.6) ... 2019-04-11 16:52:59.309 7f6360c27700 20 check_osd_map removing merged 1.16 2019-04-11 16:52:59.309 7f6360c27700 20 deleted pool 218 ... 2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] got KeyError, see http://tracker.ceph.com/issues/38157 2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] pg_to_state as string: .... (does not include 1.16) .... 2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] pg_str "'1.16'" 2019-04-11 16:52:59.441 7f6355060700 -1 mgr notify progress.notify: 2019-04-11 16:52:59.441 7f6355060700 -1 mgr notify Traceback (most recent call last): File "/usr/share/ceph/mgr/progress/module.py", line 408, in notify self._osdmap_changed(old_osdmap, self._latest_osdmap) File "/usr/share/ceph/mgr/progress/module.py", line 390, in _osdmap_changed self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id) File "/usr/share/ceph/mgr/progress/module.py", line 365, in _osd_out ev.pg_update(self.get("pg_dump"), self.log) File "/usr/share/ceph/mgr/progress/module.py", line 156, in pg_update pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] KeyError: '1.16'
Updated by Sage Weil about 5 years ago
- Status changed from 12 to Fix Under Review
Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to nautilus
nautilus backport: https://github.com/ceph/ceph/pull/27608
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #39344: nautilus: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] added
Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved