Bug #38157
closed
progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
Added by Sage Weil about 5 years ago.
Updated almost 5 years ago.
Description
I've noticed this a few times:
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr:2019-02-03 04:16:50.837 7f23348fa700 -1 mgr notify Traceback (most recent call last):
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 363, in notify
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: self._osdmap_changed(old_osdmap, self._latest_osdmap)
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 345, in _osdmap_changed
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id)
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 320, in _osd_out
2019-02-03T04:16:50.847 INFO:tasks.ceph.mgr.x.smithi072.stderr: ev.pg_update(self.get("pg_dump"), self.log)
2019-02-03T04:16:50.848 INFO:tasks.ceph.mgr.x.smithi072.stderr: File "/usr/lib/ceph/mgr/progress/module.py", line 147, in pg_update
2019-02-03T04:16:50.848 INFO:tasks.ceph.mgr.x.smithi072.stderr: pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
2019-02-03T04:16:50.848 INFO:tasks.ceph.mgr.x.smithi072.stderr:KeyError: ('1.43',)
/a/sage-2019-02-03_00:19:05-rados-wip-sage-testing-2019-02-02-1454-distro-basic-smithi/3542892
Looking at the Python backtrace, pg_str was a tuple with only a single element in it, but it should have been a string. That's weird.
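A small sketch of one way a string key can show up looking like a one-element tuple. In Python, a failed dict lookup stores the missing key in the exception's args tuple, so printing args rather than the key gives ('1.43',) even when the key was a plain string. Whether the mgr traceback printer did this here is an assumption, and the pg_to_state contents below are made up for illustration.

```python
# Hypothetical stand-in for the pg_dump-derived mapping; contents are invented.
pg_to_state = {"1.42": {"stat_sum": {"num_bytes_recovered": 0}}}


def lookup_error(key):
    """Return the KeyError raised by a failed lookup, or None on success."""
    try:
        pg_to_state[key]
    except KeyError as e:
        return e
    return None


err = lookup_error("1.43")

# The key that was looked up is a plain string...
key_is_str = isinstance(err.args[0], str)

# ...but the exception's args attribute is always a tuple wrapping it.
args_repr = repr(err.args)   # "('1.43',)"
message = str(err)           # "'1.43'"
```

If this is what happened, pg_str itself would have been a string all along and the real problem would be that the PG is absent from pg_to_state.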
similar?
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr:2019-02-16 13:40:30.625 7f74ead29700 -1 mgr notify Traceback (most recent call last):
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 399, in notify
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: self._osdmap_changed(old_osdmap, self._latest_osdmap)
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 381, in _osdmap_changed
2019-02-16T13:40:30.630 INFO:tasks.ceph.mgr.y.smithi148.stderr: self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id)
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 356, in _osd_out
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: ev.pg_update(self.get("pg_dump"), self.log)
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: File "/usr/share/ceph/mgr/progress/module.py", line 155, in pg_update
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr: pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
2019-02-16T13:40:30.631 INFO:tasks.ceph.mgr.y.smithi148.stderr:KeyError: ('1.a',)
/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601223
- Assignee set to Kefu Chai
/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261
/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622638
/a/sage-2019-02-23_23:02:18-rados-wip-sage2-testing-2019-02-23-1354-distro-basic-smithi/3631867
I wasn't really able to reproduce this locally in my vstart cluster.
Also, I don't really get how this is supposed to work, because in progress.module.Module#_osd_out, affected_pgs is something like
affected_pgs = [PgId(pool_id, ps) for ps in range(0, pool['pg_num']) if ...]
PgRecoveryEvent(which_pgs=affected_pgs)
where ps will never be a character, but in the traceback, it clearly asks for a pg_str containing characters.
Anyway, found it.
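A minimal sketch of how an integer ps can end up as a pg_str containing letters, assuming the PG id is rendered as the pool id followed by the placement seed in hexadecimal (the usual Ceph display format). The PgId class below is a stand-in written for this example, not copied from module.py, and pool_id and pg_num are illustrative values.

```python
from collections import namedtuple


class PgId(namedtuple("PgId", ["pool_id", "ps"])):
    """Illustrative PG identifier: pool id plus integer placement seed."""

    def __str__(self):
        # Assumption: the seed is formatted in lowercase hex, e.g. 10 -> "a".
        return "{0}.{1:x}".format(self.pool_id, self.ps)


pool_id = 1   # illustrative
pg_num = 32   # illustrative

affected_pgs = [PgId(pool_id, ps) for ps in range(0, pg_num)]
pg_strs = [str(pg) for pg in affected_pgs]
```

With this formatting, ps = 10 gives "1.a" and ps = 22 gives "1.16", so the keys in the tracebacks are consistent with integer seeds.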
/a/rdias-2019-02-26_22:35:27-rados-wip-rdias2-testing-distro-basic-smithi/3642422
theory: rados_api_tests triggers this
- Assignee deleted (Kefu Chai)
Unassigning myself from this ticket. I don't have a clue.
2019-04-11 16:52:56.429 7f635385d700 10 mgr.server operator() pool 1 pg_num_target 18 pg_num 23 -> 22 (merging 1.16 and 1.6)
...
2019-04-11 16:52:59.309 7f6360c27700 20 check_osd_map removing merged 1.16
2019-04-11 16:52:59.309 7f6360c27700 20 deleted pool 218
...
2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] got KeyError, see http://tracker.ceph.com/issues/38157
2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] pg_to_state as string: .... (does not include 1.16) ....
2019-04-11 16:52:59.441 7f6355060700 4 mgr[progress] pg_str "'1.16'"
2019-04-11 16:52:59.441 7f6355060700 -1 mgr notify progress.notify:
2019-04-11 16:52:59.441 7f6355060700 -1 mgr notify Traceback (most recent call last):
File "/usr/share/ceph/mgr/progress/module.py", line 408, in notify
self._osdmap_changed(old_osdmap, self._latest_osdmap)
File "/usr/share/ceph/mgr/progress/module.py", line 390, in _osdmap_changed
self._osd_out(old_osdmap, old_dump, new_osdmap, osd_id)
File "/usr/share/ceph/mgr/progress/module.py", line 365, in _osd_out
ev.pg_update(self.get("pg_dump"), self.log)
File "/usr/share/ceph/mgr/progress/module.py", line 156, in pg_update
pg_to_state[pg_str]['stat_sum']['num_bytes_recovered']
KeyError: '1.16'
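The log above suggests that 1.16 was removed by a PG merge before pg_update looked it up, so it was no longer present in pg_to_state. A defensive lookup along the following lines would tolerate that. This is only a sketch under that assumption, not necessarily the fix that was merged; bytes_recovered is a hypothetical helper and the sample data is invented.

```python
def bytes_recovered(pg_to_state, pg_strs):
    """Collect num_bytes_recovered per PG, tolerating PGs that vanished.

    Returns (recovered, missing): a dict of pg_str -> bytes for PGs still
    present in pg_to_state, and a list of pg_strs that were not found
    (for example because they were merged away or their pool was deleted).
    """
    recovered = {}
    missing = []
    for pg_str in pg_strs:
        info = pg_to_state.get(pg_str)
        if info is None:
            # The PG is gone; record it instead of raising KeyError.
            missing.append(pg_str)
            continue
        recovered[pg_str] = info["stat_sum"]["num_bytes_recovered"]
    return recovered, missing


# Invented sample: 1.16 has been merged into 1.6 and no longer appears.
sample_state = {
    "1.6": {"stat_sum": {"num_bytes_recovered": 4096}},
    "1.7": {"stat_sum": {"num_bytes_recovered": 0}},
}
recovered, missing = bytes_recovered(sample_state, ["1.6", "1.16", "1.7"])
```

How a caller should treat the missing PGs (for instance, counting them as complete for the progress event) is a separate design decision that this sketch leaves open.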
- Status changed from 12 to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Backport set to nautilus
- Copied to Backport #39344: nautilus: progress: KeyError on pg_to_state[pg_str]['stat_sum']['num_bytes_recovered'] added
- Status changed from Pending Backport to Resolved