Bug #41386

pg_autoscaler: pool id key not present in pool_stats

Added by Sage Weil over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: pg_autoscaler module
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-08-21T17:08:41.253 INFO:tasks.workunit.client.0.smithi159.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:435: test_tiering_1:  ceph osd pool delete slow2 slow2 --yes-i-really-really-mean-it
2019-08-21T17:08:42.130 INFO:tasks.ceph.mgr.x.smithi159.stderr:2019-08-21T17:08:42.124+0000 7fc0d7c48700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'pg_autoscaler' while running on mgr.x: (3L,)
2019-08-21T17:08:42.130 INFO:tasks.ceph.mgr.x.smithi159.stderr:2019-08-21T17:08:42.124+0000 7fc0d7c48700 -1 pg_autoscaler.serve:
2019-08-21T17:08:42.130 INFO:tasks.ceph.mgr.x.smithi159.stderr:2019-08-21T17:08:42.124+0000 7fc0d7c48700 -1 Traceback (most recent call last):
2019-08-21T17:08:42.130 INFO:tasks.ceph.mgr.x.smithi159.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 164, in serve
2019-08-21T17:08:42.130 INFO:tasks.ceph.mgr.x.smithi159.stderr:    self._maybe_adjust()
2019-08-21T17:08:42.131 INFO:tasks.ceph.mgr.x.smithi159.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 339, in _maybe_adjust
2019-08-21T17:08:42.131 INFO:tasks.ceph.mgr.x.smithi159.stderr:    ps, root_map, pool_root = self._get_pool_status(osdmap, pools)
2019-08-21T17:08:42.131 INFO:tasks.ceph.mgr.x.smithi159.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 273, in _get_pool_status
2019-08-21T17:08:42.131 INFO:tasks.ceph.mgr.x.smithi159.stderr:    pool_logical_used = pool_stats[pool_id]['bytes_used']
2019-08-21T17:08:42.131 INFO:tasks.ceph.mgr.x.smithi159.stderr:KeyError: (3L,)
2019-08-21T17:08:42.131 INFO:tasks.ceph.mgr.x.smithi159.stderr:
2019-08-21T17:08:42.240 INFO:tasks.workunit.client.0.smithi159.stderr:pool 'slow2' does not exist
2019-08-21T17:08:42.251 INFO:tasks.workunit.client.0.smithi159.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:436: test_tiering_1:  ceph osd pool delete cache cache --yes-i-really-really-mean-it

/a/sage-2019-08-21_15:17:39-rados-wip-sage2-testing-2019-08-20-0935-distro-basic-smithi/4237079
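
In the log above, the exception follows immediately after a pool deletion, which suggests that the pool list and pool_stats were briefly out of sync. The following is only a minimal sketch of a defensive lookup under that assumption; it is not the change made in the linked pull request, and the function name, argument names, and data shapes are hypothetical.

```python
def collect_pool_usage(pools, pool_stats):
    """Return {pool_id: bytes_used}, skipping pools with no stats entry.

    Hypothetical helper: `pools` maps pool name -> {"pool": <id>} and
    `pool_stats` maps pool id -> {"bytes_used": <int>}. These shapes are
    assumptions made for illustration only.
    """
    usage = {}
    for pool_name, pool in pools.items():
        pool_id = pool["pool"]
        stats = pool_stats.get(pool_id)
        if stats is None:
            # The pool may have been deleted between the two snapshots;
            # skip it instead of raising KeyError.
            continue
        usage[pool_id] = stats["bytes_used"]
    return usage


# Pool 3 ("slow2") is still listed but already missing from the stats.
pools = {"slow2": {"pool": 3}, "cache": {"pool": 4}}
pool_stats = {4: {"bytes_used": 4096}}
usage = collect_pool_usage(pools, pool_stats)
print(usage)
```

Using `dict.get` and skipping missing entries keeps one stale pool from aborting the whole pass; whether skipping is the right behaviour for the real module is a separate question.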

Related issues

Copied to mgr - Backport #41436: nautilus: pg_autoscaler: pool id key not present in pool_stats Resolved

History

#1 Updated by Sage Weil over 4 years ago

  • Pull request ID set to 29807

#2 Updated by Kefu Chai over 4 years ago

I ran into a similar issue while testing https://github.com/ceph/ceph/pull/29035;
see https://github.com/ceph/ceph/pull/29035#discussion_r316500629

But I am not sure why the error looks like

KeyError: (3L,)

It seems we passed the tuple "(3,)" instead of an integer to pool_stats as the index.

#3 Updated by Sebastian Wagner over 4 years ago

Don't be totally misled by the tuple. I think this comes down to the arguments passed to Exception:

In [1]: str(Exception(2))
Out[1]: '2'

In [2]: repr(Exception(2))
Out[2]: 'Exception(2,)'

In [3]: Exception(2).args
Out[3]: (2,)

I remember a similar case where this was misleading.

I might be wrong here, but don't just look at the strange tuple.
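
The same effect can be reproduced with an ordinary dict lookup. This is a small sketch in Python 3 (the original logs come from Python 2, hence the `3L`); the `pool_stats` contents and the helper name are made up for illustration. It shows that the key passed to the dict is a plain integer, while the exception's `args` attribute is a one-element tuple.

```python
# Made-up stats table: pool id 3 is intentionally absent.
pool_stats = {1: {"bytes_used": 10}, 2: {"bytes_used": 20}}


def lookup_error(stats, pool_id):
    """Return the KeyError raised by a failed lookup, or None on success."""
    try:
        stats[pool_id]["bytes_used"]
    except KeyError as e:
        return e
    return None


err = lookup_error(pool_stats, 3)
print(str(err))        # the missing key itself
print(err.args)        # a one-element tuple wrapping the key
print(type(err.args[0]))
```

So any code that logs `e.args` rather than `str(e)` will show a tuple even though the index was an integer.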

#4 Updated by Sebastian Wagner over 4 years ago

  • Category set to pg_autoscaler module

#5 Updated by Sage Weil over 4 years ago

  • Status changed from 12 to Pending Backport

#6 Updated by Kefu Chai over 4 years ago

Thanks Sebastian, that explains it!

#7 Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41436: nautilus: pg_autoscaler: pool id key not present in pool_stats added

#8 Updated by Kefu Chai over 4 years ago

I still have

2019-08-29T07:52:47.255+0000 7f691c344700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'pg_autoscaler' while running on mgr.x: (1,)
2019-08-29T07:52:47.255+0000 7f691c344700 -1 pg_autoscaler.serve:
2019-08-29T07:52:47.255+0000 7f691c344700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 175, in serve
    self._update_progress_events()
  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 353, in _update_progress_events
    pool_data = pools[int(pool_id)]
KeyError: (1,)

while testing https://github.com/ceph/ceph/pull/29035

/a/kchai-2019-08-29_03:14:53-rados-wip-kefu-testing-2019-08-27-1807-distro-basic-mira/4260378/

The tested branch contains https://github.com/ceph/ceph/pull/29807

#9 Updated by Sage Weil over 4 years ago

  • Status changed from Pending Backport to 12

I saw it again too:

2019-10-04T20:18:20.761 INFO:tasks.ceph.mgr.x.smithi183.stderr:2019-10-04T20:18:20.767+0000 7f67d9e9d700 -1 Traceback (most recent call last):
2019-10-04T20:18:20.761 INFO:tasks.ceph.mgr.x.smithi183.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 175, in serve
2019-10-04T20:18:20.761 INFO:tasks.ceph.mgr.x.smithi183.stderr:    self._update_progress_events()
2019-10-04T20:18:20.762 INFO:tasks.ceph.mgr.x.smithi183.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 353, in _update_progress_events
2019-10-04T20:18:20.762 INFO:tasks.ceph.mgr.x.smithi183.stderr:    pool_data = pools[int(pool_id)]
2019-10-04T20:18:20.762 INFO:tasks.ceph.mgr.x.smithi183.stderr:KeyError: (1,)

/a/sage-2019-10-04_18:20:43-rados-wip-sage-testing-2019-10-04-0923-distro-basic-smithi/4358946

#10 Updated by Sage Weil over 4 years ago

  • Status changed from 12 to Pending Backport

Different cause! See #42249.

#11 Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
