Bug #44968
cephadm: another "RuntimeError: Set changed size during iteration"
Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
cephadm
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 cephadm.serve:
Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 RuntimeError: Set changed size during iteration
As we don't have a clue where this happens, it depends on #44799 for now.
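For context, this is the standard Python error raised when a set is mutated while it is being iterated. A minimal, hypothetical reproduction (illustrative only, not the actual cephadm code path):

```python
# Mutating a set while iterating it raises the same error seen in the log.
s = {0, 1, 2, 3, 4, 5}
for item in s:
    s.discard(item)   # changes the set's size mid-iteration
# -> RuntimeError: Set changed size during iteration
```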
Related issues
History
#1 Updated by Sebastian Wagner almost 4 years ago
- Blocked by Bug #44799: mgr: exception in module serve thread does not log traceback added
#2 Updated by Kiefer Chang almost 4 years ago
The issue was reported by an IRC user.
Basically, he tried to select 6 OSDs for deletion from the Dashboard; the requests were sent, but nothing happened.
After a while, he got the `RuntimeError: Set changed size during iteration` error.
NOTE: the Dashboard sends the 6 delete operations to the orchestrator layer in parallel; a sketch of how that could race with the serve loop follows the version line below.
Version: ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)
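To illustrate how parallel requests could produce this error: if the mgr serve thread iterates a shared set while a request handler mutates it from another thread, the iteration fails exactly as logged. This is a hedged sketch; `pending_removals` and the thread layout are assumptions, not cephadm's actual structures:

```python
import threading
import time

# Hypothetical shared state; the real cephadm structures are not known here.
pending_removals = {f"osd.{i}" for i in range(6)}

def serve_loop():
    # Background thread scanning the set, analogous to a mgr serve loop.
    try:
        for entry in pending_removals:   # unsafe: no lock, no snapshot
            time.sleep(0.01)             # simulate per-entry work
    except RuntimeError as e:
        print(f"serve thread failed: {e}")

t = threading.Thread(target=serve_loop)
t.start()
time.sleep(0.02)                 # let the serve loop get mid-iteration
pending_removals.add("osd.6")    # a concurrent delete request mutates the set
t.join()                         # prints: serve thread failed: Set changed size during iteration
```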
Another weird thing:
The user selected osd.0 - osd.5 on osd-1 for deletion:
ID  CLASS  WEIGHT     TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         232.26929  root default
-9          58.06732      host osd-1
 0  hdd      9.67789          osd.0        up   1.00000  1.00000
 1  hdd      9.67789          osd.1        up   1.00000  1.00000
 2  hdd      9.67789          osd.2        up   1.00000  1.00000
 3  hdd      9.67789          osd.3        up   1.00000  1.00000
 4  hdd      9.67789          osd.4        up   1.00000  1.00000
 5  hdd      9.67789          osd.5        up   1.00000  1.00000
-3          58.06732      host osd-2
 6  hdd      9.67789          osd.6        up   1.00000  1.00000
 7  hdd      9.67789          osd.7        up   1.00000  1.00000
 8  hdd      9.67789          osd.8        up   1.00000  1.00000
 9  hdd      9.67789          osd.9        up   1.00000  1.00000
10  hdd      9.67789          osd.10       up   1.00000  1.00000
11  hdd      9.67789          osd.11       up   1.00000  1.00000
-5          58.06732      host osd-3
12  hdd      9.67789          osd.12       up   1.00000  1.00000
13  hdd      9.67789          osd.13       up   1.00000  1.00000
14  hdd      9.67789          osd.14       up   1.00000  1.00000
15  hdd      9.67789          osd.15       up   1.00000  1.00000
16  hdd      9.67789          osd.16       up   1.00000  1.00000
17  hdd      9.67789          osd.17       up   1.00000  1.00000
-7          58.06732      host osd-4
18  hdd      9.67789          osd.18       up   1.00000  1.00000
19  hdd      9.67789          osd.19       up   1.00000  1.00000
20  hdd      9.67789          osd.20       up   1.00000  1.00000
21  hdd      9.67789          osd.21       up   1.00000  1.00000
22  hdd      9.67789          osd.22       up   1.00000  1.00000
23  hdd      9.67789          osd.23       up   1.00000  1.00000
But the OSD removal list displays the correct OSD IDs with incorrect hostnames:
NAME   HOST   PGS  STARTED_AT
osd.4  osd-1  n/a  2020-04-07 07:33:42.474242
osd.2  osd-4  n/a  2020-04-07 07:33:42.518979
osd.3  osd-1  n/a  2020-04-07 07:33:42.455260
osd.1  osd-3  n/a  2020-04-07 07:33:42.551322
osd.0  osd-2  n/a  2020-04-07 07:33:42.541535
osd.5  osd-1  n/a  2020-04-07 07:33:42.444760
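If the root cause is unsynchronized access to a shared set (which the shuffled hostnames also hint at), the usual defensive pattern is to mutate under a lock and iterate over a snapshot. A minimal sketch with hypothetical names (`pending_removals`, `handle_delete_request`), not the actual cephadm fix:

```python
import threading

lock = threading.Lock()
pending_removals = set()

def handle_delete_request(osd_id):
    # Request handlers mutate the shared state only under the lock.
    with lock:
        pending_removals.add(osd_id)

def serve_pass():
    # The serve loop takes a snapshot under the lock, then iterates the
    # copy, so concurrent adds/removes cannot break the iteration.
    with lock:
        snapshot = list(pending_removals)
    for entry in snapshot:
        print(f"processing {entry}")   # stand-in for the real removal work

handle_delete_request("osd.0")
serve_pass()   # safe even if another thread calls handle_delete_request now
```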
#3 Updated by Sebastian Wagner almost 4 years ago
- Status changed from New to Need More Info
Next time, this traceback should be printed in the logs (see the sketch below).
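For illustration, a generic way to capture the full traceback from a serve-style thread; `run_serve` and `inner_loop` are hypothetical names, not the actual mgr change tracked in #44799:

```python
import logging
import traceback

logger = logging.getLogger("cephadm")

def run_serve(inner_loop):
    # Wrap the serve body so a crash logs the full traceback,
    # not just the one-line exception message seen above.
    try:
        inner_loop()
    except Exception:
        logger.error("exception in serve thread:\n%s", traceback.format_exc())
        raise
```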
#4 Updated by Sebastian Wagner over 3 years ago
- Status changed from Need More Info to Can't reproduce