Bug #44968
closed
cephadm: another "RuntimeError: Set changed size during iteration"
Added by Sebastian Wagner about 4 years ago.
Updated almost 4 years ago.
Description
Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 cephadm.serve:
Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 RuntimeError: Set changed size during iteration
As we don't yet know where this happens, this issue depends on #44799 for now.
- Blocked by Bug #44799: mgr: exception in module serve thread does not log traceback added
The issue was reported by an IRC user.
He tried to delete 6 OSDs from the Dashboard; the requests were sent, but nothing happened.
After a while, he got the `RuntimeError: Set changed size during iteration` error.
NOTE: The Dashboard sends the 6 delete operations to the orchestrator layer in parallel.
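For context, a minimal single-threaded sketch of the error class involved (hypothetical names; not the actual cephadm code): mutating a Python set while a live iterator is walking it raises exactly this RuntimeError, and the parallel delete requests make such a mutation race plausible.

```python
# Hypothetical reproduction: remove elements from a set while iterating it.
pending = {"osd.0", "osd.1", "osd.2"}

try:
    for osd in pending:
        pending.discard(osd)  # mutation invalidates the live iterator
except RuntimeError as exc:
    print(exc)  # Set changed size during iteration

# The usual fix: iterate over a snapshot so the underlying set may change.
pending = {"osd.0", "osd.1", "osd.2"}
for osd in list(pending):   # copy taken up front
    pending.discard(osd)    # safe: the loop walks the copy
assert not pending
```

With concurrent requests, the same snapshot-or-lock pattern applies to any set shared between the serve thread and request handlers.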
Version: ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)
Another weird thing:
The user selected osd.0 - osd.5 on osd-1 for deletion:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 232.26929 root default
-9 58.06732 host osd-1
0 hdd 9.67789 osd.0 up 1.00000 1.00000
1 hdd 9.67789 osd.1 up 1.00000 1.00000
2 hdd 9.67789 osd.2 up 1.00000 1.00000
3 hdd 9.67789 osd.3 up 1.00000 1.00000
4 hdd 9.67789 osd.4 up 1.00000 1.00000
5 hdd 9.67789 osd.5 up 1.00000 1.00000
-3 58.06732 host osd-2
6 hdd 9.67789 osd.6 up 1.00000 1.00000
7 hdd 9.67789 osd.7 up 1.00000 1.00000
8 hdd 9.67789 osd.8 up 1.00000 1.00000
9 hdd 9.67789 osd.9 up 1.00000 1.00000
10 hdd 9.67789 osd.10 up 1.00000 1.00000
11 hdd 9.67789 osd.11 up 1.00000 1.00000
-5 58.06732 host osd-3
12 hdd 9.67789 osd.12 up 1.00000 1.00000
13 hdd 9.67789 osd.13 up 1.00000 1.00000
14 hdd 9.67789 osd.14 up 1.00000 1.00000
15 hdd 9.67789 osd.15 up 1.00000 1.00000
16 hdd 9.67789 osd.16 up 1.00000 1.00000
17 hdd 9.67789 osd.17 up 1.00000 1.00000
-7 58.06732 host osd-4
18 hdd 9.67789 osd.18 up 1.00000 1.00000
19 hdd 9.67789 osd.19 up 1.00000 1.00000
20 hdd 9.67789 osd.20 up 1.00000 1.00000
21 hdd 9.67789 osd.21 up 1.00000 1.00000
22 hdd 9.67789 osd.22 up 1.00000 1.00000
23 hdd 9.67789 osd.23 up 1.00000 1.00000
But the OSD removal list displays the correct OSD IDs with incorrect hostnames:
NAME HOST PGS STARTED_AT
osd.4 osd-1 n/a 2020-04-07 07:33:42.474242
osd.2 osd-4 n/a 2020-04-07 07:33:42.518979
osd.3 osd-1 n/a 2020-04-07 07:33:42.455260
osd.1 osd-3 n/a 2020-04-07 07:33:42.551322
osd.0 osd-2 n/a 2020-04-07 07:33:42.541535
osd.5 osd-1 n/a 2020-04-07 07:33:42.444760
- Status changed from New to Need More Info
Next time, this traceback should be printed in the logs.
- Status changed from Need More Info to Can't reproduce