Bug #44968

cephadm: another "RuntimeError: Set changed size during iteration"

Added by Sebastian Wagner 7 months ago. Updated 3 months ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
cephadm
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 cephadm.serve:
Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 RuntimeError: Set changed size during iteration

Because the log contains no traceback, we don't know where this is raised. This issue is blocked by #44799 for now.


Related issues

Blocked by mgr - Bug #44799: mgr: exception in module serve thread does not log traceback Resolved

History

#1 Updated by Sebastian Wagner 7 months ago

  • Blocked by Bug #44799: mgr: exception in module serve thread does not log traceback added

#2 Updated by Kiefer Chang 7 months ago

The issue was reported by an IRC user.
He selected 6 OSDs for deletion in the Dashboard; the requests were sent, but nothing happened.
After a while, he got the `RuntimeError: Set changed size during iteration` error.

NOTE: The Dashboard sends the 6 delete operations to the orchestrator layer in parallel.
Version: ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)
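
For reference, Python raises this error whenever a set is modified while something is iterating over it. Where that happens inside cephadm is not known from this report, so the following is only a minimal, single-threaded sketch of the general failure mode and one common way to avoid it. The function names and the `pending`/`finished` sets are hypothetical and are not taken from the cephadm code. The same error can also appear when a second thread mutates the set during iteration, which would fit the parallel requests noted above, but that is an assumption.

```python
def remove_finished_unsafe(pending: set, finished: set) -> None:
    """Hypothetical helper: mutates `pending` while iterating over it."""
    for osd_id in pending:
        if osd_id in finished:
            # Changing the set's size here makes the next iteration step fail.
            pending.remove(osd_id)


def remove_finished_safe(pending: set, finished: set) -> None:
    """Hypothetical helper: iterates over a snapshot of `pending` instead."""
    for osd_id in list(pending):
        if osd_id in finished:
            pending.remove(osd_id)


def demo() -> str:
    """Return the error message produced by the unsafe variant."""
    pending = {0, 1, 2, 3, 4, 5}
    try:
        remove_finished_unsafe(pending, finished={0, 1, 2})
    except RuntimeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(demo())
```

Iterating over a copy only covers the single-threaded case. If several threads share the set, a lock around both the iteration and the mutation would also be needed.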

Another weird thing:
The user selected osd.0 through osd.5 on host osd-1 for deletion:

ID  CLASS  WEIGHT     TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         232.26929  root default
-9          58.06732      host osd-1
 0    hdd    9.67789          osd.0       up   1.00000  1.00000
 1    hdd    9.67789          osd.1       up   1.00000  1.00000
 2    hdd    9.67789          osd.2       up   1.00000  1.00000
 3    hdd    9.67789          osd.3       up   1.00000  1.00000
 4    hdd    9.67789          osd.4       up   1.00000  1.00000
 5    hdd    9.67789          osd.5       up   1.00000  1.00000
-3          58.06732      host osd-2
 6    hdd    9.67789          osd.6       up   1.00000  1.00000
 7    hdd    9.67789          osd.7       up   1.00000  1.00000
 8    hdd    9.67789          osd.8       up   1.00000  1.00000
 9    hdd    9.67789          osd.9       up   1.00000  1.00000
10    hdd    9.67789          osd.10      up   1.00000  1.00000
11    hdd    9.67789          osd.11      up   1.00000  1.00000
-5          58.06732      host osd-3
12    hdd    9.67789          osd.12      up   1.00000  1.00000
13    hdd    9.67789          osd.13      up   1.00000  1.00000
14    hdd    9.67789          osd.14      up   1.00000  1.00000
15    hdd    9.67789          osd.15      up   1.00000  1.00000
16    hdd    9.67789          osd.16      up   1.00000  1.00000
17    hdd    9.67789          osd.17      up   1.00000  1.00000
-7          58.06732      host osd-4
18    hdd    9.67789          osd.18      up   1.00000  1.00000
19    hdd    9.67789          osd.19      up   1.00000  1.00000
20    hdd    9.67789          osd.20      up   1.00000  1.00000
21    hdd    9.67789          osd.21      up   1.00000  1.00000
22    hdd    9.67789          osd.22      up   1.00000  1.00000
23    hdd    9.67789          osd.23      up   1.00000  1.00000

But the OSD removal list displays the correct OSD IDs with incorrect hostnames:

NAME  HOST  PGS STARTED_AT
osd.4 osd-1 n/a 2020-04-07 07:33:42.474242
osd.2 osd-4 n/a 2020-04-07 07:33:42.518979
osd.3 osd-1 n/a 2020-04-07 07:33:42.455260
osd.1 osd-3 n/a 2020-04-07 07:33:42.551322
osd.0 osd-2 n/a 2020-04-07 07:33:42.541535
osd.5 osd-1 n/a 2020-04-07 07:33:42.444760

#3 Updated by Sebastian Wagner 6 months ago

  • Status changed from New to Need More Info

With #44799 resolved, the traceback should be printed in the logs the next time this happens.
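
As a rough illustration of what logging a traceback from a serve thread can look like, here is a minimal sketch using Python's standard `logging` module. It is not the actual change made for #44799; `run_serve` and `failing_serve` are hypothetical names used only for this example.

```python
import io
import logging


def run_serve(serve, logger: logging.Logger) -> None:
    """Hypothetical wrapper: run `serve` and log any exception with its traceback."""
    try:
        serve()
    except Exception:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("serve thread failed")


def failing_serve() -> None:
    """Hypothetical serve function that fails like the one in this report."""
    raise RuntimeError("Set changed size during iteration")


def capture_log() -> str:
    """Run the failing serve function and return what was logged."""
    buffer = io.StringIO()
    logger = logging.getLogger("serve_demo")
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(buffer)
    logger.addHandler(handler)
    try:
        run_serve(failing_serve, logger)
    finally:
        logger.removeHandler(handler)
    return buffer.getvalue()


if __name__ == "__main__":
    print(capture_log())
```

With a wrapper like this, the log would contain the file and line of the failing statement rather than only the exception message shown in the description.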

#4 Updated by Sebastian Wagner 3 months ago

  • Status changed from Need More Info to Can't reproduce
