Bug #44968: cephadm: another "RuntimeError: Set changed size during iteration" - Orchestrator - Ceph

Bug #44968

closed

cephadm: another "RuntimeError: Set changed size during iteration"

Added by Sebastian Wagner about 4 years ago. Updated over 3 years ago.

Status:

Can't reproduce

Priority:

Normal

Assignee:

Category:

cephadm

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 cephadm.serve:
Apr 07 11:36:26 mon-2 bash[5400]: debug 2020-04-07T09:36:26.702+0000 7f77a0e9e700 -1 RuntimeError: Set changed size during iteration

Because the exception is logged without a traceback, we don't know where this happens. This issue is blocked by #44799 for now.

Related issues 1 (0 open — 1 closed)


Updated by Sebastian Wagner about 4 years ago

Blocked by Bug #44799: mgr: exception in module serve thread does not log traceback added


Updated by Kiefer Chang about 4 years ago

The issue was reported by an IRC user.
He selected 6 OSDs for deletion in the Dashboard. The requests were sent, but nothing happened.
After a while, he got the `RuntimeError: Set changed size during iteration` error.

NOTE: The Dashboard sends the 6 delete operations to the orchestrator layer in parallel.
Version: ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)
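
For reference, Python raises this error whenever a set is modified while it is being iterated. The following is a minimal sketch of that behaviour and of one common way to avoid it (iterating over a snapshot). The names `drain_unsafe`, `drain_safe` and `pending` are hypothetical and are not taken from the cephadm code; where the error actually originates in cephadm is still unknown.

```python
# Hypothetical stand-in for a set of pending OSD removals.
# This is NOT cephadm's real data structure or code path.

def drain_unsafe(pending):
    """Mutate the set while iterating over it (triggers the error)."""
    for osd_id in pending:
        pending.discard(osd_id)  # changes the size of the set being iterated


def drain_safe(pending):
    """Iterate over a snapshot so the original set can be mutated."""
    processed = []
    for osd_id in sorted(pending):  # sorted() builds a separate list first
        pending.discard(osd_id)
        processed.append(osd_id)
    return processed


def reproduce():
    """Return the error message raised by the unsafe variant, if any."""
    try:
        drain_unsafe({0, 1, 2, 3, 4, 5})
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
    print(drain_safe({0, 1, 2, 3, 4, 5}))
```

If the set is shared between threads, as parallel requests might suggest, guarding both the iteration and the mutation with the same lock would be the more robust option. That is an assumption about a possible fix, not a confirmed diagnosis.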

Another odd thing:
The user selected osd.0 - osd.5 on osd-1 for deletion:

ID  CLASS  WEIGHT     TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         232.26929  root default
-9          58.06732      host osd-1
 0    hdd    9.67789          osd.0       up   1.00000  1.00000
 1    hdd    9.67789          osd.1       up   1.00000  1.00000
 2    hdd    9.67789          osd.2       up   1.00000  1.00000
 3    hdd    9.67789          osd.3       up   1.00000  1.00000
 4    hdd    9.67789          osd.4       up   1.00000  1.00000
 5    hdd    9.67789          osd.5       up   1.00000  1.00000
-3          58.06732      host osd-2
 6    hdd    9.67789          osd.6       up   1.00000  1.00000
 7    hdd    9.67789          osd.7       up   1.00000  1.00000
 8    hdd    9.67789          osd.8       up   1.00000  1.00000
 9    hdd    9.67789          osd.9       up   1.00000  1.00000
10    hdd    9.67789          osd.10      up   1.00000  1.00000
11    hdd    9.67789          osd.11      up   1.00000  1.00000
-5          58.06732      host osd-3
12    hdd    9.67789          osd.12      up   1.00000  1.00000
13    hdd    9.67789          osd.13      up   1.00000  1.00000
14    hdd    9.67789          osd.14      up   1.00000  1.00000
15    hdd    9.67789          osd.15      up   1.00000  1.00000
16    hdd    9.67789          osd.16      up   1.00000  1.00000
17    hdd    9.67789          osd.17      up   1.00000  1.00000
-7          58.06732      host osd-4
18    hdd    9.67789          osd.18      up   1.00000  1.00000
19    hdd    9.67789          osd.19      up   1.00000  1.00000
20    hdd    9.67789          osd.20      up   1.00000  1.00000
21    hdd    9.67789          osd.21      up   1.00000  1.00000
22    hdd    9.67789          osd.22      up   1.00000  1.00000
23    hdd    9.67789          osd.23      up   1.00000  1.00000

But the OSD removal list displays the correct OSD IDs with incorrect hostnames for some of them:

NAME  HOST  PGS STARTED_AT
osd.4 osd-1 n/a 2020-04-07 07:33:42.474242
osd.2 osd-4 n/a 2020-04-07 07:33:42.518979
osd.3 osd-1 n/a 2020-04-07 07:33:42.455260
osd.1 osd-3 n/a 2020-04-07 07:33:42.551322
osd.0 osd-2 n/a 2020-04-07 07:33:42.541535
osd.5 osd-1 n/a 2020-04-07 07:33:42.444760
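
To make the discrepancy explicit, the two listings above can be cross-checked. This is only an illustrative sketch: the dictionaries are typed in by hand from the output in this report, and `mismatched_hosts` is a hypothetical helper, not part of Ceph.

```python
# Host of each selected OSD according to the OSD tree above
# (osd.0 - osd.5 all live on osd-1).
tree_host = {f"osd.{i}": "osd-1" for i in range(6)}

# Host reported for each OSD in the removal list above.
removal_list = {
    "osd.4": "osd-1",
    "osd.2": "osd-4",
    "osd.3": "osd-1",
    "osd.1": "osd-3",
    "osd.0": "osd-2",
    "osd.5": "osd-1",
}


def mismatched_hosts(expected, reported):
    """Return {osd: (expected_host, reported_host)} for every disagreement."""
    return {
        name: (expected.get(name), host)
        for name, host in reported.items()
        if expected.get(name) != host
    }


mismatches = mismatched_hosts(tree_host, removal_list)

if __name__ == "__main__":
    for name in sorted(mismatches):
        expected, reported = mismatches[name]
        print(f"{name}: tree says {expected}, removal list says {reported}")
```

With the data from this report, osd.0, osd.1 and osd.2 are listed with the wrong host, while osd.3, osd.4 and osd.5 are correct.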
