Bug #58690

open

thrashosds: IndexError: Cannot choose from an empty sequence

Added by Laura Flores about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/lflores-2023-02-08_20:25:06-rados-wip-lflores-testing-2023-02-06-1529-distro-default-smithi/7162184

2023-02-09T06:25:04.977 DEBUG:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok dump_historic_ops
2023-02-09T06:25:05.018 INFO:tasks.ceph.osd.4.smithi188.stderr:2023-02-09T06:25:05.018+0000 7fe15d04e700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 4  (PID: 177284) UID: 0
2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:in_osds:  [] out_osds:  [3, 0, 1, 5, 7, 4, 6, 2] dead_osds:  [] live_osds:  [3, 0, 5, 1, 7, 6, 4, 2]
2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:choose_action: min_in 4 min_out 0 min_live 2 min_dead 0 chance_down 0.40
2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:primary_affinity
2023-02-09T06:25:05.088 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ljflores_ceph_7d92b57b28ac7cab8bd62c6c814011e285b54dfc/qa/tasks/ceph_manager.py", line 190, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ljflores_ceph_7d92b57b28ac7cab8bd62c6c814011e285b54dfc/qa/tasks/ceph_manager.py", line 1422, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ljflores_ceph_7d92b57b28ac7cab8bd62c6c814011e285b54dfc/qa/tasks/ceph_manager.py", line 652, in primary_affinity
    osd = random.choice(self.in_osds)
  File "/usr/lib/python3.8/random.py", line 290, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence

The problem is that in_osds is empty when random.choice() tries to pick a random OSD ID.
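A minimal sketch of the failure, using only the thrasher state line from the log above. parse_state is a hypothetical helper (not part of qa/tasks/ceph_manager.py) that extracts the in_osds/out_osds/dead_osds/live_osds lists; the regex assumes the exact line format shown in the log. It confirms that all eight OSDs are out and that random.choice() on the empty in-set raises the same IndexError.

```python
import random
import re

# Thrasher state line copied from the teuthology log above.
STATE_LINE = ("INFO:tasks.thrashosds.thrasher:in_osds:  [] "
              "out_osds:  [3, 0, 1, 5, 7, 4, 6, 2] dead_osds:  [] "
              "live_osds:  [3, 0, 5, 1, 7, 6, 4, 2]")


def parse_state(line):
    """Hypothetical helper: map each '<name>_osds: [..]' field to a list of ints."""
    state = {}
    for name, ids in re.findall(r"(\w+_osds):\s+\[([^\]]*)\]", line):
        state[name] = [int(x) for x in ids.split(",") if x.strip()]
    return state


state = parse_state(STATE_LINE)
print("in:", state["in_osds"], "out:", state["out_osds"])

try:
    # Same call primary_affinity() makes when no osd is passed in.
    random.choice(state["in_osds"])
except IndexError as exc:
    print("IndexError:", exc)
```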

#1

Updated by Laura Flores about 1 year ago

This failure seems sporadic. I reran the failed job 50 times, and every run passed except one, which failed for a different reason.
http://pulpito.front.sepia.ceph.com/lflores-2023-02-09_19:24:50-rados-wip-lflores-testing-2023-02-06-1529-distro-default-smithi/

#2

Updated by Radoslaw Zarzynski about 1 year ago

    def primary_affinity(self, osd=None):
        self.log("primary_affinity")
        if osd is None:
            osd = random.choice(self.in_osds)
        if random.random() >= .5:
            pa = random.random()
        elif random.random() >= .5:
            pa = 1
        else:
            pa = 0
        self.log('Setting osd %s primary_affinity to %f' % (str(osd), pa))
        self.ceph_manager.raw_cluster_cmd('osd', 'primary-affinity',
                                          str(osd), str(pa))

>>> random.choice(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 258, in choice
    i = self._randbelow(len(seq))
TypeError: object of type 'NoneType' has no len()
>>> random.choice([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 260, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence

So self.in_osds is [] (an empty list) rather than None: random.choice(None) would have raised TypeError, not IndexError.
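If the thrasher should tolerate this state instead of crashing, one option is to check for an empty in-set before choosing. The sketch below is hypothetical: choose_primary_affinity_target does not exist in qa/tasks/ceph_manager.py and is not the upstream fix; it only illustrates the guard.

```python
import random


def choose_primary_affinity_target(in_osds, rng=random):
    """Return a random 'in' OSD id, or None if there is nothing to choose from.

    Hypothetical helper: OSDThrasher.primary_affinity() calls
    random.choice(self.in_osds) directly, with no such check.
    """
    if not in_osds:
        # Catches both [] (this ticket) and None; random.choice would raise
        # IndexError or TypeError respectively.
        return None
    return rng.choice(in_osds)


osd = choose_primary_affinity_target([])
if osd is None:
    print("no OSDs are in; skipping primary_affinity")
else:
    print("setting primary affinity on osd.%d" % osd)
```

A guard like this would only hide the symptom. The open question in this ticket is why no OSDs were marked back in.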

#3

Updated by Radoslaw Zarzynski about 1 year ago

class OSDThrasher(Thrasher):
    """ 
    Object used to thrash Ceph
    """ 
    def __init__(self, manager, config, name, logger):
        # ...
        self.in_osds = osd_status['in']

So the question is why those OSDs are not "in" the cluster.
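For context, a sketch of how an in/out split like osd_status['in'] can be derived from the OSD map. It assumes the JSON shape of `ceph osd dump --format=json` (a top-level "osds" list whose entries carry integer "osd", "up" and "in" fields); osd_status_from_dump is a hypothetical helper, not the ceph_manager implementation. The sample input is illustrative and mirrors the failing state in the log: all eight OSDs up but out.

```python
import json


def osd_status_from_dump(dump_json):
    """Split OSD ids into in/out/up/down lists from OSD map JSON.

    Assumption: each entry of dump["osds"] has integer "osd", "up" and "in"
    fields (0 or 1). Hypothetical helper, not ceph_manager's code.
    """
    dump = json.loads(dump_json)
    status = {"in": [], "out": [], "up": [], "down": []}
    for entry in dump["osds"]:
        osd_id = entry["osd"]
        status["in" if entry["in"] else "out"].append(osd_id)
        status["up" if entry["up"] else "down"].append(osd_id)
    return status


# Illustrative input: 8 OSDs, all up, none in.
sample = json.dumps({"osds": [{"osd": i, "up": 1, "in": 0} for i in range(8)]})
status = osd_status_from_dump(sample)
print(status["in"], status["up"])
```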

#4

Updated by Laura Flores about 1 year ago

  • Assignee set to Laura Flores

#5

Updated by Laura Flores about 1 year ago

While thrashing OSDs in _do_thrash() (qa/tasks/ceph_manager.py), the task didn't add OSDs back into the cluster after killing/removing them:

lflores@teuthology:/a/lflores-2023-02-08_20:25:06-rados-wip-lflores-testing-2023-02-06-1529-distro-default-smithi/7162184$ cat teuthology.log | grep "Adding osd" 
2023-02-09T06:16:22.945 INFO:tasks.thrashosds.thrasher:Adding osd 1
2023-02-09T06:19:34.997 INFO:tasks.thrashosds.thrasher:Adding osd 7

There should be way more lines of the task adding OSDs back into the cluster. The thrashing code normally checks if there are out OSDs and adds them back in. Continuing to dig into why this didn't happen as expected.
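To quantify the gap, a small script can scan teuthology.log for the two message formats quoted in this ticket: the "Adding osd N" lines and the thrasher's "in_osds: ... out_osds: ..." state lines. summarize_thrash_log is a hypothetical helper, and the regexes assume those lines keep exactly the format shown above.

```python
import re

ADD_RE = re.compile(r"thrasher:Adding osd (\d+)")
STATE_RE = re.compile(
    r"thrasher:in_osds:\s+\[([^\]]*)\]\s+out_osds:\s+\[([^\]]*)\]")


def _ids(text):
    """Turn '3, 0, 1' into [3, 0, 1]; an empty string gives []."""
    return [int(x) for x in text.split(",") if x.strip()]


def summarize_thrash_log(lines):
    """Collect 'Adding osd' events and the last in/out state seen."""
    added = []
    last_in, last_out = None, None
    for line in lines:
        m = ADD_RE.search(line)
        if m:
            added.append(int(m.group(1)))
            continue
        m = STATE_RE.search(line)
        if m:
            last_in, last_out = _ids(m.group(1)), _ids(m.group(2))
    return {"added": added, "last_in": last_in, "last_out": last_out}


# Lines taken from this ticket; on a real run, pass an open file instead:
#   with open("teuthology.log") as f: summarize_thrash_log(f)
sample = [
    "2023-02-09T06:16:22.945 INFO:tasks.thrashosds.thrasher:Adding osd 1",
    "2023-02-09T06:19:34.997 INFO:tasks.thrashosds.thrasher:Adding osd 7",
    "2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:in_osds:  [] "
    "out_osds:  [3, 0, 1, 5, 7, 4, 6, 2] dead_osds:  [] "
    "live_osds:  [3, 0, 5, 1, 7, 6, 4, 2]",
]
print(summarize_thrash_log(sample))
```

With only two OSDs ever added back and all eight out at the time of the crash, the summary matches the observation that the in-set was drained and never refilled.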
