Bug #58690
open
thrashosds: IndexError: Cannot choose from an empty sequence
Description
/a/lflores-2023-02-08_20:25:06-rados-wip-lflores-testing-2023-02-06-1529-distro-default-smithi/7162184
2023-02-09T06:25:04.977 DEBUG:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok dump_historic_ops
2023-02-09T06:25:05.018 INFO:tasks.ceph.osd.4.smithi188.stderr:2023-02-09T06:25:05.018+0000 7fe15d04e700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 4 (PID: 177284) UID: 0
2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:in_osds: [] out_osds: [3, 0, 1, 5, 7, 4, 6, 2] dead_osds: [] live_osds: [3, 0, 5, 1, 7, 6, 4, 2]
2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:choose_action: min_in 4 min_out 0 min_live 2 min_dead 0 chance_down 0.40
2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:primary_affinity
2023-02-09T06:25:05.088 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ljflores_ceph_7d92b57b28ac7cab8bd62c6c814011e285b54dfc/qa/tasks/ceph_manager.py", line 190, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ljflores_ceph_7d92b57b28ac7cab8bd62c6c814011e285b54dfc/qa/tasks/ceph_manager.py", line 1422, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ljflores_ceph_7d92b57b28ac7cab8bd62c6c814011e285b54dfc/qa/tasks/ceph_manager.py", line 652, in primary_affinity
    osd = random.choice(self.in_osds)
  File "/usr/lib/python3.8/random.py", line 290, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
The issue is that in_osds is empty when random.choice tries to pick a random osd id.
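To spot this state in other runs, the thrasher status line can be checked mechanically. The following is a minimal sketch that assumes the log line format shown above; `parse_thrasher_state` is a hypothetical helper written for illustration and is not part of teuthology or ceph_manager.py.

```python
import ast
import re

# Matches the thrasher status line format seen in the log above.
STATE_RE = re.compile(
    r"in_osds:\s*(\[.*?\])\s*out_osds:\s*(\[.*?\])\s*"
    r"dead_osds:\s*(\[.*?\])\s*live_osds:\s*(\[.*?\])"
)


def parse_thrasher_state(line):
    """Return a dict of OSD id lists, or None if this is not a status line."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    keys = ("in_osds", "out_osds", "dead_osds", "live_osds")
    # literal_eval safely turns text such as "[3, 0, 1]" into a Python list.
    return {key: ast.literal_eval(text) for key, text in zip(keys, match.groups())}


line = (
    "2023-02-09T06:25:05.055 INFO:tasks.thrashosds.thrasher:"
    "in_osds: [] out_osds: [3, 0, 1, 5, 7, 4, 6, 2] "
    "dead_osds: [] live_osds: [3, 0, 5, 1, 7, 6, 4, 2]"
)
state = parse_thrasher_state(line)
if state is not None and not state["in_osds"]:
    print("in_osds is empty; random.choice(self.in_osds) would raise IndexError")
```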
Updated by Laura Flores about 1 year ago
This failure seems sporadic. I reran the failed job 50 times; all runs succeeded except one, which failed for a different reason.
http://pulpito.front.sepia.ceph.com/lflores-2023-02-09_19:24:50-rados-wip-lflores-testing-2023-02-06-1529-distro-default-smithi/
Updated by Radoslaw Zarzynski about 1 year ago
def primary_affinity(self, osd=None):
    self.log("primary_affinity")
    if osd is None:
        osd = random.choice(self.in_osds)
    if random.random() >= .5:
        pa = random.random()
    elif random.random() >= .5:
        pa = 1
    else:
        pa = 0
    self.log('Setting osd %s primary_affinity to %f' % (str(osd), pa))
    self.ceph_manager.raw_cluster_cmd('osd', 'primary-affinity',
                                      str(osd), str(pa))
>>> random.choice(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 258, in choice
    i = self._randbelow(len(seq))
TypeError: object of type 'NoneType' has no len()
>>> random.choice([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 260, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
So self.in_osds is [].
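One way to avoid the crash would be to guard the selection, although this may not be the right fix because it would hide the question of why no OSDs are in. A minimal sketch follows; `choose_in_osd` is a hypothetical helper for illustration and does not exist in ceph_manager.py.

```python
import random


def choose_in_osd(in_osds, rng=random):
    """Pick a random OSD id from in_osds, or return None if the list is empty."""
    if not in_osds:
        # random.choice([]) raises IndexError, so return early instead.
        return None
    return rng.choice(in_osds)


print(choose_in_osd([]))   # prints None
print(choose_in_osd([3]))  # prints 3
```

A caller such as primary_affinity could then skip the action when the helper returns None instead of raising.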
Updated by Radoslaw Zarzynski about 1 year ago
class OSDThrasher(Thrasher):
    """
    Object used to thrash Ceph
    """
    def __init__(self, manager, config, name, logger):
        # ...
        self.in_osds = osd_status['in']
So the question is why those OSDs haven't joined the cluster.
Updated by Laura Flores about 1 year ago
While thrashing OSDs in _do_thrash() (qa/tasks/ceph_manager.py), the task didn't add OSDs back into the cluster after killing/removing them:
lflores@teuthology:/a/lflores-2023-02-08_20:25:06-rados-wip-lflores-testing-2023-02-06-1529-distro-default-smithi/7162184$ cat teuthology.log | grep "Adding osd"
2023-02-09T06:16:22.945 INFO:tasks.thrashosds.thrasher:Adding osd 1
2023-02-09T06:19:34.997 INFO:tasks.thrashosds.thrasher:Adding osd 7
There should be way more lines of the task adding OSDs back into the cluster. The thrashing code normally checks if there are out OSDs and adds them back in. Continuing to dig into why this didn't happen as expected.
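The grep above can also be done in Python, which makes it easier to pull out the timestamps and OSD ids for comparison with other events. This is a minimal sketch that assumes the "Adding osd N" line format shown in the output above; `find_added_osds` is a hypothetical helper, not part of teuthology.

```python
import re

# Matches thrasher lines of the form "<timestamp> INFO:...:Adding osd <id>".
ADD_RE = re.compile(r"^(\S+) INFO:tasks\.thrashosds\.thrasher:Adding osd (\d+)$")


def find_added_osds(lines):
    """Return (timestamp, osd_id) for every 'Adding osd N' thrasher line."""
    added = []
    for line in lines:
        match = ADD_RE.match(line.strip())
        if match:
            added.append((match.group(1), int(match.group(2))))
    return added


sample = [
    "2023-02-09T06:16:22.945 INFO:tasks.thrashosds.thrasher:Adding osd 1",
    "2023-02-09T06:19:34.997 INFO:tasks.thrashosds.thrasher:Adding osd 7",
]
for timestamp, osd_id in find_added_osds(sample):
    print(timestamp, osd_id)
```

To run it against a real log, pass an open file object (for example `open("teuthology.log")`) in place of `sample`.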