Bug #54307
test_cls_rgw.sh: 'index_list_delimited' test times out
Description
/a/yuriw-2022-02-16_15:53:49-rados-wip-yuri11-testing-2022-02-15-1643-distro-default-smithi/6688784
Last test run before Traceback:
2022-02-16T19:29:23.645 INFO:tasks.workunit.client.0.smithi070.stdout:[       OK ] cls_rgw.index_list (688 ms)
2022-02-16T19:29:23.645 INFO:tasks.workunit.client.0.smithi070.stdout:[ RUN      ] cls_rgw.index_list_delimited
...
2022-02-16T20:07:19.010 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 347, in kill_osd
    self.ceph_manager.kill_osd(osd)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 2977, in kill_osd
    self.ctx.daemons.get_daemon('osd', osd, self.cluster).stop()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (50) after waiting for 300 seconds
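The MaxWhileTries at the bottom comes from teuthology's polling helper (teuthology.contextutil): the thrasher repeatedly checks that the killed OSD process has actually exited and gives up after a fixed retry budget, here 50 tries over 300 seconds, i.e. roughly a 6-second interval. A minimal sketch of that pattern, with hypothetical names (`wait_until` is not the real teuthology API, just an illustration of the retry/budget mechanism):

```python
import time


class MaxWhileTries(Exception):
    """Raised when the retry budget is exhausted (mirrors teuthology's exception)."""
    pass


def wait_until(check, sleep=6, tries=50):
    """Poll `check` up to `tries` times, sleeping `sleep` seconds between
    attempts; raise MaxWhileTries once the whole budget is spent.
    Hypothetical stand-in for teuthology.contextutil's safe_while loop."""
    waited = 0
    for _ in range(tries):
        if check():
            return
        time.sleep(sleep)
        waited += sleep
    raise MaxWhileTries(
        f"reached maximum tries ({tries}) after waiting for {waited} seconds")
```

With the defaults above (50 tries, 6 s apart) the error message matches the "maximum tries (50) after waiting for 300 seconds" seen in the log: the OSD never confirmed it had stopped within five minutes.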
Updated by Laura Flores about 2 years ago
/a/yuriw-2022-03-04_00:56:58-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6718934
Updated by Laura Flores about 2 years ago
- Has duplicate Bug #54507: workunit test cls/test_cls_rgw: Manager failed: thrashosds added
Updated by Laura Flores about 2 years ago
/a/yuriw-2022-03-10_02:41:10-rados-wip-yuri3-testing-2022-03-09-1350-distro-default-smithi/6729279
Updated by Casey Bodley about 2 years ago
very strange that this is timing out in the rados suite. the same test case is running in the rgw suite without failures
Updated by Casey Bodley about 2 years ago
@Laura i see there's a duplicate issue https://tracker.ceph.com/issues/54507 that points at problems with ceph-mgr? is this really a rgw bug?
Updated by Laura Flores about 2 years ago
@Casey the first "timeout" Traceback message appears at the 71% mark in the teuthology log:
2022-02-16T20:07:19.010 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 347, in kill_osd
    self.ceph_manager.kill_osd(osd)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 2977, in kill_osd
    self.ctx.daemons.get_daemon('osd', osd, self.cluster).stop()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (50) after waiting for 300 seconds
The "Manager failed" Traceback appears later at 99%:
2022-02-16T22:31:46.789 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/run_tasks.py", line 176, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/thrashosds.py", line 215, in task
    cluster_manager.wait_for_all_osds_up()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 2759, in wait_for_all_osds_up
    while not self.are_all_osds_up():
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 2749, in are_all_osds_up
    x = self.get_osd_dump()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 2522, in get_osd_dump
    return self.get_osd_dump_json()['osds']
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 2514, in get_osd_dump_json
    out = self.raw_cluster_cmd('osd', 'dump', '--format=json')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 1597, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c11e21b2a403e128a89552f2aa1019a3a9f8a012/qa/tasks/ceph_manager.py", line 1588, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_65c38a8ac6d6694ef8ab9ff2325a7faf73afdc22/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi070 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd dump --format=json'
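Status 124 in that CommandFailedError is not produced by ceph itself: the command is wrapped in `timeout 120`, and GNU coreutils `timeout` exits with 124 when it kills a child for exceeding its time limit. So `ceph osd dump` hung for the full 120 seconds, presumably because the cluster could not service the command while OSDs were down. A quick way to reproduce that exit code in a plain shell (not taken from the test run):

```shell
# GNU coreutils 'timeout' kills the child after the limit and
# reports exit status 124 -- the same status seen in the log above.
timeout 1 sleep 5
echo $?    # 124
```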
This is why I categorized it as an RGW failure, since the manager failure seems to happen as a result of the timed out test. Another detail to note is that the test does not fail deterministically in the rados suite (see this passed test in the most recent master baseline: http://pulpito.front.sepia.ceph.com/yuriw-2022-03-04_21:59:11-rados-master-distro-default-smithi/6721856/), so it's possible that the failure hasn't shown up yet in the rgw suite. If possible, can you post a link to an example of this test in the rgw suite? I want to verify that it's been passing there before refiling this under rados.
Updated by Sridhar Seshasayee over 1 year ago
Seeing this in a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-default-smithi/6962166