Bug #41661
radosbench_omap_write cleanup slow/stuck (Closed)
Description
2019-09-04T15:00:26.619 INFO:tasks.radosbench.radosbench.0.smithi038.stdout: 298 16 432027 432011 5.66137 5.26562 0.00934655 0.0110388
2019-09-04T15:00:27.620 INFO:tasks.radosbench.radosbench.0.smithi038.stdout: 299 16 433441 433425 5.66089 5.52344 0.00490376 0.0110397
2019-09-04T15:00:28.621 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Total time run: 300.008
2019-09-04T15:00:28.621 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Total writes made: 434767
2019-09-04T15:00:28.621 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Write size: 4096
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Object size: 4096
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Bandwidth (MB/sec): 5.66088
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Stddev Bandwidth: 1.57132
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Max bandwidth (MB/sec): 21.9219
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Min bandwidth (MB/sec): 3.57031
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Average IOPS: 1449
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Stddev IOPS: 402.259
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Max IOPS: 5612
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Min IOPS: 914
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Average Latency(s): 0.0110401
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Stddev Latency(s): 0.00751487
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Max latency(s): 0.16483
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Min latency(s): 0.00048158
2019-09-04T15:00:28.625 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Cleaning up (deleting benchmark objects)
2019-09-04T17:32:59.741 ERROR:teuthology.run_tasks:Manager failed: radosbench
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 159, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-30059/qa/tasks/radosbench.py", line 136, in task
    run.wait(radosbench.itervalues(), timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 455, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 132, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (1550) after waiting for 9300 seconds
/a/dzafman-2019-08-28_09:11:55-rados-wip-zafman-testing-distro-basic-smithi/4259247/
/a/nojha-2019-09-04_14:23:27-rados-wip-30059-distro-basic-smithi/4277906
Updated by Neha Ojha over 4 years ago
- Assignee set to Neha Ojha
- Priority changed from Normal to High
Updated by Neha Ojha over 4 years ago
The current timeout is config.get('time', 360) * 30 + 300 seconds. For this job's 300-second run that is 300 * 30 + 300 = 9300 seconds, which is not enough to clean up the omap entries.
We could either make this timeout more lenient, as we did in 597a97168673dbb6fb4263e5d19a1e8b7ea3d673, or shorten the workload so that less cleanup is required (there are no immediate users of these test results at this point).
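A minimal Python sketch of the timeout arithmetic above. The formula is the one quoted from qa/tasks/radosbench.py; the 6-second poll interval is an assumption inferred from the MaxWhileTries message (1550 tries over 9300 seconds), and the helper names are hypothetical.

```python
# Hypothetical helpers reproducing the timeout arithmetic discussed in this issue.

# Assumption: inferred from "reached maximum tries (1550) after waiting for
# 9300 seconds", i.e. 9300 / 1550 == 6 seconds between checks.
POLL_INTERVAL_S = 6


def radosbench_timeout(config):
    """Seconds to wait for radosbench, using the formula quoted above."""
    return config.get('time', 360) * 30 + 300


def max_tries(timeout_s, interval_s=POLL_INTERVAL_S):
    """Number of polls that fit in the timeout at a fixed interval."""
    return timeout_s // interval_s


if __name__ == "__main__":
    # 300 s is this job's workload; 120 s is the shortened run tried later.
    for run_time in (300, 120):
        timeout = radosbench_timeout({'time': run_time})
        print(run_time, timeout, max_tries(timeout))
```

Under the 6-second assumption, the 300-second job gets 1550 polls, which matches the count in the traceback.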
Updated by Neha Ojha over 4 years ago
Clearly, filestore-xfs.yaml is the one failing consistently.
See http://pulpito.ceph.com/nojha-2019-09-09_23:22:30-rados:perf-master-distro-basic-smithi/
For a 300-second write workload, cleanup does not finish within 9300 seconds.
See http://pulpito.ceph.com/nojha-2019-09-09_23:59:57-rados:perf-master-distro-basic-smithi/
For a 120-second write workload, cleanup does not finish within 3900 seconds.
It might not be worth increasing the timeout or shortening the test just for filestore.
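The numbers from the 300-second run in the description give a rough upper bound on the cleanup rate. This is a back-of-the-envelope sketch, assuming cleanup starts right after the write phase, deletes each written object once, and can use the whole remaining timeout; the variable names are illustrative.

```python
# Back-of-the-envelope bound on cleanup throughput for the 300 s run.
# Assumptions: cleanup starts right after the write phase, deletes each
# benchmark object once, and has the rest of the timeout to do so.

WRITES_MADE = 434767          # "Total writes made" from the log above
WRITE_TIME_S = 300.008        # "Total time run" from the log above
TIMEOUT_S = 300 * 30 + 300    # timeout formula with time=300

write_rate = WRITES_MADE / WRITE_TIME_S            # objects written per second
cleanup_budget_s = TIMEOUT_S - WRITE_TIME_S        # time left for deletes
max_delete_rate = WRITES_MADE / cleanup_budget_s   # cleanup was slower than this

if __name__ == "__main__":
    print(f"write rate   ~{write_rate:.0f} objects/s")
    print(f"delete rate  < {max_delete_rate:.1f} objects/s")
    print(f"slowdown     > {write_rate / max_delete_rate:.1f}x")
```

For time=300 the cleanup budget works out to about 9000 seconds, roughly 30 times the write phase. Under the assumptions above, the timeout therefore only holds if deletes run at no less than about 1/30 of the write rate, and on filestore-xfs they apparently did not.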
Updated by Neha Ojha over 4 years ago
- Status changed from 12 to Fix Under Review
- Pull request ID set to 30309
Updated by Neha Ojha over 4 years ago
- Status changed from Fix Under Review to Resolved