Bug #41661

closed

radosbench_omap_write cleanup slow/stuck

Added by Neha Ojha over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-09-04T15:00:26.619 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:  298      16    432027    432011   5.66137   5.26562  0.00934655   0.0110388
2019-09-04T15:00:27.620 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:  299      16    433441    433425   5.66089   5.52344  0.00490376   0.0110397
2019-09-04T15:00:28.621 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Total time run:         300.008
2019-09-04T15:00:28.621 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Total writes made:      434767
2019-09-04T15:00:28.621 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Write size:             4096
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Object size:            4096
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Bandwidth (MB/sec):     5.66088
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Stddev Bandwidth:       1.57132
2019-09-04T15:00:28.622 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Max bandwidth (MB/sec): 21.9219
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Min bandwidth (MB/sec): 3.57031
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Average IOPS:           1449
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Stddev IOPS:            402.259
2019-09-04T15:00:28.623 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Max IOPS:               5612
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Min IOPS:               914
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Average Latency(s):     0.0110401
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Stddev Latency(s):      0.00751487
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Max latency(s):         0.16483
2019-09-04T15:00:28.624 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Min latency(s):         0.00048158
2019-09-04T15:00:28.625 INFO:tasks.radosbench.radosbench.0.smithi038.stdout:Cleaning up (deleting benchmark objects)
2019-09-04T17:32:59.741 ERROR:teuthology.run_tasks:Manager failed: radosbench
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 159, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-30059/qa/tasks/radosbench.py", line 136, in task
    run.wait(radosbench.itervalues(), timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 455, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 132, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (1550) after waiting for 9300 seconds

/a/dzafman-2019-08-28_09:11:55-rados-wip-zafman-testing-distro-basic-smithi/4259247/
/a/nojha-2019-09-04_14:23:27-rados-wip-30059-distro-basic-smithi/4277906

Actions #1

Updated by Neha Ojha over 4 years ago

  • Assignee set to Neha Ojha
  • Priority changed from Normal to High
Actions #2

Updated by Neha Ojha over 4 years ago

The current timeout of 9300 seconds (config.get('time', 360) * 30 + 300 = 300 * 30 + 300) is not enough to clean up the omap entries.

We could either make this timeout more lenient, as we did in 597a97168673dbb6fb4263e5d19a1e8b7ea3d673, or reduce the time for which the workload runs, which would reduce the amount of cleanup required (there are no immediate users of these test results at this point).
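
For reference, the timeout arithmetic in this comment can be sketched as follows. This is a minimal illustration, not the code in qa/tasks/radosbench.py: the function names are hypothetical, and the 6-second polling interval is an assumption inferred from the "1550 tries ... 9300 seconds" message in the traceback.

```python
def cleanup_timeout(config):
    """Return the wait timeout in seconds, using the formula quoted above.

    'time' is the write workload duration; 360 is the default from the
    quoted expression config.get('time', 360).
    """
    return config.get('time', 360) * 30 + 300


def implied_tries(timeout, poll_interval=6):
    """Return how many polls fit into the timeout.

    poll_interval=6 is an assumption inferred from the error message
    (9300 seconds / 1550 tries), not read from the teuthology source.
    """
    return timeout // poll_interval


if __name__ == "__main__":
    timeout = cleanup_timeout({'time': 300})
    print(timeout, implied_tries(timeout))
```

With a 300-second workload this gives the 9300-second timeout and the 1550 tries reported by MaxWhileTries, so the wait budget grows linearly with the configured write time.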

Actions #3

Updated by Neha Ojha over 4 years ago

Clearly, filestore-xfs.yaml is the one failing consistently.

See http://pulpito.ceph.com/nojha-2019-09-09_23:22:30-rados:perf-master-distro-basic-smithi/
For a 300-second write workload, cleanup does not finish in 9300 seconds.

See http://pulpito.ceph.com/nojha-2019-09-09_23:59:57-rados:perf-master-distro-basic-smithi/
For a 120-second write workload, cleanup does not finish in 3900 seconds.

It might not be worth increasing the timeout or shortening the test just for filestore.

Actions #4

Updated by Neha Ojha over 4 years ago

  • Status changed from 12 to Fix Under Review
  • Pull request ID set to 30309
Actions #5

Updated by Neha Ojha over 4 years ago

  • Status changed from Fix Under Review to Resolved