Actions
Bug #49259
closedtest_rados_api tests timeout with cephadm (plus extremely large OSD logs)
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
swagner-2021-02-11_10:31:33-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-smithi/5874513
rados:cephadm/with-work/{distro/ubuntu_20.04 fixed-2 mode/packaged mon_election/classic msgr/async start tasks/rados_api_tests}
The tests actually complete:
grep -v -e 'slow request osd_op' -e 'heartbeat_check' -e get_health_metrics -e 'have slow ops' -e ' pgmap v' -e 'journalctl@ceph.mon' -e 'tcmu-runner' teuthology.log| less 2021-02-11T13:11:48.564 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [ OK ] LibRadosAioEC.ExecuteClassPP (7016 ms) 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [ RUN ] LibRadosAioEC.OmapPP 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [ OK ] LibRadosAioEC.OmapPP (7016 ms) 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [ RUN ] LibRadosAioEC.MultiWritePP 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [ OK ] LibRadosAioEC.MultiWritePP (7027 ms) 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [----------] 20 tests from LibRadosAioEC (139516 ms total) 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [----------] Global test environment tear-down 2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [==========] 53 tests from 4 test suites ran. (294146 ms total) 2021-02-11T13:11:48.566 INFO:tasks.workunit.client.0.smithi091.stdout: api_aio_pp: [ PASSED ] 53 tests. 2021-02-11T16:04:13.814 INFO:journalctl@ceph.prometheus.a.smithi110.stdout:Feb 11 16:04:13 smithi110 bash[29017]: level=info ts=2021-02-11T16:04:13.510Z caller=compact.go:495 component=tsdb msg="write block" mint=1613048653399 maxt=1613052000000 ulid=01EY8XBWFN9THM46AQ56K0YJ9V duration=80.839364ms 2021-02-11T16:04:13.814 INFO:journalctl@ceph.prometheus.a.smithi110.stdout:Feb 11 16:04:13 smithi110 bash[29017]: level=info ts=2021-02-11T16:04:13.523Z caller=head.go:662 component=tsdb msg="Head GC completed" duration=2.564364ms 2021-02-11T16:06:54.424 INFO:tasks.workunit.client.0.smithi091.stderr:++ cleanup 2021-02-11T16:06:54.424 INFO:tasks.workunit.client.0.smithi091.stderr:++ pkill -P 35759 2021-02-11T16:06:54.462 INFO:tasks.workunit.client.0.smithi091.stderr:++ true 2021-02-11T16:06:54.494 DEBUG:teuthology.orchestra.run:got remote process result: 124
but lots of slow ops:
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.c.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[17322]: cluster 2021-02-11T16:06:53.847115+0000 osd.0 (osd.0) 72781 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=7 ondisk+retry+write+known_if_redirected e350) initiated 2021-02-11T15:07:05.776639+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173794+0000 osd.2 (osd.2) 18309 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:07:35.487913+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173821+0000 osd.2 (osd.2) 18310 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:22:35.490704+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173861+0000 osd.2 (osd.2) 18311 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T14:52:35.486234+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173928+0000 osd.2 (osd.2) 18312 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=4 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:37:35.490076+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173954+0000 osd.2 (osd.2) 18313 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=5 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:52:35.490534+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173977+0000 osd.2 (osd.2) 18314 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] ondisk+read+known_if_redirected e350) initiated 2021-02-11T14:36:22.874727+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.846990+0000 osd.0 (osd.0) 72771 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e341) initiated 2021-02-11T13:52:05.770047+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.847012+0000 osd.0 (osd.0) 72772 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected e341) initiated 2021-02-11T13:37:05.766073+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.847025+0000 osd.0 (osd.0) 72773 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=4 ondisk+retry+write+known_if_redirected e341) initiated 2021-02-11T14:22:05.770189+0000 currently delayed 2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.847036+0000 osd.0 (osd.0) 72774 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RE
lots of uncommon health warnings:
2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001835+0000 mon.a (mon.0) 3849 : cluster [WRN] [WRN] CACHE_POOL_NO_HIT_SET: 1 cache pools are missing hit_sets 2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001854+0000 mon.a (mon.0) 3850 : cluster [WRN] pool 'test-rados-api-smithi091-35848-11' with cache_mode writeback needs hit_set_type to be set but it is not 2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001874+0000 mon.a (mon.0) 3851 : cluster [WRN] [WRN] POOL_APP_NOT_ENABLED: 1 pool(s) do not have an application enabled 2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001888+0000 mon.a (mon.0) 3852 : cluster [WRN] application not enabled on pool 'ceph_test_rados_api_asio' 2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001901+0000 mon.a (mon.0) 3853 : cluster [WRN] use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications
Actions