Project

General

Profile

Actions

Bug #49259

closed

test_rados_api tests timeout with cephadm (plus extremely large OSD logs)

Added by Sebastian Wagner about 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

swagner-2021-02-11_10:31:33-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-smithi/5874513

rados:cephadm/with-work/{distro/ubuntu_20.04 fixed-2 mode/packaged mon_election/classic msgr/async start tasks/rados_api_tests}

The tests actually complete:

grep -v -e 'slow request osd_op' -e 'heartbeat_check' -e get_health_metrics -e 'have slow ops' -e ' pgmap v' -e 'journalctl@ceph.mon' -e 'tcmu-runner'  teuthology.log| less
2021-02-11T13:11:48.564 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [       OK ] LibRadosAioEC.ExecuteClassPP (7016 ms)
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [ RUN      ] LibRadosAioEC.OmapPP
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [       OK ] LibRadosAioEC.OmapPP (7016 ms)
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [ RUN      ] LibRadosAioEC.MultiWritePP
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [       OK ] LibRadosAioEC.MultiWritePP (7027 ms)
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [----------] 20 tests from LibRadosAioEC (139516 ms total)
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp:
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [----------] Global test environment tear-down
2021-02-11T13:11:48.565 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [==========] 53 tests from 4 test suites ran. (294146 ms total)
2021-02-11T13:11:48.566 INFO:tasks.workunit.client.0.smithi091.stdout:               api_aio_pp: [  PASSED  ] 53 tests.
2021-02-11T16:04:13.814 INFO:journalctl@ceph.prometheus.a.smithi110.stdout:Feb 11 16:04:13 smithi110 bash[29017]: level=info ts=2021-02-11T16:04:13.510Z caller=compact.go:495 component=tsdb msg="write block" mint=1613048653399 maxt=1613052000000 ulid=01EY8XBWFN9THM46AQ56K0YJ9V duration=80.839364ms
2021-02-11T16:04:13.814 INFO:journalctl@ceph.prometheus.a.smithi110.stdout:Feb 11 16:04:13 smithi110 bash[29017]: level=info ts=2021-02-11T16:04:13.523Z caller=head.go:662 component=tsdb msg="Head GC completed" duration=2.564364ms
2021-02-11T16:06:54.424 INFO:tasks.workunit.client.0.smithi091.stderr:++ cleanup
2021-02-11T16:06:54.424 INFO:tasks.workunit.client.0.smithi091.stderr:++ pkill -P 35759
2021-02-11T16:06:54.462 INFO:tasks.workunit.client.0.smithi091.stderr:++ true
2021-02-11T16:06:54.494 DEBUG:teuthology.orchestra.run:got remote process result: 124

but lots of slow ops:

2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.c.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[17322]: cluster 2021-02-11T16:06:53.847115+0000 osd.0 (osd.0) 72781 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=7 ondisk+retry+write+known_if_redirected e350) initiated 2021-02-11T15:07:05.776639+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173794+0000 osd.2 (osd.2) 18309 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:07:35.487913+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173821+0000 osd.2 (osd.2) 18310 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:22:35.490704+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173861+0000 osd.2 (osd.2) 18311 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T14:52:35.486234+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173928+0000 osd.2 (osd.2) 18312 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=4 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:37:35.490076+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173954+0000 osd.2 (osd.2) 18313 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] RETRY=5 ondisk+retry+read+known_if_redirected e350) initiated 2021-02-11T15:52:35.490534+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:52.173977+0000 osd.2 (osd.2) 18314 : cluster [WRN] slow request osd_op(client.37922.0:257 174.19 174:9c1d23e5:test-rados-api-smithi091-35848-12::bam:head [read 0~1] snapc 0=[] ondisk+read+known_if_redirected e350) initiated 2021-02-11T14:36:22.874727+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.846990+0000 osd.0 (osd.0) 72771 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e341) initiated 2021-02-11T13:52:05.770047+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.847012+0000 osd.0 (osd.0) 72772 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected e341) initiated 2021-02-11T13:37:05.766073+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.847025+0000 osd.0 (osd.0) 72773 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RETRY=4 ondisk+retry+write+known_if_redirected e341) initiated 2021-02-11T14:22:05.770189+0000 currently delayed
2021-02-11T16:06:54.144 INFO:journalctl@ceph.mon.a.smithi091.stdout:Feb 11 16:06:53 smithi091 bash[13455]: cluster 2021-02-11T16:06:53.847036+0000 osd.0 (osd.0) 72774 : cluster [WRN] slow request osd_op(client.24646.0:2078 16.14 16:2b70cbe7:test-rados-api-smithi091-35842-17::big.copy:head [copy-from ver 0 in=74b] snapc 0=[] RE

lots of uncommon health warnings:

2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001835+0000 mon.a (mon.0) 3849 : cluster [WRN] [WRN] CACHE_POOL_NO_HIT_SET: 1 cache pools are missing hit_sets
2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001854+0000 mon.a (mon.0) 3850 : cluster [WRN]     pool 'test-rados-api-smithi091-35848-11' with cache_mode writeback needs hit_set_type to be set but it is not
2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001874+0000 mon.a (mon.0) 3851 : cluster [WRN] [WRN] POOL_APP_NOT_ENABLED: 1 pool(s) do not have an application enabled
2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001888+0000 mon.a (mon.0) 3852 : cluster [WRN]     application not enabled on pool 'ceph_test_rados_api_asio'
2021-02-11T15:30:15.764 INFO:journalctl@ceph.mon.b.smithi110.stdout:Feb 11 15:30:01 smithi110 bash[13423]: cluster 2021-02-11T15:30:00.001901+0000 mon.a (mon.0) 3853 : cluster [WRN]     use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications

Related issues 2 (0 open2 closed)

Related to RADOS - Bug #39039: mon connection reset, command not resentDuplicateSage Weil

Actions
Related to Messengers - Bug #49237: segv in AsyncConnection::_stop()Resolved

Actions
Actions

Also available in: Atom PDF