Bug #59286

open

mon/test_mon_osdmap_prune.sh: test times out after 5+ hours

Added by Laura Flores about 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On the initial run, this test ran for almost 4 hours before it timed out:
/a/yuriw-2023-03-14_20:10:47-rados-wip-yuri-testing-2023-03-14-0714-reef-distro-default-smithi/7207185

2023-03-16T23:53:13.229 DEBUG:teuthology.orchestra.run.smithi119:workunit test mon/test_mon_osdmap_prune.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=74ed20c11e5419432cca579d160ba4ccd5f6c09b TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/mon/test_mon_osdmap_prune.sh
...
2023-03-17T03:44:17.194 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.492+0000 7f5161708700  1  Processor -- start
2023-03-17T03:44:17.194 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.492+0000 7f5161708700  1 --  start start
2023-03-17T03:44:17.443 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.492+0000 7f5161708700 10 monclient: init
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700 10 monclient: _reopen_session rank -1
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700 10 monclient: _add_conns ranks=[0,1,2]
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700 10 monclient(hunting): picked mon.noname-a con 0x7f515c071110 addr v1:172.21.15.93:6789/0
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700 10 monclient(hunting): picked mon.noname-b con 0x7f515c0a05c0 addr v1:172.21.15.119:6789/0
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700 10 monclient(hunting): picked mon.noname-c con 0x7f515c0a3bb0 addr v1:172.21.15.93:6790/0
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700  1 --  --> v1:172.21.15.93:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- 0x7f515c071d80 con 0x7f515c071110
2023-03-17T03:44:17.951 INFO:tasks.workunit.client.0.smithi119.stderr:2023-03-17T02:52:43.493+0000 7f5161708700  1 --  --> v1:172.21.15.119:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- 0x7f515c071bf0 con 0x7f515c0a05c0
2023-03-17T03:44:17.952 INFO:tasks.workunit:Stopping ['mon/test_mon_osdmap_prune.sh'] on client.0...
2023-03-17T03:44:19.931 DEBUG:teuthology.orchestra.run.smithi119:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-03-17T03:44:20.214 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):

On a rerun, the test passed and took only about 45 minutes to complete:
/a/yuriw-2023-03-20_21:06:51-rados-wip-yuri-testing-2023-03-14-0714-reef-distro-default-smithi/7214634

2023-03-22T11:55:06.927 INFO:tasks.workunit:Running workunits matching mon/test_mon_osdmap_prune.sh on client.0...
2023-03-22T11:55:06.929 INFO:tasks.workunit:Running workunit mon/test_mon_osdmap_prune.sh...
2023-03-22T11:55:06.929 DEBUG:teuthology.orchestra.run.smithi161:workunit test mon/test_mon_osdmap_prune.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=74ed20c11e5419432cca579d160ba4ccd5f6c09b TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/mon/test_mon_osdmap_prune.sh
...
2023-03-22T12:40:19.173 INFO:tasks.workunit:Stopping ['mon/test_mon_osdmap_prune.sh'] on client.0...
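
For reference, the durations quoted above can be checked from the timestamps in the two excerpts. This is only a rough sketch: the timestamps are copied by hand from the log lines above, the `RUNS` table and the `elapsed` helper are hypothetical names used for illustration, and it assumes the teuthology timestamps and the embedded `+0000` stderr timestamps are in the same timezone.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f"

# Timestamps copied from the log excerpts above (hypothetical table for illustration).
# "start" is the workunit launch line, "stop" is the "Stopping [...]" line.
RUNS = {
    "7207185": ("2023-03-16T23:53:13.229", "2023-03-17T03:44:17.952"),  # timed out
    "7214634": ("2023-03-22T11:55:06.929", "2023-03-22T12:40:19.173"),  # passed
}

def elapsed(start: str, stop: str) -> timedelta:
    """Return the wall-clock time between two log timestamps."""
    return datetime.strptime(stop, FMT) - datetime.strptime(start, FMT)

for job, (start, stop) in RUNS.items():
    print(f"job {job}: {elapsed(start, stop)}")

# The stderr lines in the failed run carry their own timestamp (02:52:43.492+0000).
# Assuming it shares a timezone with the teuthology timestamp, this is how long
# the workunit had been running when those lines were written.
inner = elapsed("2023-03-16T23:53:13.229", "2023-03-17T02:52:43.492")
print(f"job 7207185, start to last stderr timestamp: {inner}")
```

This gives roughly 3 h 51 min for the failed job and 45 min for the rerun. The last stderr timestamp in the failed job falls just under three hours after the start, which would be consistent with the `timeout 3h` wrapper ending the script and the remaining time being spent on delayed log output and teardown, though the excerpt does not show that directly.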

There are several more examples of this in the Sentry history, e.g. http://pulpito.front.sepia.ceph.com/yuriw-2023-03-21_00:35:27-rados-main-distro-default-smithi/7214703, where the job timed out because the test took longer than normal to run. This issue coincides with the lab slowness tracked in https://tracker.ceph.com/issues/59127, so I suspect the two are related. The lab slowness has since improved, but let's keep an eye on this.


Related issues 1 (0 open, 1 closed)

Related to teuthology - Bug #59127: Job that normally complete much sooner last almost 12 hours (Can't reproduce)

#1

Updated by Laura Flores about 1 year ago

  • Related to Bug #59127: Job that normally complete much sooner last almost 12 hours added