Project

General

Profile

Actions

Bug #53855

closed

rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount

Added by Laura Flores over 2 years ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
backport_processed
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v1only objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_api_tests}

Failure Reason:

Command failed (workunit test rados/test.sh) on smithi037 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=12e70511ad0d4669cc9b87f2162909e36a0361eb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh' 

/a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608744

2022-01-11T23:19:28.320 INFO:tasks.workunit.client.0.smithi037.stdout:              api_tier_pp: [       OK ] LibRadosTwoPoolsPP.ManifestFlushDupCount (9064 ms)
2022-01-11T23:20:07.194 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:07.189+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037
-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-11T23:20:08.162 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:08.161+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037
-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-11T23:20:09.181 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:09.177+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037

... more slow ops ...

7-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-12T02:11:23.824 INFO:tasks.workunit.client.0.smithi037.stderr:++ cleanup
2022-01-12T02:11:23.824 INFO:tasks.workunit.client.0.smithi037.stderr:++ pkill -P 22546
2022-01-12T02:11:23.845 DEBUG:teuthology.orchestra.run:got remote process result: 124
2022-01-12T02:11:23.850 INFO:tasks.workunit.client.0.smithi037.stderr:++ true
2022-01-12T02:11:23.850 INFO:tasks.workunit.client.0.smithi037.stdout:              api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.TierFlushDuringFlush
2022-01-12T02:11:23.851 INFO:tasks.workunit:Stopping ['rados/test.sh', 'rados/test_pool_quota.sh'] on client.0...
2022-01-12T02:11:23.851 DEBUG:teuthology.orchestra.run.smithi037:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0

Later on, could be a butterfly-effect of the first error:

2022-01-12T02:26:54.436 INFO:tasks.ceph:pgid 216.a last_scrub_stamp 2022-01-11T23:19:29.235081+0000 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=11, tm_hour=23, tm_min=19, tm_sec=29, tm_wday=1, tm_yday=11, tm_isdst=-1) <= time.struct_
time(tm_year=2022, tm_mon=1, tm_mday=12, tm_hour=2, tm_min=11, tm_sec=25, tm_wday=2, tm_yday=12, tm_isdst=0)
2022-01-12T02:26:54.437 INFO:tasks.ceph:pgid 216.3 last_scrub_stamp 2022-01-11T23:19:29.235081+0000 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=11, tm_hour=23, tm_min=19, tm_sec=29, tm_wday=1, tm_yday=11, tm_isdst=-1) <= time.struct_
time(tm_year=2022, tm_mon=1, tm_mday=12, tm_hour=2, tm_min=11, tm_sec=25, tm_wday=2, tm_yday=12, tm_isdst=0)
2022-01-12T02:26:54.438 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_00e09f05e9fb2ab7c7f497f5e446613531743ae2/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/github.com_ceph_ceph-c_12e70511ad0d4669cc9b87f2162909e36a0361eb/qa/tasks/ceph.py", line 1905, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_12e70511ad0d4669cc9b87f2162909e36a0361eb/qa/tasks/ceph.py", line 1299, in osd_scrub_pgs
    raise RuntimeError('Exiting scrub checking -- not all pgs scrubbed.')
RuntimeError: Exiting scrub checking -- not all pgs scrubbed.


Related issues 2 (0 open2 closed)

Copied to RADOS - Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCountResolvedLaura FloresActions
Copied to RADOS - Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCountResolvedMyoungwon OhActions
Actions

Also available in: Atom PDF