Bug #53855
rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
Status: Closed
% Done: 100%
Description
Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v1only objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_api_tests}
Failure Reason:
Command failed (workunit test rados/test.sh) on smithi037 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=12e70511ad0d4669cc9b87f2162909e36a0361eb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
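Status 124 is the exit code GNU coreutils `timeout` uses when it kills a command that exceeded its budget, i.e. the `timeout 3h ... rados/test.sh` wrapper expired while the test hung. A minimal Python sketch of that behavior (assuming GNU `timeout` and `sleep` are on PATH):

```python
import subprocess

# GNU timeout exits with status 124 when the wrapped command is killed
# for exceeding its time budget -- the same status teuthology reported here.
result = subprocess.run(["timeout", "1", "sleep", "5"])
print(result.returncode)  # 124
```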
/a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608744
2022-01-11T23:19:28.320 INFO:tasks.workunit.client.0.smithi037.stdout: api_tier_pp: [ OK ] LibRadosTwoPoolsPP.ManifestFlushDupCount (9064 ms)
2022-01-11T23:20:07.194 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:07.189+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-11T23:20:08.162 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:08.161+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-11T23:20:09.181 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:09.177+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
... more slow ops ...
2022-01-12T02:11:23.824 INFO:tasks.workunit.client.0.smithi037.stderr:++ cleanup
2022-01-12T02:11:23.824 INFO:tasks.workunit.client.0.smithi037.stderr:++ pkill -P 22546
2022-01-12T02:11:23.845 DEBUG:teuthology.orchestra.run:got remote process result: 124
2022-01-12T02:11:23.850 INFO:tasks.workunit.client.0.smithi037.stderr:++ true
2022-01-12T02:11:23.850 INFO:tasks.workunit.client.0.smithi037.stdout: api_tier_pp: [ RUN ] LibRadosTwoPoolsPP.TierFlushDuringFlush
2022-01-12T02:11:23.851 INFO:tasks.workunit:Stopping ['rados/test.sh', 'rados/test_pool_quota.sh'] on client.0...
2022-01-12T02:11:23.851 DEBUG:teuthology.orchestra.run.smithi037:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
Later on, possibly a knock-on effect of the first error:
2022-01-12T02:26:54.436 INFO:tasks.ceph:pgid 216.a last_scrub_stamp 2022-01-11T23:19:29.235081+0000 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=11, tm_hour=23, tm_min=19, tm_sec=29, tm_wday=1, tm_yday=11, tm_isdst=-1) <= time.struct_time(tm_year=2022, tm_mon=1, tm_mday=12, tm_hour=2, tm_min=11, tm_sec=25, tm_wday=2, tm_yday=12, tm_isdst=0)
2022-01-12T02:26:54.437 INFO:tasks.ceph:pgid 216.3 last_scrub_stamp 2022-01-11T23:19:29.235081+0000 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=11, tm_hour=23, tm_min=19, tm_sec=29, tm_wday=1, tm_yday=11, tm_isdst=-1) <= time.struct_time(tm_year=2022, tm_mon=1, tm_mday=12, tm_hour=2, tm_min=11, tm_sec=25, tm_wday=2, tm_yday=12, tm_isdst=0)
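The scrub check compares `time.struct_time` values directly: a pg counts as not scrubbed while its `last_scrub_stamp` is still `<=` the stamp recorded when scrubbing was requested. A small illustration, using timestamps from the log above:

```python
import time

# last_scrub_stamp of pg 216.a, frozen at the time of the hang
last_scrub = time.strptime("2022-01-11T23:19:29", "%Y-%m-%dT%H:%M:%S")
# the moment the scrub check ran
check_time = time.strptime("2022-01-12T02:11:25", "%Y-%m-%dT%H:%M:%S")

# struct_time compares field by field like a tuple (year, month, day, ...),
# so an unchanged stamp satisfies the "<=" seen in the log
print(last_scrub <= check_time)  # True
```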
2022-01-12T02:26:54.438 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_00e09f05e9fb2ab7c7f497f5e446613531743ae2/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/github.com_ceph_ceph-c_12e70511ad0d4669cc9b87f2162909e36a0361eb/qa/tasks/ceph.py", line 1905, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_12e70511ad0d4669cc9b87f2162909e36a0361eb/qa/tasks/ceph.py", line 1299, in osd_scrub_pgs
    raise RuntimeError('Exiting scrub checking -- not all pgs scrubbed.')
RuntimeError: Exiting scrub checking -- not all pgs scrubbed.
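`osd_scrub_pgs` retries for a while and gives up with this `RuntimeError` when some pgs never report a newer `last_scrub_stamp`; with the `tier-flush` op stuck, pg 216.a could never finish a scrub. A simplified sketch of that kind of check (hypothetical helper, not the actual teuthology code):

```python
import time

def wait_for_scrubs(stamps_before, get_stamps, tries=30, delay=10):
    """Hypothetical sketch: wait until every pg's last_scrub_stamp
    advances past the value recorded before scrubbing was requested."""
    for _ in range(tries):
        current = get_stamps()
        # a pg is unscrubbed while its stamp has not moved forward
        remaining = [pg for pg, before in stamps_before.items()
                     if current[pg] <= before]
        if not remaining:
            return
        time.sleep(delay)
    raise RuntimeError('Exiting scrub checking -- not all pgs scrubbed.')
```

With a stuck op, the stamp of the affected pg never advances, so the loop exhausts its retries and raises.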
Updated by Laura Flores over 2 years ago
- Related to Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush added
Updated by Neha Ojha over 2 years ago
- Assignee set to Myoungwon Oh
Myoungwon Oh: any ideas on this bug?
Updated by Myoungwon Oh about 2 years ago
Updated by Laura Flores about 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 45035
Updated by Laura Flores about 2 years ago
- Status changed from Fix Under Review to Resolved
Updated by Laura Flores about 2 years ago
- Related to deleted (Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush)
Updated by Laura Flores about 2 years ago
- Has duplicate Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush added
Updated by Laura Flores about 2 years ago
/a/yuriw-2022-03-24_01:58:56-rados-wip-yuri7-testing-2022-03-23-1332-quincy-distro-default-smithi/6756757
Updated by Laura Flores about 2 years ago
- Status changed from Resolved to Pending Backport
- Backport set to quincy
Updated by Backport Bot about 2 years ago
- Copied to Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount added
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2022-04-06_16:35:43-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6780098
Updated by Laura Flores almost 2 years ago
- Status changed from Pending Backport to Resolved
Updated by Laura Flores almost 2 years ago
@Myoungwon Oh does this look like the same thing to you? Perhaps your fix needs to be backported to Pacific.
/a/yuriw-2022-06-06_18:49:14-rados-wip-yuri-testing-2022-06-06-1014-pacific-distro-default-smithi/6864415
2022-06-07T03:23:29.593 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:29.590+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:30.583 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:30.581+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:31.605 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:31.603+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:32.597 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:32.595+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:33.531 INFO:tasks.workunit.client.0.smithi146.stderr:++ cleanup
2022-06-07T03:23:33.532 INFO:tasks.workunit.client.0.smithi146.stderr:++ pkill -P 97643
2022-06-07T03:23:33.591 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:33.589+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:33.628 DEBUG:teuthology.orchestra.run:got remote process result: 124
2022-06-07T03:23:33.646 INFO:tasks.workunit.client.0.smithi146.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh: line 10: 97810 Terminated bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\""
2022-06-07T03:23:33.646 INFO:tasks.workunit.client.0.smithi146.stderr:++ true
2022-06-07T03:23:33.646 INFO:tasks.workunit.client.0.smithi146.stdout: api_tier_pp: [ OK ] LibRadosTwoPoolsPP.CachePin
2022-06-07T03:23:33.647 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0...
Updated by Laura Flores almost 2 years ago
- Status changed from Resolved to Pending Backport
- Backport changed from quincy to quincy,pacific
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount added
Updated by Laura Flores almost 2 years ago
- Has duplicate deleted (Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush)
Updated by Myoungwon Oh almost 2 years ago
Updated by Konstantin Shalygin 5 months ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100