Bug #53855

rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount

Added by Laura Flores over 2 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
backport_processed
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v1only objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{ubuntu_latest} tasks/rados_api_tests}

Failure Reason:

Command failed (workunit test rados/test.sh) on smithi037 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=12e70511ad0d4669cc9b87f2162909e36a0361eb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh' 

/a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608744
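Note that exit status 124 in the failure reason comes from the `timeout 3h` wrapper visible in the command line above, not from the test binary itself: the workunit hung until the 3-hour limit expired. A minimal illustration of that behavior:

```shell
# GNU coreutils `timeout` kills the wrapped command when the time limit
# expires and itself exits with status 124 -- the same status teuthology
# reports here, i.e. a hang rather than an ordinary test failure.
timeout 1 sleep 5
echo "exit status: $?"   # prints "exit status: 124"
```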

2022-01-11T23:19:28.320 INFO:tasks.workunit.client.0.smithi037.stdout:              api_tier_pp: [       OK ] LibRadosTwoPoolsPP.ManifestFlushDupCount (9064 ms)
2022-01-11T23:20:07.194 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:07.189+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-11T23:20:08.162 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:08.161+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-11T23:20:09.181 INFO:tasks.ceph.osd.0.smithi037.stderr:2022-01-11T23:20:09.177+0000 7f37c1598700 -1 osd.0 420 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.5836.0:7699 216.a 216:56439c1b:test-rados-api-smithi037

... more slow ops ...

7-22760-88::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected+supports_pool_eio e419)
2022-01-12T02:11:23.824 INFO:tasks.workunit.client.0.smithi037.stderr:++ cleanup
2022-01-12T02:11:23.824 INFO:tasks.workunit.client.0.smithi037.stderr:++ pkill -P 22546
2022-01-12T02:11:23.845 DEBUG:teuthology.orchestra.run:got remote process result: 124
2022-01-12T02:11:23.850 INFO:tasks.workunit.client.0.smithi037.stderr:++ true
2022-01-12T02:11:23.850 INFO:tasks.workunit.client.0.smithi037.stdout:              api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.TierFlushDuringFlush
2022-01-12T02:11:23.851 INFO:tasks.workunit:Stopping ['rados/test.sh', 'rados/test_pool_quota.sh'] on client.0...
2022-01-12T02:11:23.851 DEBUG:teuthology.orchestra.run.smithi037:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0

Later on, possibly a knock-on effect of the first error:

2022-01-12T02:26:54.436 INFO:tasks.ceph:pgid 216.a last_scrub_stamp 2022-01-11T23:19:29.235081+0000 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=11, tm_hour=23, tm_min=19, tm_sec=29, tm_wday=1, tm_yday=11, tm_isdst=-1) <= time.struct_time(tm_year=2022, tm_mon=1, tm_mday=12, tm_hour=2, tm_min=11, tm_sec=25, tm_wday=2, tm_yday=12, tm_isdst=0)
2022-01-12T02:26:54.437 INFO:tasks.ceph:pgid 216.3 last_scrub_stamp 2022-01-11T23:19:29.235081+0000 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=11, tm_hour=23, tm_min=19, tm_sec=29, tm_wday=1, tm_yday=11, tm_isdst=-1) <= time.struct_time(tm_year=2022, tm_mon=1, tm_mday=12, tm_hour=2, tm_min=11, tm_sec=25, tm_wday=2, tm_yday=12, tm_isdst=0)
2022-01-12T02:26:54.438 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_00e09f05e9fb2ab7c7f497f5e446613531743ae2/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/github.com_ceph_ceph-c_12e70511ad0d4669cc9b87f2162909e36a0361eb/qa/tasks/ceph.py", line 1905, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_12e70511ad0d4669cc9b87f2162909e36a0361eb/qa/tasks/ceph.py", line 1299, in osd_scrub_pgs
    raise RuntimeError('Exiting scrub checking -- not all pgs scrubbed.')
RuntimeError: Exiting scrub checking -- not all pgs scrubbed.
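The scrub check that raises this error treats a PG as unscrubbed when its `last_scrub_stamp` is not newer than the time the check began, which is what the `time.struct_time` comparisons in the log are printing. A minimal sketch of that comparison (variable names here are illustrative, not teuthology's actual code):

```python
import time

# struct_time objects compare field-by-field like tuples, so an older
# last_scrub_stamp compares <= the check's start time -- exactly the
# condition shown in the log lines above for pgs 216.a and 216.3.
check_started = time.strptime("2022-01-12T02:11:25", "%Y-%m-%dT%H:%M:%S")
last_scrub = time.strptime("2022-01-11T23:19:29", "%Y-%m-%dT%H:%M:%S")

still_unscrubbed = last_scrub <= check_started
print(still_unscrubbed)  # True: the pg was never re-scrubbed after the hang
```

Since the `tier-flush` op never completed, the affected PGs could not finish their final scrub, so the scrub check timed out and raised the RuntimeError above.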


Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount (Resolved, Laura Flores)
Copied to RADOS - Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount (Resolved, Myoungwon Oh)
#1

Updated by Laura Flores over 2 years ago

  • Related to Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush added
#2

Updated by Neha Ojha over 2 years ago

  • Assignee set to Myoungwon Oh

Myoungwon Oh: any ideas on this bug?

#4

Updated by Laura Flores about 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 45035
#5

Updated by Laura Flores about 2 years ago

  • Status changed from Fix Under Review to Resolved
#6

Updated by Laura Flores about 2 years ago

  • Related to deleted (Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush)
#7

Updated by Laura Flores about 2 years ago

  • Has duplicate Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush added
#8

Updated by Laura Flores about 2 years ago

/a/yuriw-2022-03-24_01:58:56-rados-wip-yuri7-testing-2022-03-23-1332-quincy-distro-default-smithi/6756757

#9

Updated by Laura Flores about 2 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to quincy
#10

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount added
#11

Updated by Aishwarya Mathuria about 2 years ago

/a/yuriw-2022-04-06_16:35:43-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6780098

#12

Updated by Laura Flores almost 2 years ago

  • Status changed from Pending Backport to Resolved
#13

Updated by Laura Flores almost 2 years ago

@Myoungwon Oh does this look like the same thing to you? Perhaps your fix needs to be backported to pacific.

/a/yuriw-2022-06-06_18:49:14-rados-wip-yuri-testing-2022-06-06-1014-pacific-distro-default-smithi/6864415

2022-06-07T03:23:29.593 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:29.590+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:30.583 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:30.581+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:31.605 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:31.603+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:32.597 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:32.595+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:33.531 INFO:tasks.workunit.client.0.smithi146.stderr:++ cleanup
2022-06-07T03:23:33.532 INFO:tasks.workunit.client.0.smithi146.stderr:++ pkill -P 97643
2022-06-07T03:23:33.591 INFO:tasks.ceph.osd.6.smithi187.stderr:2022-06-07T03:23:33.589+0000 7f4fb0f98700 -1 osd.6 402 get_health_metrics reporting 24 slow ops, oldest is osd_op(client.4979.0:7516 214.14 214:2cd3fa7e:test-rados-api-smithi146-97821-84::foo:head [tier-flush] snapc 0=[] ondisk+read+ignore_cache+known_if_redirected e402)
2022-06-07T03:23:33.628 DEBUG:teuthology.orchestra.run:got remote process result: 124
2022-06-07T03:23:33.646 INFO:tasks.workunit.client.0.smithi146.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh: line 10: 97810 Terminated              bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 
2022-06-07T03:23:33.646 INFO:tasks.workunit.client.0.smithi146.stderr:++ true
2022-06-07T03:23:33.646 INFO:tasks.workunit.client.0.smithi146.stdout:              api_tier_pp: [       OK ] LibRadosTwoPoolsPP.CachePin
2022-06-07T03:23:33.647 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0...

#14

Updated by Myoungwon Oh almost 2 years ago

ok

#15

Updated by Laura Flores almost 2 years ago

  • Status changed from Resolved to Pending Backport
  • Backport changed from quincy to quincy,pacific
#16

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount added
#17

Updated by Laura Flores almost 2 years ago

  • Has duplicate deleted (Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush)
#19

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
#20

Updated by Konstantin Shalygin 5 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100