Bug #8193

HitSetTrim test in test/librados/tier.cc needs to be skipped if thrasher running

Added by Samuel Just almost 10 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Command failed on 10.214.131.16 with status 1: 'mkdir -p --
/home/ubuntu/cephtest/mnt.0/client.0/tmp && cd --
/home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
CEPH_REF=623014623851a4df10e6412380823ca68cf72d5b
TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage
/home/ubuntu/cephtest/archive/coverage timeout 3h
/home/ubuntu/cephtest/workunit.client.0/rados/test.sh'
ubuntu@teuthology:/a/sage-2014-04-22_09:16:16-rados-firefly-testing-basic-plana/209325

Associated revisions

Revision d0f1806d (diff)
Added by Sage Weil almost 10 years ago

ceph_test_rados_api_tier: increase HitSetTrim timeouts

...so that they pass when they get unlucky with thrashing.

This will vastly decrease the probability of failure, but failure will
always be possible when a timeout is in place.

Fixes: #8193
Signed-off-by: Sage Weil <>

History

#1 Updated by David Zafman almost 10 years ago

  • Assignee set to David Zafman

#2 Updated by David Zafman almost 10 years ago

2014-04-22T16:26:11.096 INFO:teuthology.task.workunit.client.0.out:[10.214.131.16]: [ RUN ] LibRadosTierECPP.HitSetTrim

pg 86.6 went in and out of recovery many times, with repeated primary changes. It finally appears to have made progress after recovering the missing test object "foo" and the hit set archives and going active+clean, but by then the test had run out of time.

2014-04-22 16:28:00.922829 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=4] on_local_recover: 7fc1f406/foo/head//86
2014-04-22 16:28:00.922846 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=4] got missing 7fc1f406/foo/head//86 v 764'272

2014-04-22 16:28:00.932736 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=3] on_local_recover: 6/hit_set_86.6_archive_2014-04-22 16:26:34.669846_2014-04-22 16:26:38.255763/head/.ceph-internal/86
2014-04-22 16:28:00.932770 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=3] got missing 6/hit_set_86.6_archive_2014-04-22 16:26:34.669846_2014-04-22 16:26:38.255763/head/.ceph-internal/86 v 754'264
2014-04-22 16:28:00.932896 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=2] on_local_recover: 6/hit_set_86.6_archive_2014-04-22 16:27:10.746643_2014-04-22 16:27:14.660239/head/.ceph-internal/86
2014-04-22 16:28:00.932916 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=2] got missing 6/hit_set_86.6_archive_2014-04-22 16:27:10.746643_2014-04-22 16:27:14.660239/head/.ceph-internal/86 v 764'269

2014-04-22 16:28:03.450236 7f3e4a65a700 10 osd.5 pg_epoch: 783 pg[86.6s1( v 783'274 (0'0,783'274] local-les=783 n=4 ec=675 les/c 783/783 782/782/782) [2147483647,5,0] r=1 lpr=782 pi=763-781/4 luod=781'273 crt=764'272 lcod 781'273 mlcod 0'0 active+clean] trim_past_intervals: trimming interval(763-764 [3,2,0]/[3,2,0] maybe_went_rw)

2014-04-22T16:28:03.468 INFO:teuthology.task.workunit.client.0.out:[10.214.131.16]: test/librados/tier.cc:4128: Failure

#3 Updated by David Zafman almost 10 years ago

  • Project changed from Ceph to teuthology
  • Subject changed from hitset trim fail to HitSetTrim test in test/librados/tier.cc needs to be skipped if thrasher running
  • Assignee deleted (David Zafman)

This particular test case is timing sensitive. It doesn't make sense to run it when the thrasher is running. This may require a new mechanism in order to skip this test.
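One possible shape for such a skip mechanism is sketched below. It is only an illustration: the environment variable name is hypothetical, and nothing in this ticket says teuthology or the thrasher task sets any such variable. The idea is that the harness would export a flag while thrashing, and a timing-sensitive test would check it and return early.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical variable name, chosen for this sketch only. The real
// harness is not known to export anything like it.
static const char *const kThrasherEnv = "HYPOTHETICAL_THRASHER_RUNNING";

// Returns true when the (assumed) harness signals that OSDs are being
// thrashed, i.e. the variable is set to exactly "1".
bool thrasher_running() {
  const char *value = std::getenv(kThrasherEnv);
  return value != nullptr && std::string(value) == "1";
}
```

A timing-sensitive test body could then start with a check such as `if (thrasher_running()) return;` before doing any work.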

#4 Updated by Ian Colle almost 10 years ago

  • Assignee set to Anonymous

#5 Updated by David Zafman almost 10 years ago

I should have mentioned that there are 2 HitSetTrim tests as is typical.

#6 Updated by Sage Weil almost 10 years ago

  • Status changed from New to Fix Under Review

#7 Updated by Anonymous almost 10 years ago

  • Status changed from Fix Under Review to Resolved

Request pulled

#8 Updated by Samuel Just over 8 years ago

  • Project changed from teuthology to Ceph
  • Status changed from Resolved to 12
  • Regression set to No

Heh, that PR isn't actually enough. The number of hitsets can still exceed count while backfilling. I'll just remove that assert and leave the time limit assert.
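The approach described here, dropping the bound on the number of hit sets and keeping only the deadline, might look roughly like the sketch below. This is not the code from test/librados/tier.cc: the function name, the callback, and the use of plain integers for hit set start times are all assumptions made for illustration.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls until the oldest archived hit set starts later than `first_start`
// (meaning the first hit set was trimmed) or until `timeout` elapses.
// There is deliberately no check on how many hit sets exist, since that
// count can temporarily exceed the configured limit during backfill.
// `oldest_hitset_start` is a hypothetical callback standing in for a
// query to the cluster.
bool wait_for_trim(const std::function<long()> &oldest_hitset_start,
                   long first_start,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds poll_interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (oldest_hitset_start() > first_start)
      return true;  // the first hit set is gone: trimming happened
    std::this_thread::sleep_for(poll_interval);
  }
  return false;  // time limit hit; the caller asserts on this result only
}
```

The caller would assert only on the return value, so the single remaining failure mode is the time limit, which a longer timeout makes less likely but cannot eliminate.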

#9 Updated by Samuel Just over 8 years ago

  • Assignee changed from Anonymous to Samuel Just

#10 Updated by Samuel Just over 8 years ago

  • Status changed from 12 to Resolved
