Bug #8193
HitSetTrim test in test/librados/tier.cc needs to be skipped if thrasher running
Description
Command failed on 10.214.131.16 with status 1: 'mkdir -p --
/home/ubuntu/cephtest/mnt.0/client.0/tmp && cd --
/home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
CEPH_REF=623014623851a4df10e6412380823ca68cf72d5b
TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage
/home/ubuntu/cephtest/archive/coverage timeout 3h
/home/ubuntu/cephtest/workunit.client.0/rados/test.sh'
ubuntu@teuthology:/a/sage-2014-04-22_09:16:16-rados-firefly-testing-basic-plana/209325
Associated revisions
ceph_test_rados_api_tier: increase HitSetTrim timeouts
...so that they pass when they get unlucky with thrashing.
This will vastly decrease the probability of failure, but failure will
always be possible when a timeout is in place.
Fixes: #8193
Signed-off-by: Sage Weil <sage@inktank.com>
History
#1 Updated by David Zafman almost 10 years ago
- Assignee set to David Zafman
#2 Updated by David Zafman almost 10 years ago
2014-04-22T16:26:11.096 INFO:teuthology.task.workunit.client.0.out:[10.214.131.16]: [ RUN ] LibRadosTierECPP.HitSetTrim
PG 86.6 went in and out of recovery many times, with repeated primary changes. It finally appeared to make progress after recovering the missing test object "foo" and the hit set archives and going active+clean, but by then the test had run out of time.
2014-04-22 16:28:00.922829 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=4] on_local_recover: 7fc1f406/foo/head//86
2014-04-22 16:28:00.922846 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=4] got missing 7fc1f406/foo/head//86 v 764'272
2014-04-22 16:28:00.932736 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=3] on_local_recover: 6/hit_set_86.6_archive_2014-04-22 16:26:34.669846_2014-04-22 16:26:38.255763/head/.ceph-internal/86
2014-04-22 16:28:00.932770 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=3] got missing 6/hit_set_86.6_archive_2014-04-22 16:26:34.669846_2014-04-22 16:26:38.255763/head/.ceph-internal/86 v 754'264
2014-04-22 16:28:00.932896 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=2] on_local_recover: 6/hit_set_86.6_archive_2014-04-22 16:27:10.746643_2014-04-22 16:27:14.660239/head/.ceph-internal/86
2014-04-22 16:28:00.932916 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=2] got missing 6/hit_set_86.6_archive_2014-04-22 16:27:10.746643_2014-04-22 16:27:14.660239/head/.ceph-internal/86 v 764'269
2014-04-22 16:28:03.450236 7f3e4a65a700 10 osd.5 pg_epoch: 783 pg[86.6s1( v 783'274 (0'0,783'274] local-les=783 n=4 ec=675 les/c 783/783 782/782/782) [2147483647,5,0] r=1 lpr=782 pi=763-781/4 luod=781'273 crt=764'272 lcod 781'273 mlcod 0'0 active+clean] trim_past_intervals: trimming interval(763-764 [3,2,0]/[3,2,0] maybe_went_rw)
2014-04-22T16:28:03.468 INFO:teuthology.task.workunit.client.0.out:[10.214.131.16]: test/librados/tier.cc:4128: Failure
#3 Updated by David Zafman almost 10 years ago
- Project changed from Ceph to teuthology
- Subject changed from hitset trim fail to HitSetTrim test in test/librados/tier.cc needs to be skipped if thrasher running
- Assignee deleted (David Zafman)
This particular test case is timing sensitive. It doesn't make sense to run it when the thrasher is running. This may require a new mechanism in order to skip this test.
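One way such a skip mechanism might look is sketched below. It is only an illustration: the environment variable CEPH_TEST_THRASHER_ACTIVE and the helpers thrasher_active() and run_unless_thrashing() are hypothetical names, not part of teuthology or the Ceph test suite. The assumption is that the harness would export the flag whenever a thrasher task is running, and that timing-sensitive cases would consult it before doing any work.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>

// Hypothetical environment variable: nothing in teuthology or Ceph is known
// to set it. It stands in for "a thrasher is running against this cluster".
static const char *const kThrasherEnv = "CEPH_TEST_THRASHER_ACTIVE";

// Returns true when the flag is set to anything other than "" or "0".
bool thrasher_active() {
  const char *v = std::getenv(kThrasherEnv);
  return v != nullptr && v[0] != '\0' && std::strcmp(v, "0") != 0;
}

// Runs `body` unless the thrasher flag is set.
// Returns true if the body ran, false if the case was skipped.
template <typename F>
bool run_unless_thrashing(const char *name, F body) {
  assert(name != nullptr);
  if (thrasher_active()) {
    std::cout << "[ SKIPPED ] " << name << " (thrasher active)" << std::endl;
    return false;
  }
  body();
  return true;
}
```

A timing-sensitive case would wrap its body in run_unless_thrashing("HitSetTrim", ...). In a test framework that has a native skip facility, the same check could feed that facility instead of the print statement.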
#4 Updated by Ian Colle almost 10 years ago
- Assignee set to Anonymous
#5 Updated by David Zafman almost 10 years ago
I should have mentioned that there are 2 HitSetTrim tests, as is typical.
#6 Updated by Sage Weil almost 10 years ago
- Status changed from New to Fix Under Review
#7 Updated by Anonymous almost 10 years ago
- Status changed from Fix Under Review to Resolved
Request pulled
#8 Updated by Samuel Just over 8 years ago
- Project changed from teuthology to Ceph
- Status changed from Resolved to 12
- Regression set to No
Heh, that PR isn't actually enough. The number of hitsets can still exceed count while backfilling. I'll just remove that assert and leave the time limit assert.
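The shape of the check after that change can be sketched roughly as follows. This is not the code in test/librados/tier.cc: the names wait_for_trim, HitSetList and TrimResult are made up for illustration, the listing call is abstracted into a callback, and a fixed poll budget stands in for the wall-clock limit so that the sketch is deterministic. It also assumes that a trim is detected by the oldest listed interval changing. The point it illustrates is that the loop never compares the number of listed hit sets against the configured count; the only way to fail is to run out of budget.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <list>
#include <utility>

// One hit set archive interval as (begin, end) timestamps.
using HitSetInterval = std::pair<std::time_t, std::time_t>;
using HitSetList = std::list<HitSetInterval>;

enum class TrimResult { Trimmed, TimedOut };

// Polls `list_hit_sets` until the oldest interval changes (taken here as
// evidence of a trim) or until `max_polls` polls have been made.
TrimResult wait_for_trim(const std::function<HitSetList()> &list_hit_sets,
                         unsigned max_polls) {
  assert(max_polls > 0);
  bool have_first = false;
  std::time_t first = 0;
  for (unsigned i = 0; i < max_polls; ++i) {
    HitSetList ls = list_hit_sets();
    // Deliberately no check of ls.size(): the list may temporarily hold
    // more entries than the configured count, e.g. during backfill.
    if (!ls.empty()) {
      if (!have_first) {
        first = ls.front().first;  // remember the oldest interval seen
        have_first = true;
      } else if (ls.front().first != first) {
        return TrimResult::Trimmed;  // oldest interval is gone
      }
    }
  }
  return TrimResult::TimedOut;  // the only failure condition left
}
```

A caller would treat TrimResult::TimedOut as the test failure. A real test would sleep between polls and compare against a clock rather than counting iterations.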
#9 Updated by Samuel Just over 8 years ago
- Assignee changed from Anonymous to Samuel Just
#10 Updated by Samuel Just over 8 years ago
- Status changed from 12 to Resolved