Bug #8193
closed
HitSetTrim test in test/librados/tier.cc needs to be skipped if thrasher running
Description
Command failed on 10.214.131.16 with status 1: 'mkdir -p --
/home/ubuntu/cephtest/mnt.0/client.0/tmp && cd --
/home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
CEPH_REF=623014623851a4df10e6412380823ca68cf72d5b
TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage
/home/ubuntu/cephtest/archive/coverage timeout 3h
/home/ubuntu/cephtest/workunit.client.0/rados/test.sh'
ubuntu@teuthology:/a/sage-2014-04-22_09:16:16-rados-firefly-testing-basic-plana/209325
Updated by David Zafman about 10 years ago
2014-04-22T16:26:11.096 INFO:teuthology.task.workunit.client.0.out:[10.214.131.16]: [ RUN ] LibRadosTierECPP.HitSetTrim
pg 86.6 went in and out of recovery many times, with repeated primary changes. It may finally have made progress after recovering the missing test object "foo" and the hit set archives and going active+clean, but by then the test had run out of time.
2014-04-22 16:28:00.922829 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=4] on_local_recover: 7fc1f406/foo/head//86
2014-04-22 16:28:00.922846 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=4] got missing 7fc1f406/foo/head//86 v 764'272
2014-04-22 16:28:00.932736 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=3] on_local_recover: 6/hit_set_86.6_archive_2014-04-22 16:26:34.669846_2014-04-22 16:26:38.255763/head/.ceph-internal/86
2014-04-22 16:28:00.932770 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=3] got missing 6/hit_set_86.6_archive_2014-04-22 16:26:34.669846_2014-04-22 16:26:38.255763/head/.ceph-internal/86 v 754'264
2014-04-22 16:28:00.932896 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=2] on_local_recover: 6/hit_set_86.6_archive_2014-04-22 16:27:10.746643_2014-04-22 16:27:14.660239/head/.ceph-internal/86
2014-04-22 16:28:00.932916 7f3e4a65a700 10 osd.5 pg_epoch: 781 pg[86.6s1( v 764'272 lc 679'1 (0'0,764'272] local-les=778 n=4 ec=675 les/c 778/764 776/776/776) [2147483647,5,0] r=1 lpr=778 pi=763-775/3 luod=0'0 crt=764'272 lcod 0'0 active m=2] got missing 6/hit_set_86.6_archive_2014-04-22 16:27:10.746643_2014-04-22 16:27:14.660239/head/.ceph-internal/86 v 764'269
2014-04-22 16:28:03.450236 7f3e4a65a700 10 osd.5 pg_epoch: 783 pg[86.6s1( v 783'274 (0'0,783'274] local-les=783 n=4 ec=675 les/c 783/783 782/782/782) [2147483647,5,0] r=1 lpr=782 pi=763-781/4 luod=781'273 crt=764'272 lcod 781'273 mlcod 0'0 active+clean] trim_past_intervals: trimming interval(763-764 [3,2,0]/[3,2,0] maybe_went_rw)
2014-04-22T16:28:03.468 INFO:teuthology.task.workunit.client.0.out:[10.214.131.16]: test/librados/tier.cc:4128: Failure
Updated by David Zafman about 10 years ago
- Project changed from Ceph to teuthology
- Subject changed from hitset trim fail to HitSetTrim test in test/librados/tier.cc needs to be skipped if thrasher running
- Assignee deleted (David Zafman)
This particular test case is timing sensitive. It doesn't make sense to run it when the thrasher is running. This may require a new mechanism in order to skip this test.
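One possible shape for such a mechanism, sketched here purely as an illustration: googletest already accepts a negative pattern in `--gtest_filter`, so a wrapper could exclude the HitSetTrim cases whenever it knows a thrasher is active. The environment variable `TEST_THRASHER_ACTIVE` and both helper functions below are hypothetical; nothing in this report says that Ceph or teuthology provides them.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical: assumes the harness exports this variable as "1" while a
// thrasher task is running. The variable name is made up for this sketch.
inline bool thrasher_active(const char* var = "TEST_THRASHER_ACTIVE") {
  const char* v = std::getenv(var);
  return v != nullptr && std::string(v) == "1";
}

// Build the googletest filter argument for the test binary.
// A leading '-' in the filter marks negative patterns, so
// "-*.HitSetTrim" runs everything except tests named HitSetTrim
// in any test case (which would cover both HitSetTrim variants).
inline std::string gtest_filter_arg(bool thrashing) {
  return thrashing ? "--gtest_filter=-*.HitSetTrim" : "--gtest_filter=*";
}
```

A wrapper script or launcher would then pass `gtest_filter_arg(thrasher_active())` to the test binary; how the thrasher state actually reaches the workunit is left open here.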
Updated by David Zafman about 10 years ago
I should have mentioned that there are 2 HitSetTrim tests as is typical.
Updated by Sage Weil about 10 years ago
- Status changed from New to Fix Under Review
Updated by Anonymous about 10 years ago
- Status changed from Fix Under Review to Resolved
Request pulled
Updated by Samuel Just almost 9 years ago
- Project changed from teuthology to Ceph
- Status changed from Resolved to 12
- Regression set to No
Heh, that PR isn't actually enough. The number of hitsets can still exceed count while backfilling. I'll just remove that assert and leave the time limit assert.
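To make the difference between the two asserts concrete, here is a simplified, self-contained model of a polling loop of this kind. It is not the code from test/librados/tier.cc; the `Observation` struct, the `poll_for_trim` function, and the sample numbers are assumptions made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// One poll of the pool's hit set listing (illustrative model only).
struct Observation {
  int seconds;          // time since the test started
  std::size_t hitsets;  // number of hit sets currently listed
  bool oldest_trimmed;  // true once the first hit set seen is gone
};

enum class Result { Passed, CountExceeded, TimedOut };

// Walk the observations the way a polling test would.
// enforce_count == true models keeping the "number of hit sets <= count"
// assert; false models keeping only the time limit assert.
Result poll_for_trim(const std::vector<Observation>& obs,
                     std::size_t count, int hard_stop, bool enforce_count) {
  for (const Observation& o : obs) {
    if (o.seconds >= hard_stop) return Result::TimedOut;
    if (enforce_count && o.hitsets > count) return Result::CountExceeded;
    if (o.oldest_trimmed) return Result::Passed;
  }
  return Result::TimedOut;  // ran out of observations without seeing a trim
}
```

With a sample run in which the listing briefly grows to 5 hit sets (as it might during backfill) before trimming catches up, the count check fails the test even though trimming completes well inside the time limit, while the time-limit-only version passes.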
Updated by Samuel Just almost 9 years ago
- Assignee changed from Anonymous to Samuel Just