Bug #13592 (Closed)
ceph-helpers: TEST_auto_repair_erasure_coded intermittent failures
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
./test/osd/osd-scrub-repair.sh:182: TEST_auto_repair_erasure_coded: objectstore_tool testdir/osd-scrub-repair 0 SOMETHING list-attrs
../qa/workunits/ceph-helpers.sh:792: objectstore_tool: local dir=testdir/osd-scrub-repair
../qa/workunits/ceph-helpers.sh:793: objectstore_tool: shift
../qa/workunits/ceph-helpers.sh:794: objectstore_tool: local id=0
../qa/workunits/ceph-helpers.sh:795: objectstore_tool: shift
../qa/workunits/ceph-helpers.sh:796: objectstore_tool: local osd_data=testdir/osd-scrub-repair/0
../qa/workunits/ceph-helpers.sh:798: objectstore_tool: kill_daemons testdir/osd-scrub-repair TERM osd.0
.../qa/workunits/ceph-helpers.sh:192: kill_daemons: shopt -q -o xtrace
.../qa/workunits/ceph-helpers.sh:192: kill_daemons: echo true
../qa/workunits/ceph-helpers.sh:192: kill_daemons: local trace=true
../qa/workunits/ceph-helpers.sh:193: kill_daemons: true
../qa/workunits/ceph-helpers.sh:193: kill_daemons: shopt -u -o xtrace
../qa/workunits/ceph-helpers.sh:219: kill_daemons: return 0
../qa/workunits/ceph-helpers.sh:800: objectstore_tool: ceph-objectstore-tool --data-path testdir/osd-scrub-repair/0 --journal-path testdir/osd-scrub-repair/0/journal SOMETHING list-attrs
No object id 'SOMETHING' found
../qa/workunits/ceph-helpers.sh:802: objectstore_tool: return 1
Updated by Loïc Dachary over 8 years ago
- Status changed from New to Need More Info
Updated by Xinze Chi over 8 years ago
I think the reason is that the scrub is not being scheduled by the OSD in time.
sched_scrub() is only reached on about 33% of ticks (OSD::scrub_random_backoff() skips the rest), and even then the OSD has to loop through all PGs before it gets to scrubbing this particular PG, so the wait can be longer than the test allows.
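As a rough illustration (a toy model, not Ceph source), the random backoff makes the wait geometric: if each tick gets through with probability p ≈ 1/3, the mean wait is 1/p = 3 ticks, but the tail is unbounded, so any fixed sleep can occasionally lose the race. The value of p and the per-tick model here are assumptions for illustration only.

```python
import random

# Toy model (not Ceph code): OSD::scrub_random_backoff() lets a tick
# proceed to sched_scrub() only ~1/3 of the time, so the number of ticks
# before the scrub machinery even runs is geometrically distributed.
def ticks_until_sched(p=1.0 / 3.0, rng=random.random):
    """Return how many ticks pass before one gets through the backoff."""
    ticks = 1
    while rng() >= p:  # this tick was skipped by the backoff
        ticks += 1
    return ticks

# Mean wait is 1/p = 3 ticks, but the chance that n consecutive ticks are
# all skipped is (1 - p) ** n -- small for large n, yet never zero, which
# is why a fixed sleep produces intermittent failures.
```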
Updated by Xinze Chi over 8 years ago
I think we could trigger the scrub manually (using the ceph pg scrub command).
Updated by Xinze Chi over 8 years ago
The test currently does:
- Remove the object from one shard physically:
objectstore_tool $dir $(get_not_primary $poolname SOMETHING) SOMETHING remove || return 1
- Give some time for auto repair:
sleep 20
But 20 seconds may not be enough for the scrub to be scheduled.
Maybe we could wait until the scrub stamp changes instead (combined with a timeout strategy longer than 20 seconds)?
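A minimal sketch of the wait-for-stamp idea, with the polling command passed in as a parameter. Note that wait_for_change and get_last_scrub_stamp are hypothetical names invented for this sketch, not existing ceph-helpers.sh functions, and the 300-second default is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: poll CMD once per second until its output differs
# from OLD, giving up after TIMEOUT seconds. Names are illustrative only.
wait_for_change() {
    local cmd=$1 old=$2 timeout=${3:-300}
    local i
    for ((i = 0; i < timeout; i++)); do
        [ "$($cmd)" != "$old" ] && return 0
        sleep 1
    done
    return 1
}

# Sketch of intended use in the test (get_last_scrub_stamp is assumed to
# print the PG's last_scrub_stamp, e.g. from ceph pg $pgid query):
#   old=$(get_last_scrub_stamp $pgid)
#   ... remove the shard, let auto repair kick in ...
#   wait_for_change "get_last_scrub_stamp $pgid" "$old" 300 || return 1
```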
Updated by Loïc Dachary over 8 years ago
- Status changed from Need More Info to Fix Under Review
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to Resolved