Fix #10363 (closed)
OSDMonitor setcrushmap tests take a long time on erasure coded rulesets
% Done: 100%
Source: other
Description
http://workbench.dachary.org/ceph/ceph/blob/giant/src/mon/OSDMonitor.cc#L4007 runs tests by trying to map from min_size to max_size items for each ruleset. The default erasure code ruleset is:
rule erasure-code {
        ruleset 6
        type erasure
        min_size 3
        max_size 20
        step set_chooseleaf_tries 5
        step take default
        step chooseleaf indep 0 type host
        step emit
}
In a cluster with too few OSDs, each attempt to map more OSDs than are available exhausts all retries (50), which turns out to be expensive. In a cluster with 9 OSDs, it takes about 5 seconds:
$ time crushtool -i /tmp/crushhost --test --show-bad-mappings --rule 6
user 0m4.921s
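To see which sizes account for the time, one option is to run the same crushtool test once per size instead of over the whole min_size to max_size range. The following is only a sketch: the helper names are hypothetical, and it assumes crushtool is on the PATH and accepts --num-rep to test a single size.

```python
import subprocess
import time


def crushtool_test_argv(crushmap, rule, num_rep):
    """Build the crushtool command line for testing one rule at one size."""
    return [
        "crushtool", "-i", crushmap, "--test", "--show-bad-mappings",
        "--rule", str(rule), "--num-rep", str(num_rep),
    ]


def time_each_size(crushmap, rule, min_size, max_size):
    """Run crushtool once per size and return {size: elapsed seconds}.

    Assumes crushtool is installed; nothing is run until this is called.
    """
    timings = {}
    for num_rep in range(min_size, max_size + 1):
        start = time.monotonic()
        subprocess.run(
            crushtool_test_argv(crushmap, rule, num_rep),
            stdout=subprocess.DEVNULL,
            check=False,
        )
        timings[num_rep] = time.monotonic() - start
    return timings
```

Calling time_each_size("/tmp/crushhost", 6, 3, 20) on the 9 OSD cluster above would be expected to show the time concentrated in the sizes that exceed what the cluster can provide.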
Since the test blocks the MON leader, a few erasure coded rulesets will block the monitor long enough to exceed the timeouts and trigger an election.
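A back-of-the-envelope model of why this gets expensive, as a rough sketch only. It is not the actual CRUSH retry logic. It assumes every slot that cannot be filled costs the full number of retries, that the number of available items equals the OSD count (9), and that each size is tested against 1024 inputs. The 1024 figure and the function name are assumptions made for illustration.

```python
def wasted_attempts(available, min_size, max_size, tries, inputs=1024):
    """Rough count of placement attempts that can never succeed.

    For every size from min_size to max_size, each slot beyond the
    number of available items is assumed to burn `tries` attempts
    for each of `inputs` tested values.
    """
    total = 0
    for size in range(min_size, max_size + 1):
        unfillable = max(0, size - available)
        total += unfillable * tries * inputs
    return total


# 9 available items, default rule (min_size 3, max_size 20), 50 retries
print(wasted_attempts(available=9, min_size=3, max_size=20, tries=50))
# prints 3379200
```

In this model the cost grows linearly with the retry count, so raising the number of tries on a rule multiplies the time spent on mappings that cannot succeed.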
Updated by Loïc Dachary over 9 years ago
- Tracker changed from Bug to Fix
- Assignee set to Loïc Dachary
Updated by Yann Dupont over 9 years ago
Confirmed. Things can get even worse when you set non-default retries on some rules (for example, step set_choose_tries 200). This can lead to an election storm between monitors.
Updated by Loïc Dachary over 9 years ago
- Status changed from 12 to Fix Under Review
Updated by Loïc Dachary about 9 years ago
- Status changed from Fix Under Review to Resolved
- % Done changed from 0 to 100
- Backport deleted (firefly,giant)
Removing the backport: this really is an optimization that does not qualify for backports. It probably would if people were complaining about it, but that does not seem to be the case.