Fix #10363

closed

OSDMonitor setcrushmap tests take a long time on erasure coded rulesets

Added by Loïc Dachary over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version: -
% Done: 100%
Source: other

Description

The code at http://workbench.dachary.org/ceph/ceph/blob/giant/src/mon/OSDMonitor.cc#L4007 tests each ruleset by trying to map every number of items from min_size to max_size. The default erasure code ruleset is:

rule erasure-code {
    ruleset 6
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 5
    step take default
    step chooseleaf indep 0 type host
    step emit
}

In a cluster with too few OSDs, each attempt to map more OSDs than are available exhausts all 50 retries, which turns out to be expensive. In a cluster with 9 OSDs, the test takes about 5 seconds:
$ time crushtool -i /tmp/crushhost --test --show-bad-mappings --rule 6 
user    0m4.921s
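
To see which sizes account for the time, the same test could be run once per replica count. The sketch below only builds the command lines and does not execute them. It assumes that crushtool's --num-rep option restricts the test to a single replica count; the helper name crushtool_commands is hypothetical and not part of Ceph.

```python
import shlex


def crushtool_commands(crush_map, rule, min_size, max_size):
    """Build one crushtool test command per replica count (nothing is executed)."""
    base = [
        "crushtool", "-i", crush_map,
        "--test", "--show-bad-mappings",
        "--rule", str(rule),
    ]
    # Assumption: --num-rep limits the test to exactly one replica count.
    return [base + ["--num-rep", str(n)] for n in range(min_size, max_size + 1)]


# Values taken from the ruleset and the command above.
for cmd in crushtool_commands("/tmp/crushhost", 6, 3, 20):
    print("time " + shlex.join(cmd))
```

Timing each printed command separately would show whether the sizes above the number of available OSDs dominate the total.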

Since the test blocks the MON leader, a few erasure coded rulesets will block the monitor long enough to exceed the timeouts and trigger an election.
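
As a rough illustration of why the default ruleset is costly on a small cluster, the sketch below counts how many of the tested sizes exceed the number of available items. It is a simplified model, not the CRUSH algorithm: it assumes a size can only be fully mapped when it does not exceed the number of available items, and the function name oversized_sizes is hypothetical.

```python
def oversized_sizes(min_size, max_size, available):
    """Return the sizes in [min_size, max_size] that exceed `available`.

    Simplified model: any size larger than the number of available
    items is assumed to exhaust its retries on every mapping attempt.
    """
    return [n for n in range(min_size, max_size + 1) if n > available]


# min_size 3 and max_size 20 come from the default erasure code ruleset;
# 9 is the number of OSDs in the example cluster.
sizes = oversized_sizes(3, 20, 9)
total = 20 - 3 + 1
print(f"{len(sizes)} of {total} sizes cannot be fully mapped: {sizes}")
```

Under this model, 11 of the 18 tested sizes (10 through 20) cannot be satisfied with 9 items, so most of the test time would be spent on mappings that are bound to fail.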
