Jessica Mack, 06/23/2015 02:03 AM

h1. Crush extension for more flexible object placement

h3. Summary

Extend CRUSH to allow more flexible object placement.

h3. Owners

* Li Wang (liwang@ubuntukylin.com)
* Lianghao Shen (lianghaoshen@ubuntukylin.com)
 
h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status
 
h3. Detailed Description

This blueprint was originally proposed by Sage at https://wiki.ceph.com/Planning/Blueprints/Dumpling/extend_crush_rule_language.
 
CRUSH is a deterministic, rule-based, consistent-hash-like algorithm (with some very nice properties) for determining object placement in distributed storage systems. Its rule language, while already very useful, cannot express some useful placement rules:
 - The current choose step iterates over the 'working set' and recursively selects new items for each item. It always applies to every item in the working set, which precludes strategies like "pick 2 racks, choose N from the first, and M from the second".
 - The hierarchy is assumed to be a single uniform tree. You cannot have two parallel trees of devices (say, SSDs and HDDs) in the same nodes and pick 1 SSD and 1 HDD while ensuring that they reside on different hosts.

Algorithm:
Some new features should be developed for the above scenarios:
 - Asymmetric choose option, e.g., _assign(N,TYPE1,TYPE2,a1,a2...an)_, which chooses _N_ buckets of type _TYPE1_ and then chooses _ai_ (i = 1, 2, ..., N) buckets of type _TYPE2_ from the i-th of those _TYPE1_ buckets.
 
<pre>
assign(n, type1, type2, a1, a2, ..., an) {
    map = [a1, a2, ..., an];                 // how many type2 items to take from each type1 bucket
    out1 = choose_firstn(n, type1, ...);     // choose n items of type type1
    out2 = [];
    for i in 0 .. n-1 {
        // choose map[i] items of type type2 from the i-th bucket, out1[i]
        sub_item = choose_firstn(map[i], type2, ...);
        out2 = out2 + sub_item;
    }
    return out2;
}
</pre>
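The _assign_ pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Ceph's implementation: the hash-based ranking in @choose_firstn@ is only a stand-in for CRUSH's bucket selection, and the @tree@ dictionary, the rack and OSD names, and all helper names are hypothetical.

```python
import hashlib


def score(x, name):
    """Deterministic pseudo-random score for input x and an item name."""
    digest = hashlib.sha256(f"{x}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def choose_firstn(x, n, candidates):
    """Pick n distinct candidates deterministically (stand-in for choose firstn)."""
    ranked = sorted(candidates, key=lambda name: score(x, name))
    return ranked[:n]


def assign(x, tree, counts):
    """Choose len(counts) buckets, then counts[i] children from the i-th bucket."""
    buckets = choose_firstn(x, len(counts), sorted(tree))
    out = []
    for bucket, count in zip(buckets, counts):
        out.extend(choose_firstn(x, count, tree[bucket]))
    return out


# Hypothetical hierarchy: three racks with four OSDs each.
tree = {f"rack{i}": [f"osd.{i * 4 + j}" for j in range(4)] for i in range(3)}

# Two OSDs from the first chosen rack, one OSD from the second.
placement = assign(x=1234, tree=tree, counts=[2, 1])
print(placement)
```

With @counts=[2, 1]@ the sketch takes two OSDs from the first chosen rack and one from the second, which is the "N from the first, and M from the second" strategy that the current choose step cannot express.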
 
 - OSD grouping, _group {devicetype|network}_, which tags each OSD with a group id according to the given strategy, namely by device type or by network.
 - Modified chooseleaf option, _chooseleaf firstn {num} type {bucket-type} [group]_, which, unlike the original option, can place the replicas in different groups.

<pre>
group {devicetype|network...}
    // example for devicetype: tag each OSD with a group id
    for osd in pool:
        if osd.device == ssd
            osd.gid = 1;
        else
            osd.gid = 2;

choose firstn n type osd [group]
    item = crush_bucket_choose(in, x, r);
    if (group) {
        // reject the item if its group id is already present in out
        if (is_gid_selected(out, item)) {
            fgroup++;
            retry_group = 1;
        }
    }
</pre>
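The grouping and group-aware selection above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the proposed implementation: walking a hash-ranked candidate list plays the role of the retry loop, the group ids follow the ssd/other split from the pseudocode, and the OSD names and helper names are hypothetical.

```python
import hashlib


def score(x, name):
    """Deterministic pseudo-random score for input x and an item name."""
    digest = hashlib.sha256(f"{x}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_group_ids(device_type):
    """Tag every OSD with a group id: 1 for ssd, 2 for anything else."""
    return {osd: 1 if kind == "ssd" else 2 for osd, kind in device_type.items()}


def choose_firstn_grouped(x, n, osds, gid):
    """Pick up to n OSDs so that no two of them share a group id."""
    out = []
    used_groups = set()
    for item in sorted(osds, key=lambda name: score(x, name)):
        if len(out) == n:
            break
        if gid[item] in used_groups:
            continue  # group already represented: reject and try the next candidate
        out.append(item)
        used_groups.add(gid[item])
    return out


# Hypothetical pool with two SSD-backed and two HDD-backed OSDs.
device_type = {"osd.0": "ssd", "osd.1": "hdd", "osd.2": "ssd", "osd.3": "hdd"}
gid = assign_group_ids(device_type)
replicas = choose_firstn_grouped(x=42, n=2, osds=sorted(device_type), gid=gid)
print(replicas)
```

Because only two groups exist in this example, asking for more than two replicas returns just two; a real implementation would need a policy for that case, which this blueprint does not specify.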

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3