Version 1 - History - Crush extension for more flexible object placement - Ceph - Ceph

Jessica Mack

h1. Crush extension for more flexible object placement

h3. Summary

Extend CRUSH to allow more flexible object placement.

h3. Owners

* Li Wang (liwang@ubuntukylin.com)

* Lianghao Shen (lianghaoshen@ubuntukylin.com)

h3. Interested Parties

* Name (Affiliation)

* Name (Affiliation)

* Name

h3. Current Status

h3. Detailed Description

This blueprint was originally proposed by Sage at https://wiki.ceph.com/Planning/Blueprints/Dumpling/extend_crush_rule_language.

CRUSH is a deterministic, rule-based, consistent-hash-like algorithm (with some very nice properties) for determining object placement in distributed storage systems. Its selection rule language, while already very useful, cannot express some desirable rules:

* The current _choose_ step iterates over the 'working set' and recursively selects new items for each item. It always applies to all items in the working set, which precludes strategies like "pick 2 racks, choose N from the first, and M from the second".

* The hierarchy is assumed to be a single uniform tree. You cannot have two parallel trees of devices (say, SSDs and HDDs) under the same host nodes, pick one SSD and one HDD, and ensure that they are on different hosts.
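To make the first limitation concrete, here is a minimal Python sketch of how a single _choose firstn_ step treats the working set. It assumes a toy two-level hierarchy (`TREE`) and uses a SHA-256-based "highest draw wins" pick as a stand-in for CRUSH's real bucket algorithms and hash; the names `draw`, `choose_step` and `TREE` are hypothetical. The point it illustrates is that one count `n` is applied to every item in the working set.

```python
import hashlib


def draw(x, item, r):
    # Toy deterministic hash of (input x, item, attempt r); NOT Ceph's hash.
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def choose_step(working_set, n, tree, x, max_tries=50):
    """One hypothetical 'choose firstn n' step: for EVERY item in the
    working set, select n distinct children. The same n applies to all."""
    out = []
    for parent in working_set:
        children = tree[parent]
        target = min(n, len(children))
        chosen, r = [], 0
        while len(chosen) < target and r < max_tries:
            item = max(children, key=lambda c: draw(x, c, r))
            if item not in chosen:  # on a collision, retry with the next r
                chosen.append(item)
            r += 1
        out.extend(chosen)
    return out


TREE = {
    "root": ["rack1", "rack2", "rack3"],
    "rack1": ["host1", "host2", "host3"],
    "rack2": ["host4", "host5", "host6"],
    "rack3": ["host7", "host8", "host9"],
}

racks = choose_step(["root"], 2, TREE, x=42)  # step 1: pick 2 racks
hosts = choose_step(racks, 2, TREE, x=42)     # step 2: 2 hosts under EACH rack
print(racks, hosts)
```

Because the second step receives only a single `n`, there is no way to ask for, say, two hosts from the first rack and one from the second.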

Algorithm:

The following new features are proposed to handle the scenarios above:

* Asymmetric choose option, e.g., _assign(N,TYPE1,TYPE2,a1,a2...aN)_, which chooses _N_ buckets of type _TYPE1_ and then chooses _ai_ (i = 1, 2, ..., N) buckets of type _TYPE2_ from the i-th chosen _TYPE1_ bucket.

<pre>
assign(n, type1, type2, a1, a2, ..., an) {
    map = [a1, a2, ..., an];
    out1 = choose_firstn(n, type1, ...);  // choose n items of type type1
    out2 = [];
    for (i = 0; i < n; i++) {
        // choose map[i] items of type type2 under the i-th item of out1
        sub_items = choose_firstn(map[i], type2, ...);
        out2 = out2 + sub_items;
    }
    return out2;
}
</pre>
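The _assign_ pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the proposed implementation: it assumes each type sits exactly one level below its parent in a toy tree (so `type1`/`type2` are implied by depth), and it uses a SHA-256-based pick as a stand-in for CRUSH's bucket selection. The names `draw`, `choose_firstn`, `assign` and `TREE` here are hypothetical.

```python
import hashlib


def draw(x, item, r):
    # Toy deterministic hash of (input x, item, attempt r); NOT Ceph's hash.
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def choose_firstn(parent, n, tree, x, max_tries=50):
    """Hypothetical helper: choose up to n distinct children of `parent`."""
    children = tree[parent]
    target = min(n, len(children))
    chosen, r = [], 0
    while len(chosen) < target and r < max_tries:
        item = max(children, key=lambda c: draw(x, c, r))
        if item not in chosen:  # on a collision, retry with the next r
            chosen.append(item)
        r += 1
    return chosen


def assign(root, n, counts, tree, x):
    """Sketch of assign(n, type1, type2, a1..an): choose n type1 buckets
    under `root`, then counts[i] type2 buckets under the i-th of them."""
    if len(counts) != n:
        raise ValueError("assign() needs exactly one count per type1 bucket")
    out1 = choose_firstn(root, n, tree, x)
    out2 = []
    for i, bucket in enumerate(out1):  # index by position, not by bucket id
        out2.extend(choose_firstn(bucket, counts[i], tree, x))
    return out2


TREE = {
    "root": ["rack1", "rack2", "rack3"],
    "rack1": ["host1", "host2", "host3"],
    "rack2": ["host4", "host5", "host6"],
    "rack3": ["host7", "host8", "host9"],
}

# "pick 2 racks, choose 2 hosts from the first and 1 from the second"
placement = assign("root", 2, [2, 1], TREE, x=7)
print(placement)
```

Indexing the per-bucket counts by position in `out1` (rather than by the chosen item itself) is what lets each chosen bucket receive its own count.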

* OSD grouping, _group {devicetype|network}_, which assigns each OSD a group ID according to the given strategy, i.e., by device type or by network.

* Modified chooseleaf option, _chooseleaf firstn {num} type {bucket-type} [group]_, which, unlike the original, can place the replicas in different groups.

<pre>
group {devicetype|network...}
    for osd in pool:
        if osd.device == ssd
            osd.gid = 1;
        else
            osd.gid = 2;

choose firstn n type osd [group]
    item = crush_bucket_choose(in, x, r);
    if (group) {
        // reject the item if an OSD from its group is already in out
        if (is_gid_selected(out, item)) {
            fgroup++;
            retry_group = 1;
        }
    }
</pre>
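The grouping and group-aware selection above can be sketched in Python for the "one SSD and one HDD on different hosts" case. This is a minimal sketch under stated assumptions: `group_by_devicetype`, `chooseleaf_firstn`, `bucket_choose`, `HOSTS` and `DEVICES` are hypothetical names, the SHA-256 pick stands in for `crush_bucket_choose` with uniform weights, and retries simply move to the next `r` rather than following CRUSH's exact retry rules.

```python
import hashlib


def draw(x, item, r):
    # Toy deterministic hash of (input x, item, attempt r); NOT Ceph's hash.
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_choose(items, x, r):
    # Hypothetical stand-in for crush_bucket_choose with uniform weights.
    return max(items, key=lambda i: draw(x, i, r))


def group_by_devicetype(devices):
    """'group devicetype': gid 1 for SSD-backed OSDs, gid 2 for the rest."""
    return {osd: 1 if dev == "ssd" else 2 for osd, dev in devices.items()}


def chooseleaf_firstn(hosts, n, x, gids=None, max_tries=100):
    """Pick n OSDs on n distinct hosts. If gids is given (the [group] flag),
    also reject any OSD whose group already appears in the output."""
    out, used_hosts, fgroup = [], set(), 0
    host_names = sorted(hosts)
    for r in range(max_tries):
        if len(out) == n:
            break
        host = bucket_choose(host_names, x, r)
        if host in used_hosts:
            continue  # host collision: retry with the next r
        leaf = bucket_choose(hosts[host], x, r)
        if gids is not None and any(gids[o] == gids[leaf] for o in out):
            fgroup += 1  # group collision, like is_gid_selected()
            continue
        out.append(leaf)
        used_hosts.add(host)
    return out, fgroup


HOSTS = {"host1": ["osd.0", "osd.1"], "host2": ["osd.2", "osd.3"],
         "host3": ["osd.4", "osd.5"]}
DEVICES = {"osd.0": "ssd", "osd.1": "hdd", "osd.2": "ssd",
           "osd.3": "hdd", "osd.4": "ssd", "osd.5": "hdd"}

gids = group_by_devicetype(DEVICES)
osds, fgroup = chooseleaf_firstn(HOSTS, 2, x=1234, gids=gids)
print(osds, fgroup)
```

Without `gids`, the same call may return two SSDs or two HDDs; passing the group map is what forces the replicas into different groups while the host check keeps them on different hosts.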

h3. Work items

h4. Coding tasks

# Task 1

# Task 2

# Task 3

h4. Build / release tasks

# Task 1

# Task 2

# Task 3

h4. Documentation tasks

# Task 1

# Task 2

# Task 3

h4. Deprecation tasks

# Task 1

# Task 2

# Task 3
