Version 1 - History - Osd - Tiering II (Warm->Cold) - Ceph - Ceph

Jessica Mack

h1. Osd - Tiering II (Warm->Cold)

h3. Summary

Cache tiering is useful when placing slower storage on another RADOS tier in the same cluster is beneficial.  However, there are use cases where even that would be too expensive.  For those cases, it might be useful for Ceph to be able to demote sufficiently frosty objects to some other service through a plugin abstraction.

h3. Owners

* Sam Just (Red Hat)

* Name (Affiliation)

* Name

h3. Interested Parties

* Loic Dachary (Red Hat)

* Name (Affiliation)

* Name

h3. Current Status

h3. Detailed Description

The idea is to extend the current cache/tiering infrastructure in a few ways:

1) Add a new flavor of hot tier which, instead of treating the absence of an object as the signal to check the next lower tier, uses explicit link objects to redirect to the lower tier.  Thus, the hotter tier acts as a metadata cache of sorts.

2) Create an abstraction for the lower tier machinery into which an implementation can be plugged.
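
The link-object idea in item 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: @HotTier@, @ColdTier@, @HotEntry@ and the in-memory maps are hypothetical stand-ins, not Ceph types, and the real design would store link objects in RADOS rather than in a map. The point it shows is that a missing entry means the object does not exist anywhere, so the lower tier is consulted only when a link says so.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; none of these names come from Ceph.
struct ColdTier {
  std::map<std::string, std::string> objects;  // backend id -> object bytes
};

struct HotEntry {
  bool is_link;         // true: payload is a backend id; false: payload is data
  std::string payload;
};

struct HotTier {
  std::map<std::string, HotEntry> entries;     // object name -> data or link

  // Read an object, following a link object into the cold tier if needed.
  // Returns 0 on success, -ENOENT if the object does not exist.
  int read(const std::string &name, const ColdTier &cold,
           std::string *out) const {
    std::map<std::string, HotEntry>::const_iterator it = entries.find(name);
    if (it == entries.end())
      return -ENOENT;   // no entry and no link: the lower tier is never consulted
    if (!it->second.is_link) {
      *out = it->second.payload;
      return 0;
    }
    std::map<std::string, std::string>::const_iterator c =
        cold.objects.find(it->second.payload);
    assert(c != cold.objects.end());  // a link must name a live cold object
    *out = c->second;
    return 0;
  }

  // Demote an object: copy its bytes to the cold tier under backend_id, then
  // replace the hot copy with a link object.
  int demote(const std::string &name, const std::string &backend_id,
             ColdTier *cold) {
    std::map<std::string, HotEntry>::iterator it = entries.find(name);
    if (it == entries.end())
      return -ENOENT;
    if (it->second.is_link)
      return 0;         // already demoted
    cold->objects[backend_id] = it->second.payload;
    it->second.is_link = true;
    it->second.payload = backend_id;
    return 0;
  }
};
```

In this sketch a read of a demoted object returns the same bytes as before demotion, while a read of a name with no entry fails without touching the cold tier.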

A first, zero-thought crack at an interface (any actual version would be in C):

<pre>
#include <stdint.h>
#include <cstddef>
#include <map>
#include <string>

using std::map;
using std::string;

// bufferlist and Context are the existing Ceph types.

class TierImplementation {
public:
  class ObjectStream {
  public:
    virtual ~ObjectStream() {}

    /**
     * get more bytes from the object
     *
     * @param max [in] max bytes to get
     * @param out [out] next bytes of the object
     * @return 0 on success, -error on error, -ERANGE when complete
     */
    virtual int get_bytes(size_t max, bufferlist *out) = 0;

    /**
     * get more bytes from the omap
     *
     * @param max [in] max bytes to get
     * @param out [out] map<string, bufferlist> encoding of next keys
     * @return 0 on success, -error on error, -ERANGE when complete
     */
    virtual int get_omap(size_t max, bufferlist *out) = 0;

    /**
     * get xattrs
     *
     * @param out [out] attrs
     * @return 0 on success, -error on error
     */
    virtual int get_xattrs(map<string, bufferlist> *out) = 0;
  };

  typedef uint64_t write_handle_t;
  typedef uint64_t read_handle_t;
  typedef string backend_object_id_t;

  virtual ~TierImplementation() {}

  /**
   * @return name identifying this implementation type
   */
  virtual const char *get_implementation_type() const = 0;

  /**
   * Begin object creation operation
   *
   * @param out [out] backend id -- json -- globally unique
   * @return demotion operation handle
   */
  virtual write_handle_t open_object(backend_object_id_t *out) = 0;

  /**
   * Append bytes to open object
   *
   * @param id [in] write operation handle
   * @param bytes [in] bytes to append
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success
   */
  virtual int append_bytes(write_handle_t id, bufferlist bytes, Context *c) = 0;

  /**
   * Append omap keys
   *
   * @param id [in] write operation handle
   * @param bytes [in] map<string, bufferlist> encoding of keys to add
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success
   */
  virtual int append_omap(write_handle_t id, bufferlist bytes, Context *c) = 0;

  /**
   * Add xattrs and close object
   *
   * @param id [in] write handle to close
   * @param attrs [in] xattrs for object
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success
   */
  virtual int close_object(write_handle_t id,
                           const map<string, bufferlist> &attrs,
                           Context *c) = 0;

  /**
   * remove specified object
   *
   * @param id [in] object to remove
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success
   */
  virtual int remove_object(const backend_object_id_t &id, Context *c) = 0;

  /**
   * open object for reading
   *
   * @param id [in] backend id
   * @return read operation id
   */
  virtual read_handle_t open_read_operation(backend_object_id_t id) = 0;

  /**
   * read bytes
   *
   * @param id [in] read operation id
   * @param amount [in] max to read
   * @param out [out] target for read bytes
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success, -ERANGE when complete
   */
  virtual int read_bytes(
    read_handle_t id, size_t amount, bufferlist *out, Context *c) = 0;

  /**
   * read omap
   *
   * @param id [in] read operation id
   * @param amount [in] max to read
   * @param out [out] target for read bytes, map<string, bufferlist> encoding
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success, -ERANGE when complete
   */
  virtual int read_omap(
    read_handle_t id, size_t amount, bufferlist *out, Context *c) = 0;

  /**
   * read xattrs
   *
   * @param id [in] read operation id
   * @param out [out] target for xattrs
   * @param c [in] transfers ownership, callback on completion
   * @return -error, 0 on success
   */
  virtual int read_xattrs(
    read_handle_t id, map<string, bufferlist> *out, Context *c) = 0;

  /**
   * close read operation
   *
   * @param id [in] read operation id
   */
  virtual void close_read_operation(read_handle_t id) = 0;
};
</pre>

The main features of the above interface are:

1) The backend neither knows nor cares what the object is.  The interface user gets an uninterpreted JSON identifier, which is the only way to get at the object in the backend.  The rationale for arbitrary JSON is that the backend might have placement information it needs to include.

2) Writes are append-only; there is no way to rewrite an existing id.

3) Reads are stream-based, with no seeking.  Reads are also stateful -- open, read, close.  The rationale is that there might be backends which would benefit from knowing when we are and are not done with an object.
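
To make the three features concrete, here is a minimal in-memory sketch of a backend that follows the same open/append/close and open/read/close shape. It is deliberately simplified and rests on assumptions: @std::string@ stands in for @bufferlist@, calls are synchronous (no @Context@ callbacks), omap and xattrs are left out, and the names @MemoryTier@ and @read_all@ as well as the id format are made up for illustration.

```cpp
#include <cerrno>
#include <cstddef>
#include <map>
#include <stdint.h>
#include <string>

// Simplified, synchronous stand-in for the proposed interface (hypothetical).
class MemoryTier {
public:
  typedef uint64_t write_handle_t;
  typedef uint64_t read_handle_t;
  typedef std::string backend_object_id_t;

  // The backend chooses the id; the caller treats it as opaque JSON.
  write_handle_t open_object(backend_object_id_t *out) {
    write_handle_t h = next_handle_++;
    *out = "{\"id\": " + std::to_string(h) + "}";
    open_writes_[h] = Pending{*out, std::string()};
    return h;
  }

  // Append-only: bytes can only be added to a still-open write handle.
  int append_bytes(write_handle_t h, const std::string &bytes) {
    std::map<write_handle_t, Pending>::iterator it = open_writes_.find(h);
    if (it == open_writes_.end())
      return -EBADF;
    it->second.data += bytes;
    return 0;
  }

  // Closing publishes the object; the handle cannot be reused afterwards.
  int close_object(write_handle_t h) {
    std::map<write_handle_t, Pending>::iterator it = open_writes_.find(h);
    if (it == open_writes_.end())
      return -EBADF;
    objects_[it->second.id] = it->second.data;
    open_writes_.erase(it);
    return 0;
  }

  read_handle_t open_read_operation(const backend_object_id_t &id) {
    read_handle_t h = next_handle_++;
    readers_[h] = Cursor{id, 0};
    return h;
  }

  // Stream-based read: no offset argument, the handle remembers its position.
  int read_bytes(read_handle_t h, size_t amount, std::string *out) {
    std::map<read_handle_t, Cursor>::iterator it = readers_.find(h);
    if (it == readers_.end())
      return -EBADF;
    std::map<std::string, std::string>::const_iterator obj =
        objects_.find(it->second.id);
    if (obj == objects_.end())
      return -ENOENT;
    if (it->second.offset >= obj->second.size())
      return -ERANGE;  // stream complete
    *out = obj->second.substr(it->second.offset, amount);
    it->second.offset += out->size();
    return 0;
  }

  void close_read_operation(read_handle_t h) { readers_.erase(h); }

private:
  struct Pending { backend_object_id_t id; std::string data; };
  struct Cursor { backend_object_id_t id; size_t offset; };
  uint64_t next_handle_ = 1;
  std::map<write_handle_t, Pending> open_writes_;
  std::map<read_handle_t, Cursor> readers_;
  std::map<std::string, std::string> objects_;
};

// Drain a stored object through the stateful read path, 4 bytes at a time.
int read_all(MemoryTier &tier, const std::string &id, std::string *out) {
  MemoryTier::read_handle_t h = tier.open_read_operation(id);
  std::string chunk;
  int r;
  while ((r = tier.read_bytes(h, 4, &chunk)) == 0)
    *out += chunk;
  tier.close_read_operation(h);
  return r == -ERANGE ? 0 : r;
}
```

A caller would open an object, append to it in pieces, close it, and later pass the returned id to @read_all@. Appending to a closed handle fails in this sketch, which mirrors the "no rewrite of an existing id" rule.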

Questions:

1) How do we (or do we want to?) deal with backends like Amazon Glacier (or some other tape bot) which might take minutes/hours to fetch an object?

2) How does this interact with snapshots?  We could make sure the redirect object contains all relevant snapshot information and still be able to perform trims -- or we could disallow snapshots.

3) What do we do with orphaned puts?  Probably we need to write ahead into the pg log an intent to write object h to a particular backend id, and clean up on OSD restart.
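
One possible shape for the write-ahead intent in question 3, purely as a sketch: record the backend id before the put starts, clear the record once the link object is committed, and treat whatever is left after a restart as orphaned. The class @DemotionIntentLog@, the function @cleanup_orphans@, and the in-memory containers are hypothetical; in the actual proposal the intents would be persisted in the pg log and removal would go through @remove_object@.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical intent record; a real version would be persisted in the pg log.
class DemotionIntentLog {
public:
  // Called before any bytes are sent to the backend.
  void record_intent(const std::string &object, const std::string &backend_id) {
    pending_[backend_id] = object;
  }

  // Called once the demotion is durable, or once an orphan has been removed.
  void clear_intent(const std::string &backend_id) {
    pending_.erase(backend_id);
  }

  // Backend ids whose demotion was started but never committed.
  std::vector<std::string> orphans_on_restart() const {
    std::vector<std::string> ids;
    for (std::map<std::string, std::string>::const_iterator it =
             pending_.begin(); it != pending_.end(); ++it)
      ids.push_back(it->first);
    return ids;
  }

private:
  std::map<std::string, std::string> pending_;  // backend id -> object name
};

// On restart, remove every backend object whose intent is still pending.
// backend_objects stands in for the set of ids stored in the lower tier.
// Returns the number of backend objects actually removed.
size_t cleanup_orphans(DemotionIntentLog &log,
                       std::set<std::string> &backend_objects) {
  size_t removed = 0;
  std::vector<std::string> ids = log.orphans_on_restart();
  for (size_t i = 0; i < ids.size(); ++i) {
    removed += backend_objects.erase(ids[i]);
    log.clear_intent(ids[i]);
  }
  return removed;
}
```

Because the intent is written before the put, a crash at any point leaves either a committed object with no pending intent or a pending intent that names the id to remove.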

h3. Work items

h4. Coding tasks

# Task 1

# Task 2

# Task 3

h4. Build / release tasks

# Task 1

# Task 2

# Task 3

h4. Documentation tasks

# Task 1

# Task 2

# Task 3

h4. Deprecation tasks

# Task 1

# Task 2

# Task 3
