Jessica Mack, 08/26/2015 01:06 AM

h1. Osd - Tiering II (Warm->Cold)

h3. Summary

Cache tiering can be useful where placing the slower storage in another RADOS tier of the same cluster is beneficial. However, for some use cases even that would be too expensive. For those cases, it might be useful for Ceph to be able to demote sufficiently frosty (cold) objects to some other service through a plugin abstraction.

h3. Owners

* Sam Just (Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Loic Dachary (Red Hat)
* Name (Affiliation)
* Name

h3. Current Status

h3. Detailed Description

The idea is to extend the current cache tiering infrastructure in the following ways:
1) Add a new flavor of hot tier that, instead of treating the absence of an object as the signal to check the next lower tier, uses explicit link objects to redirect to the lower tier. The hotter tier thus acts as a metadata cache of sorts.
2) Create an abstraction for the lower-tier machinery into which an implementation can be plugged.

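To make item 1 concrete, here is a minimal sketch of how a hot tier holding explicit link objects could answer a lookup. It assumes an in-memory map in place of a real PG, std::string in place of object data, and a made-up JSON backend id; HotTier, LinkObject, and Lookup are hypothetical names, not Ceph types. What it shows is that a missing entry means the object does not exist at all, while a link entry carries the backend id needed to fetch the demoted data.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>

// Hypothetical hot-tier model for item 1: an entry is either the full
// object or a small link object naming where the demoted data lives.
struct ObjectData {
  std::string bytes;
};
struct LinkObject {
  std::string backend_id;  // opaque id handed back by the cold backend
};
using Entry = std::variant<ObjectData, LinkObject>;

enum class LookupKind { Local, Redirect, NotFound };

struct Lookup {
  LookupKind kind;
  std::string payload;  // object bytes (Local) or backend id (Redirect)
};

class HotTier {
 public:
  void put(const std::string &oid, std::string bytes) {
    entries_[oid] = ObjectData{std::move(bytes)};
  }

  // Swap the object's data for a link after it has been written to the
  // cold backend. Returns false if the object is unknown.
  bool demote(const std::string &oid, std::string backend_id) {
    auto it = entries_.find(oid);
    if (it == entries_.end()) return false;
    it->second = LinkObject{std::move(backend_id)};
    return true;
  }

  Lookup find(const std::string &oid) const {
    auto it = entries_.find(oid);
    if (it == entries_.end()) {
      // With explicit links, absence means "no such object"; there is no
      // need to ask the lower tier.
      return {LookupKind::NotFound, ""};
    }
    if (const auto *link = std::get_if<LinkObject>(&it->second)) {
      return {LookupKind::Redirect, link->backend_id};
    }
    return {LookupKind::Local, std::get<ObjectData>(it->second).bytes};
  }

 private:
  std::map<std::string, Entry> entries_;
};

// Usage: one object stays hot, the other is replaced by a link.
inline void hot_tier_example() {
  HotTier tier;
  tier.put("a", "hot data");
  tier.put("b", "frosty data");
  bool demoted = tier.demote("b", R"({"backend":"example","key":"b-0001"})");
  assert(demoted);
  assert(tier.find("a").kind == LookupKind::Local);
  assert(tier.find("b").kind == LookupKind::Redirect);
  assert(tier.find("c").kind == LookupKind::NotFound);
}
```

The trade-off is an extra small object per demoted object in the hot tier, in exchange for never having to probe the cold backend on a miss.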
 
A first, rough cut at an interface (any actual version would be in C):

<pre>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// bufferlist and Context are Ceph types (include/buffer.h, include/Context.h).

class TierImplementation {
public:
  virtual ~TierImplementation() = default;

  class ObjectStream {
  public:
    virtual ~ObjectStream() = default;

    /**
     * get more bytes from the object
     *
     * @param max [in] max bytes to get
     * @param out [out] next bytes of the object
     * @return 0 on success, -error on error, -ERANGE when complete
     */
    virtual int get_bytes(size_t max, bufferlist *out) = 0;

    /**
     * get more bytes from the omap
     *
     * @param max [in] max bytes to get
     * @param out [out] map<string, bufferlist> encoding of next keys
     * @return 0 on success, -error on error, -ERANGE when complete
     */
    virtual int get_omap(size_t max, bufferlist *out) = 0;

    /**
     * get xattrs
     *
     * @param out [out] attrs
     * @return 0 on success, -error on error
     */
    virtual int get_xattrs(std::map<std::string, bufferlist> *out) = 0;
  };

  typedef uint64_t write_handle_t;
  typedef uint64_t read_handle_t;
  typedef std::string backend_object_id_t;

  virtual const char *get_implementation_type() const = 0;

  /**
   * Begin object creation operation
   *
   * @param out [out] backend id: opaque JSON, globally unique
   * @return demotion (write) operation handle
   */
  virtual write_handle_t open_object(backend_object_id_t *out) = 0;

  /**
   * Append bytes to open object
   *
   * @param id [in] write operation handle
   * @param bytes [in] bytes to append
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success
   */
  virtual int append_bytes(write_handle_t id, bufferlist bytes, Context *c) = 0;

  /**
   * Append omap keys
   *
   * @param id [in] write operation handle
   * @param bytes [in] map<string, bufferlist> encoding of keys to add
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success
   */
  virtual int append_omap(write_handle_t id, bufferlist bytes, Context *c) = 0;

  /**
   * Add xattrs and close object
   *
   * @param id [in] write handle to close
   * @param attrs [in] xattrs for object
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success
   */
  virtual int close_object(write_handle_t id,
                           const std::map<std::string, bufferlist> &attrs,
                           Context *c) = 0;

  /**
   * remove specified object
   *
   * @param id [in] object to remove
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success
   */
  virtual int remove_object(const backend_object_id_t &id, Context *c) = 0;

  /**
   * open object for reading
   *
   * @param id [in] backend id
   * @return read operation handle
   */
  virtual read_handle_t open_read_operation(const backend_object_id_t &id) = 0;

  /**
   * read bytes
   *
   * @param id [in] read operation handle
   * @param amount [in] max bytes to read
   * @param out [out] target for read bytes
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success, -ERANGE when complete
   */
  virtual int read_bytes(
    read_handle_t id, size_t amount, bufferlist *out, Context *c) = 0;

  /**
   * read omap
   *
   * @param id [in] read operation handle
   * @param amount [in] max bytes to read
   * @param out [out] map<string, bufferlist> encoding of the next keys
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success, -ERANGE when complete
   */
  virtual int read_omap(
    read_handle_t id, size_t amount, bufferlist *out, Context *c) = 0;

  /**
   * read xattrs
   *
   * @param id [in] read operation handle
   * @param out [out] xattrs of the object
   * @param c [in] completion callback; ownership is transferred
   * @return -error, 0 on success
   */
  virtual int read_xattrs(
    read_handle_t id, std::map<std::string, bufferlist> *out, Context *c) = 0;

  /**
   * close read operation
   *
   * @param id [in] read operation handle
   */
  virtual void close_read_operation(read_handle_t id) = 0;
};
</pre>

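A toy, synchronous in-memory backend can help check the semantics this interface implies. This sketch assumes std::string in place of bufferlist, returns results directly instead of completing a Context, omits omap and xattrs, and invents its own JSON id format; InMemoryTier is a hypothetical name, not part of any Ceph plugin API.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy, synchronous stand-in for the byte half of TierImplementation.
// Assumptions: std::string replaces bufferlist, calls return directly
// instead of completing a Context, and omap/xattrs are left out.
class InMemoryTier {
 public:
  typedef uint64_t write_handle_t;
  typedef uint64_t read_handle_t;
  typedef std::string backend_object_id_t;

  write_handle_t open_object(backend_object_id_t *out) {
    write_handle_t h = next_handle_++;
    // Hypothetical JSON id; a real backend could embed placement data.
    *out = "{\"store\":\"mem\",\"key\":" + std::to_string(h) + "}";
    objects_.emplace(*out, std::string());
    open_writes_[h] = *out;
    return h;
  }

  int append_bytes(write_handle_t h, const std::string &bytes) {
    auto it = open_writes_.find(h);
    if (it == open_writes_.end()) return -EBADF;  // unknown or closed
    objects_[it->second] += bytes;
    return 0;
  }

  // Closing forgets the write handle; the id can never be written again.
  int close_object(write_handle_t h) {
    return open_writes_.erase(h) ? 0 : -EBADF;
  }

  int remove_object(const backend_object_id_t &id) {
    return objects_.erase(id) ? 0 : -ENOENT;
  }

  read_handle_t open_read_operation(const backend_object_id_t &id) {
    read_handle_t h = next_handle_++;
    reads_[h] = ReadState{id, 0};
    return h;
  }

  // 0 with up to `max` bytes in *out, or -ERANGE once everything was read.
  int read_bytes(read_handle_t h, size_t max, std::string *out) {
    auto it = reads_.find(h);
    if (it == reads_.end()) return -EBADF;
    auto obj = objects_.find(it->second.id);
    if (obj == objects_.end()) return -ENOENT;
    size_t &offset = it->second.offset;
    if (offset >= obj->second.size()) return -ERANGE;
    *out = obj->second.substr(offset, max);
    offset += out->size();
    return 0;
  }

  void close_read_operation(read_handle_t h) { reads_.erase(h); }

 private:
  struct ReadState {
    backend_object_id_t id;
    size_t offset;  // stream position; there is no seek
  };
  uint64_t next_handle_ = 1;
  std::map<write_handle_t, backend_object_id_t> open_writes_;
  std::map<backend_object_id_t, std::string> objects_;
  std::map<read_handle_t, ReadState> reads_;
};

// Usage: write an object in two appends, then stream it back 4 bytes at a time.
inline std::string tier_roundtrip_example() {
  InMemoryTier tier;
  InMemoryTier::backend_object_id_t id;
  InMemoryTier::write_handle_t w = tier.open_object(&id);
  tier.append_bytes(w, "frosty ");
  tier.append_bytes(w, "object");
  tier.close_object(w);
  int rc = tier.append_bytes(w, "more");  // rejected: closed, append-only
  assert(rc == -EBADF);

  std::string result, chunk;
  InMemoryTier::read_handle_t r = tier.open_read_operation(id);
  while (tier.read_bytes(r, 4, &chunk) == 0) result += chunk;
  tier.close_read_operation(r);
  return result;  // "frosty object"
}
```

Because close_object drops the write handle and nothing maps an id back to a writable handle, a closed object can only be read or removed, which matches the write-once design.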
 
The main features of the interface above are:
1) The backend neither knows nor cares what the object is. The interface user gets an uninterpreted JSON identifier, which is the only way to reach the object in the backend. The rationale for arbitrary JSON is that the backend might need to include placement information.
2) Writes are append-only; there is no way to rewrite an existing id.
3) Reads are stream-based, with no seeking. Reads are also stateful (open, read, close), because some backends might benefit from knowing when we are and are not done with an object.

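Features 2 and 3 together make a demotion a single forward pass: pull a chunk from the source stream, append it, repeat until the source reports -ERANGE, then close. A minimal sketch of that loop, assuming simplified ByteSource and AppendSink interfaces (hypothetical, loosely modeled on ObjectStream::get_bytes and append_bytes/close_object), std::string in place of bufferlist, and synchronous calls:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified shapes of the two ends of a demotion.
// ByteSource mirrors ObjectStream::get_bytes; AppendSink mirrors
// append_bytes/close_object. std::string stands in for bufferlist and
// every call completes synchronously.
struct ByteSource {
  virtual ~ByteSource() = default;
  // 0 with up to `max` bytes in *out, -ERANGE when complete, or -error.
  virtual int get_bytes(size_t max, std::string *out) = 0;
};

struct AppendSink {
  virtual ~AppendSink() = default;
  virtual int append_bytes(const std::string &bytes) = 0;
  virtual int close_object() = 0;
};

// One forward pass, no seeking: pull a chunk, append it, and stop when the
// source reports -ERANGE. Any other error is returned as-is.
int demote_bytes(ByteSource &src, AppendSink &dst, size_t chunk_size) {
  std::string chunk;
  for (;;) {
    chunk.clear();
    int r = src.get_bytes(chunk_size, &chunk);
    if (r == -ERANGE) break;  // source exhausted
    if (r < 0) return r;
    r = dst.append_bytes(chunk);
    if (r < 0) return r;
  }
  return dst.close_object();
}

// Trivial in-memory ends for trying the loop out.
struct StringSource : ByteSource {
  explicit StringSource(std::string d) : data(std::move(d)) {}
  int get_bytes(size_t max, std::string *out) override {
    if (pos >= data.size()) return -ERANGE;
    *out = data.substr(pos, max);
    pos += out->size();
    return 0;
  }
  std::string data;
  size_t pos = 0;
};

struct VectorSink : AppendSink {
  int append_bytes(const std::string &bytes) override {
    if (closed) return -EBADF;  // append-only: nothing after close
    appends.push_back(bytes);
    return 0;
  }
  int close_object() override {
    closed = true;
    return 0;
  }
  std::vector<std::string> appends;  // one entry per append call
  bool closed = false;
};

// Usage: ten bytes in chunks of four become three appends.
inline void demote_example() {
  StringSource src("abcdefghij");
  VectorSink sink;
  int r = demote_bytes(src, sink, 4);
  assert(r == 0 && sink.closed);
  assert(sink.appends.size() == 3);  // "abcd", "efgh", "ij"
}
```

On a read or append error this sketch returns without closing the sink, leaving a partially written backend object behind; that is the orphaned-put case raised in question 3.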
 
Questions:
1) How do we deal with (or do we even want to support) backends like Amazon Glacier, or some other tape robot, that might take minutes or hours to fetch an object?
2) How does this interact with snapshots? We could make sure the redirect object contains all relevant snapshot information and still be able to perform trims, or we could disallow snapshots.
3) What do we do with orphaned puts? We probably need to write ahead into the PG log an intent to write object h to a particular backend id, and clean up on OSD restart.

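The write-ahead idea in question 3 could look roughly like the following bookkeeping. This is a sketch under assumptions: an in-memory map stands in for the PG log, intents are keyed by backend id, and removal is delegated to a caller-supplied function (for example, a wrapper around remove_object). DemoteIntent, IntentLog, and cleanup_orphans are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the PG log entries suggested in question 3:
// "object `oid` is being written to backend id `backend_id`".
struct DemoteIntent {
  std::string oid;
  std::string backend_id;
};

class IntentLog {
 public:
  // Logged durably before any bytes are sent to the backend.
  void record(const DemoteIntent &intent) {
    pending_[intent.backend_id] = intent;
  }

  // Drop the intent once the link object is durable, or once the orphaned
  // backend object has been cleaned up.
  void resolve(const std::string &backend_id) { pending_.erase(backend_id); }

  // After a restart, every unresolved intent names a backend object that no
  // link object refers to.
  std::vector<std::string> orphans() const {
    std::vector<std::string> ids;
    for (const auto &entry : pending_) ids.push_back(entry.first);
    return ids;
  }

 private:
  std::map<std::string, DemoteIntent> pending_;  // keyed by backend id
};

// Restart-time cleanup. `remove` is caller-supplied and returns 0 or
// -error. -ENOENT also counts as done, so rerunning cleanup after another
// crash is harmless. Returns the number of intents resolved.
template <typename RemoveFn>
size_t cleanup_orphans(IntentLog &intents, RemoveFn remove) {
  size_t resolved = 0;
  for (const std::string &id : intents.orphans()) {
    int r = remove(id);
    if (r == 0 || r == -ENOENT) {
      intents.resolve(id);
      ++resolved;
    }
  }
  return resolved;
}

// Usage: obj_a finished demotion, obj_b was interrupted by a crash.
inline void intent_example() {
  IntentLog intents;
  intents.record({"obj_a", "id-a"});
  intents.record({"obj_b", "id-b"});
  intents.resolve("id-a");  // obj_a's link object was written
  std::vector<std::string> removed;
  size_t n = cleanup_orphans(intents, [&](const std::string &id) {
    removed.push_back(id);
    return 0;
  });
  assert(n == 1 && removed.size() == 1 && removed[0] == "id-b");
  assert(intents.orphans().empty());
}
```

Intents whose removal fails with another error stay pending, so a later restart retries them.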
h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3