Rgw new multisite configuration » History » Version 11

Yehuda Sadeh, 07/01/2015 02:28 PM

+*RGW NEW MULTISITE CONFIG*+

*Summary*
As part of the new multi site scheme, we change the way the system is configured.

*Owners*
Orit Wasserman (Red Hat)
Yehuda Sadeh (Red Hat)
Name

*Interested Parties*
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
Name (Affiliation)
Name (Affiliation)
Name

*Current Status*
Worked on the design, initial implementation work.

*Detailed Description*

h2. Definitions

A zone is a collection of pools and radosgw instances in the same cluster that serve the same data.

A zone group (zonegroup) is a collection of zones that replicate to or from each other. The zones may or may not span different clusters.

A zone realm is a collection of zonegroups that share the same user and bucket namespace.

A period is a span of time during which a given zonegroup configuration is in effect. Each period references the period that preceded it and records basic metadata such as its start time. During a period there may be changes to the zone and zonegroup maps; each such change increments the period epoch.

h2. 1. Configuration Changes

h3. 1.1 Zonegroup map

The zonegroup map describes the entire system and holds certain configurables for the different realms, zonegroups and zones. It records the relationships between the zonegroups, along with the following configuration:

For the period
 - uuid
 - epoch
 - which period preceded it
 - the version vector for the previous period’s metadata log
 - list of zonegroups
 - which zonegroup is the new metadata master
 - which zones belong to each zonegroup
 - which zone is master for each zonegroup

For the realm
 - id, name
 - which zonegroup is the master for user/bucket metadata
 - list of zonegroups in the realm
 - current period

For each zonegroup
 - id, name
 - access url[s] (for control/replication API)
 - existing storage policies
 - which zone is master for metadata
 - which zone(s) are master or slave for data
 - list of zones

For each zone
 - id, name
 - access url[s]
 - peers

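The fields listed above can be pictured as nested records. The following is only a sketch of that shape in Python, not the actual rgw data structures: the class names, field names and the @master_zone()@ helper are assumptions made for illustration, and storage policies and version vectors are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Zone:
    id: str
    name: str
    urls: List[str] = field(default_factory=list)
    peers: List[str] = field(default_factory=list)  # ids of peer zones


@dataclass
class ZoneGroup:
    id: str
    name: str
    urls: List[str] = field(default_factory=list)
    master_zone: Optional[str] = None  # id of the zone that is master for metadata
    zones: Dict[str, Zone] = field(default_factory=dict)


@dataclass
class Period:
    uuid: str
    epoch: int = 1
    predecessor: Optional[str] = None  # uuid of the period that preceded this one
    master_zonegroup: Optional[str] = None  # id of the metadata master zonegroup
    zonegroups: Dict[str, ZoneGroup] = field(default_factory=dict)

    def master_zone(self) -> Optional[Zone]:
        """Return the master zone of the master zonegroup, if both are set."""
        zonegroup = self.zonegroups.get(self.master_zonegroup or "")
        if zonegroup is None or zonegroup.master_zone is None:
            return None
        return zonegroup.zones.get(zonegroup.master_zone)


@dataclass
class Realm:
    id: str
    name: str
    current_period: Optional[str] = None  # uuid of the current period
```

Keeping zones and zonegroups keyed by id rather than by name fits the goal of allowing renames without touching references.
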
One zone in the master zonegroup will be designated as the master. It will manage all user and bucket creation (metadata) and control the zonegroup map. To change the system configuration, a command will be sent to the url of this master, and the new configuration will then propagate to the rest of the system.

rgw will be able to handle dynamic changes to the zonegroup and zone configuration.

The zonegroup map will have a version epoch that is incremented after every change.

<pre>
.rgw.root
default_realm -> $realm
realm.$realm  -> current period
period.$realm.$uuid.$epoch -> period object
period.$realm.$uuid -> latest $epoch
zone.$zone -> $realm
</pre>

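To make the .rgw.root layout above concrete, here is a small Python sketch that follows the pointers from a realm to its current period object. A plain dict stands in for the pool, and the sketch assumes that @realm.$realm@ stores the uuid of the current period; the function name and the sample data are hypothetical.

```python
from typing import Dict, Optional


def resolve_current_period(root: Dict[str, object], realm: Optional[str] = None) -> object:
    """Follow the .rgw.root pointers from a realm to its current period object."""
    if realm is None:
        realm = str(root["default_realm"])       # default_realm -> $realm
    uuid = root[f"realm.{realm}"]                # realm.$realm -> current period (uuid assumed)
    epoch = root[f"period.{realm}.{uuid}"]       # period.$realm.$uuid -> latest $epoch
    return root[f"period.{realm}.{uuid}.{epoch}"]  # the period object itself


# Hypothetical contents of .rgw.root for a realm named "gold".
EXAMPLE_ROOT = {
    "default_realm": "gold",
    "realm.gold": "3f2a",
    "period.gold.3f2a": 2,
    "period.gold.3f2a.1": {"uuid": "3f2a", "epoch": 1},
    "period.gold.3f2a.2": {"uuid": "3f2a", "epoch": 2},
}
```

Because every epoch of a period is stored under its own key, older epochs stay readable while the @period.$realm.$uuid@ pointer moves forward.
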
h4. multi site todo

 - period and realm data structures
 - APIs for pushing and pulling zone metadata
 - rgw needs to do watch/notify or poll on the realm.$realm object, restart as needed
 - gracefully drain requests on old backend instance; startup new one on epoch or period change

h4. metadata sync todo

 - user instance
 - save version for every metadata object
 - log versions for every metadata object
 - log metadata about which period we are on, which objects are dirty/stale, rollback/rollforward state

h3. 1.2 Defining a new zonegroup

Currently, in order to define a new zonegroup, we need to inject a JSON document that holds the zonegroup configuration, then update the zonegroupmap, then distribute that zonegroupmap to all existing zonegroups and restart all rgws for it to take effect. I don't think this is a good scheme.

A zonegroup will have a zonegroup id and a zonegroup name. For backward compatibility, older zonegroups will have their zonegroup_id equal to their name.

When setting up a new zonegroup, we'll need to specify an entry point for the 'master' zonegroup. That zonegroup will be in control of the zonegroupmap, and it will distribute zonegroupmap updates to all zones.

If the zonegroup that we set up is the first zonegroup, we'll need to specify this on the command line. We won't be able to set up secondary zonegroups if the master has not been specified.

h3. 1.3. Defining a new zone

Currently, when running an rgw it does the following:

Read the rgw_zone configurable, check the root pool for the configuration of this zone. If rgw_zone is not defined it will read the default zone name out of the
it will create the 'default' zone, and assign it as the default.

Once a zone name has been set, it cannot really be changed. The zone names are embedded in the rados object names that are created to hold the actual rgw objects.

In order to support zone renaming and more dynamic configuration, we should create a logical 'zone id' that the zone name will point at. The zone id will be a string. It will be auto-generated when a new zone is created, and will not be modified afterwards. For backward compatibility, older zones will have a zone_id that matches their zone name.

To set up a new zone, the rgw command will include the url of the master zonegroup and the keys to access it. It will also include the name of the zonegroup this zone should reside in. If this zonegroup does not exist, it will be created (if the appropriate param was passed in). The master zonegroup will create a new system user for this specific zone, and will send it back.

When a new zone starts up, we'll auto-create all the rados pools that it will use. It will first need to determine whether the pools already exist and are already assigned to a different zone. The naming scheme for the pools would be something like:

<pre>
.{zone_id}-x-{pool-name}
rgw.$zoneid.$pool
.rgw - bucket -> bucketid metadata
.users - user index
.users.swift
.users.uid
.control - contains notify object
.log - metadata log, which-buckets-have-changed log
.gc - garbage collection
.usage - sharded usage stats
.bucket-index
.bucket-data - bucket data
.bucket-data-nonec - non-ec bucket data
</pre>

We want to allow the same gateway to be part of multiple zones; this will give us much more flexibility. Different zones will have different ports.

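As an illustration of the pool naming idea in this section, the sketch below expands a zone id into pool names using the @rgw.$zoneid.$pool@ form. The naming scheme itself is still open ("something like"), so dropping the leading dot from each pool suffix is an assumption, and the function names are hypothetical. The sketch only checks whether a pool exists; deciding whether an existing pool belongs to a different zone is not covered.

```python
from typing import Iterable, List

# Pool suffixes taken from the list above, without the leading dot (assumption).
POOL_SUFFIXES = [
    "rgw", "users", "users.swift", "users.uid", "control", "log",
    "gc", "usage", "bucket-index", "bucket-data", "bucket-data-nonec",
]


def zone_pool_names(zone_id: str) -> List[str]:
    """Build the per-zone pool names in the rgw.$zoneid.$pool form."""
    return [f"rgw.{zone_id}.{suffix}" for suffix in POOL_SUFFIXES]


def pools_to_create(zone_id: str, existing: Iterable[str]) -> List[str]:
    """Return the pools for this zone that are not present in the cluster yet."""
    have = set(existing)
    return [pool for pool in zone_pool_names(zone_id) if pool not in have]
```
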
h2. 1.4. Dynamic zonegroup and zone changes

rgw will be able to identify changes to the zonegroupmap and to the zone configuration. This will be done as follows:

 - rgw will be able to restart itself with a new rados backend handler (RGWRados) after detecting that a configuration change has been made. It will finish handling existing requests, but restart all the frontend handlers with the new RGWRados config.
 - rgw will set a specific watch/notify handler that will be used to get updates about the zonegroupmap configuration.
 - Upon receiving a change, the master zone in the master zonegroup will send a message to all the different zonegroups about the new configuration.

Any synchronization activity will be dynamically reset according to the new configuration.

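The change-detection step described in this section could look roughly like the Python sketch below. It uses polling rather than watch/notify, treats a change as any difference in the (period uuid, epoch) pair, and leaves draining requests and rebuilding the backend to a caller-supplied callback. @ConfigWatcher@, @fetch_version@ and @reload_backend@ are hypothetical names, not rgw interfaces.

```python
from typing import Callable, Optional, Tuple

Version = Tuple[str, int]  # (period uuid, epoch)


class ConfigWatcher:
    """Detect period/epoch changes and trigger a backend reload."""

    def __init__(self,
                 fetch_version: Callable[[], Version],
                 reload_backend: Callable[[Version], None]) -> None:
        self._fetch = fetch_version
        self._reload = reload_backend
        self._seen: Optional[Version] = None

    def poll_once(self) -> bool:
        """Return True if a change was detected and a reload was triggered."""
        current = self._fetch()
        if current == self._seen:
            return False
        first_poll = self._seen is None
        self._seen = current
        if first_poll:
            return False  # the first poll only records the baseline
        # The callback is expected to drain in-flight requests on the old
        # backend and start a new one with the new configuration.
        self._reload(current)
        return True
```

A gateway would call @poll_once()@ on a timer, or from a notify handler once watch/notify is in place.
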
h2. 1.5. New RESTful apis

h3. 1.5.1 Get period information

<pre>
GET /admin/realm/period?[period-id=<period-id>][&epoch=<epoch>]
</pre>

period-id: optional
epoch: optional

Output:

A JSON representation of the current period, or the specified period

h3. 1.5.2 Request children to fetch period

<pre>
POST /admin/realm/period?[period-id=<period-id>][&epoch=<epoch>]
</pre>

Input:

period-id: optional
epoch: optional

A JSON representation of the current period, or the specified period

h3. 1.5.3. Initialize new zone

Will be sent by the config utility (probably radosgw-admin) to the master zonegroup.

<pre>
POST /admin/zonegroup?init-zone
</pre>

Input:

a JSON representation of the following:

* zonegroup name
* zone name
* list of peers (zone ids)

Output:

a JSON representation of the following:

* metadata of user to be used by zone
* new zonegroup map

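The input document for init-zone might be built along these lines. The JSON key names (@zonegroup@, @zone@, @peers@) are placeholders chosen for this sketch, since the blueprint lists the contents but not the exact field names.

```python
import json
from typing import List


def init_zone_body(zonegroup: str, zone: str, peers: List[str]) -> str:
    """Serialize the init-zone input; the key names are illustrative only."""
    return json.dumps(
        {"zonegroup": zonegroup, "zone": zone, "peers": list(peers)},
        sort_keys=True,
    )
```
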
h3. 1.5.4. Notify of zonegroup map change

<pre>
POST /admin/zonegroup?reconfigure
</pre>

Input:

 - new zonegroup map

h2. 1.6. New radosgw-admin, radosgw interfaces

h3. 1.6.1 period

<pre>
$ radosgw-admin period prepare --parent=<parent> [--realm-id=<realm> --realm=<realm name>]
</pre>
Creates a new period object in the .rgw.root pool.

<pre>
$ radosgw-admin period activate <uuid>
</pre>
Switches to a new period, which must be a child of the current period.
The admin needs to reconfigure all the gateways. At first, the gateways will need to be restarted to use the new period; in the future they will support dynamic configuration.

<pre>
$ radosgw-admin period pull
</pre>
Pulls the latest period map from the current period master.
Requires that radosgw-admin uses the RESTful api.

<pre>
$ radosgw-admin period pull <remote> <uuid> [--url=<url>]
</pre>
url: optionally provide remote entry point
Fetches info about a specific remote period.

<pre>
$ radosgw-admin period push
</pre>
Asks all children to pull the latest epoch.

We need to create a mechanism to allow the admin to communicate with other gateways.

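The rule that an activated period must be a child of the current period can be expressed as a simple check. This Python sketch reads "child" as "its parent is the current period"; the @PeriodRef@ type and @activate()@ function are hypothetical and only model the validation, not the actual switch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeriodRef:
    uuid: str
    parent: Optional[str]  # uuid of the period this one was prepared from


def activate(current_uuid: str, candidate: PeriodRef) -> str:
    """Return the uuid of the new current period, or raise if it is not a child."""
    if candidate.parent != current_uuid:
        raise ValueError(
            f"period {candidate.uuid} is not a child of current period {current_uuid}"
        )
    return candidate.uuid
```
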
h3. 1.6.2 zone realm

<pre>
$ radosgw-admin realm create --realm=<name>
</pre>
Creates a new zone realm; implicitly creates the first period.

<pre>
$ radosgw-admin realm remove --realm=<name> [--realm-id=<id>] --zonegroup=<name>
</pre>
Removes a zonegroup from a realm.

<pre>
$ radosgw-admin realm delete --realm=<name>
</pre>
Deletes a realm; the realm needs to be empty.

<pre>
$ radosgw-admin realm rename --realm=<old name> --new-realm-name=<new name> [--realm-id=<id>]
</pre>
Renames a realm.

<pre>
$ radosgw-admin realm set-default --realm=dho
</pre>
Sets the realm as the default realm.

<pre>
$ radosgw-admin realm get --realm=<name> | --realm-id=<id>
</pre>
Gets realm information.

h3. 1.6.3 zonegroup

<pre>
$ radosgw-admin zonegroup create --zonegroup=<name> [--zonegroup-id=<id>] [--master | --master-url=<url> | --realm=<name>]
</pre>

When running a remote command that contacts the master zonegroup, we'll also need to provide a uid and an access key. This can be done by specifying --uid and --access-key on the command line (which is a bit of a security problem), or by setting them in ceph.conf (which is a bit of a pain).

<pre>
$ radosgw-admin zonegroup delete --zonegroup=<name> [--zonegroup-id=<id>] [--master-url=<url>]
</pre>
Removes a zonegroup; the zonegroup needs to be empty.

<pre>
$ radosgw-admin zonegroup rename --zonegroup=<old name> [--zonegroup-id=<id>] [--master-url=<url>] --zonegroup-new-name=<new name>
</pre>
Renames a zonegroup.

h3. 1.6.4 creating a new zone

<pre>
$ radosgw-admin zone create --rgw-zone=<zone_name> --zonegroup=<zonegroup_name> --url=<zone url> [--master | --master-url=<url>]
</pre>

This command will either set the initial master zone for the system, or will create a new zone. It will generate a new random zone id (uuid).

radosgw will no longer create pools automagically when it starts up. Zone creation will always be an explicit step by the admin.

h3. 1.6.5 Modifying zone configuration

- Connect zone to another peer (meaning these two zones will sync to/from each other)

<pre>
$ radosgw-admin zone connect [--rgw-zone=<zone name>] [--zone-id=<zone id>] --peer-zone-id=<peer id> | --peer-zone=<peer name>
</pre>

- Disconnect zone from another peer

<pre>
$ radosgw-admin zone disconnect [--rgw-zone=<zone name>] [--zone-id=<zone id>] --peer-zone-id=<peer id> | --peer-zone=<peer name>
</pre>

- Configure a zone placement target (storage policy)

<pre>
$ radosgw-admin placement modify --placement-target=<name> --zone-id=<id> ... (TBD what exactly)
</pre>

- Check zone sync status

<pre>
$ radosgw-admin zone sync status [--rgw-zone=<zone name>]
</pre>

Will provide current markers and timestamps for the specified zone.

h3. 1.6.6 removing a zone from a zonegroup

<pre>
$ radosgw-admin zone remove --rgw-zone=<zone_name> [--zone-id=<zone id>] --zonegroup=<zonegroup_name>
</pre>

h3. 1.6.7 delete a zone

<pre>
$ radosgw-admin zone delete --rgw-zone=<zone_name> [--zone-id=<zone id>]
</pre>
Removes the zone from the system; the zone will be removed from all the zonegroups.

h3. 1.6.8 rename a zone

<pre>
$ radosgw-admin zone rename --rgw-zone=<zone_name> [--zone-id=<zone id>] --zone-new-name=<new name>
</pre>

*Work items*

*Coding tasks*
Task 1
Task 2
Task 3

*Build / release tasks*
Task 1
Task 2
Task 3

*Documentation tasks*
Task 1
Task 2
Task 3

*Deprecation tasks*
Task 1
Task 2
Task 3