Rgw new multisite configuration » History » Version 6

Orit Wasserman, 07/01/2015 09:17 AM

+*RGW NEW MULTISITE CONFIG*+

*Summary*
As part of the new multi-site scheme, we change the way the system is configured.

*Owners*
Orit Wasserman (Red Hat)
Yehuda Sadeh (Red Hat)

*Interested Parties*
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.

*Current Status*
Worked on the design, initial implementation work.

*Detailed Description*

h2. Definitions

A zone is a collection of pools and radosgw instances in the same cluster that serve the same data.

A zone group (zonegroup) is a collection of zones that replicate to or from each other. The zones may (or may not) span different clusters.

A zone realm is a collection of zonegroups that share the same user and bucket namespace.

A period is a span of time during which a given zonegroup configuration is in effect. Each period references the period that preceded it and records basic metadata such as the start time. During each period there may be changes to the zone and zonegroup maps; each of these changes increments the period epoch.

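To make the relationships between these terms concrete, here is a minimal sketch of the definitions as plain data structures. The class names, field names, and the choice to start the epoch at 1 are assumptions made for illustration; they are not taken from the rgw code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Zone:
    name: str
    pools: List[str] = field(default_factory=list)


@dataclass
class ZoneGroup:
    name: str
    zones: List[Zone] = field(default_factory=list)


@dataclass
class Period:
    uuid: str
    predecessor: Optional[str] = None  # uuid of the period that preceded this one
    epoch: int = 1                     # assumption: epochs start at 1
    zonegroups: List[ZoneGroup] = field(default_factory=list)

    def add_zone(self, zonegroup_name: str, zone: Zone) -> None:
        """Change the zonegroup map within this period and bump the epoch."""
        for zonegroup in self.zonegroups:
            if zonegroup.name == zonegroup_name:
                zonegroup.zones.append(zone)
                self.epoch += 1
                return
        raise KeyError(zonegroup_name)

    def successor(self, new_uuid: str) -> "Period":
        """Start a new period that references this one as its predecessor."""
        return Period(uuid=new_uuid, predecessor=self.uuid,
                      zonegroups=list(self.zonegroups))


@dataclass
class Realm:
    name: str
    current_period: Period
```

In this sketch a change inside a period only moves the epoch, while a new period gets a fresh uuid and points back at the one it replaces.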
h2. 1. Configuration Changes

h3. 1.1 Zonegroup map

The zonegroup map holds a map of the entire system, along with certain configurables for the different realms, zonegroups and zones. It holds the relationships between the different zonegroups and other configuration:

For the period
 - uuid
 - which period preceded it
 - the version vector for the previous period’s metadata log
 - list of zonegroups
 - which zonegroup is the new metadata master
 - which zones belong to each zonegroup
 - which zone is master for each zonegroup

For the realm
 - id, name
 - which zonegroup is the master for user/bucket metadata
 - list of zonegroups in the realm
 - current period

For each zonegroup
 - id, name
 - access url[s] (for control/replication API)
 - existing storage policies
 - which zone is master for metadata
 - which zone(s) are master or slave for data
 - list of zones

For each zone
 - id, name
 - access url[s]
 - peers

One zone in the master zonegroup will be designated as the master. It will manage all user and bucket creation (metadata) and control of the zonegroup map. In order to make a change to the system configuration, a command will be sent to the url of this master, and the new configuration will propagate to the rest of the system.
rgw will be able to handle dynamic changes to the zonegroup and zone configuration.

The zonegroup map will have a version epoch that will increment after every change.

<pre>
.rgw.root
default_realm -> $realm
realm.$realm  -> current period
period.$realm.$uuid.$epoch -> period object
period.$realm.$uuid -> latest $epoch
zone.$zone -> $realm
</pre>

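The object layout in .rgw.root above can be read as a chain of lookups. The following sketch builds the object names and walks the chain over a plain dict that stands in for the pool. The function names are hypothetical, and it assumes that the realm.$realm object stores the uuid of the current period.

```python
def realm_oid(realm):
    return f"realm.{realm}"


def period_oid(realm, uuid, epoch=None):
    """Name of the 'latest epoch' pointer, or of one specific epoch's object."""
    base = f"period.{realm}.{uuid}"
    return base if epoch is None else f"{base}.{epoch}"


def zone_oid(zone):
    return f"zone.{zone}"


def load_current_period(pool):
    """Follow default_realm -> realm -> latest epoch -> period object.

    `pool` is a dict used as a stand-in for the .rgw.root pool.
    """
    realm = pool["default_realm"]
    uuid = pool[realm_oid(realm)]          # assumption: stores the period uuid
    epoch = pool[period_oid(realm, uuid)]
    return pool[period_oid(realm, uuid, epoch)]
```

A gateway that only knows its zone name could start from zone.$zone instead of default_realm to find its realm, then continue along the same chain.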
h4. multi site todo

* period and realm data structures
* APIs for pushing and pulling zone metadata
* rgw needs to do watch/notify or poll on the realm.$realm object, and restart as needed
* gracefully drain requests on the old backend instance; start up a new one on epoch or period change

h4. metadata sync todo

* user instance
* save version for every metadata object
* log versions for every metadata object
* log metadata about which period we are on, which objects are dirty/stale, rollback/rollforward state

h3. 1.2 Defining a new zonegroup

Currently, in order to define a new zonegroup, we need to inject a JSON document that holds the zonegroup configuration, then update the zonegroupmap, and then distribute that zonegroupmap to all existing zonegroups and restart all rgws for the change to take effect. I don't think this is a good scheme.

A zonegroup will have a zonegroup id and a zonegroup name. For backward compatibility, older zonegroups will have their zonegroup_id equal to their name.

When setting up a new zonegroup, we'll need to specify an entry point for the 'master' zonegroup. That zonegroup will be in control of the zonegroupmap, and it will distribute the zonegroupmap updates to all zones.

If the zonegroup that we set up is the first zonegroup, we'll need to specify that on the command line. We won't be able to set up secondary zonegroups if the master has not been specified.

h3. 1.3. Defining a new zone

Currently, when running an rgw it does the following:

Read the rgw_zone configurable, check the root pool for the configuration of this zone. If rgw_zone is not defined it will read the default zone name out of the
it will create the 'default' zone, and assign it as the default.

Once a zone name has been set, it cannot really be changed. The zone names are embedded in the rados object names that are created to hold the actual rgw objects.

In order to support zone renaming and more dynamic configuration, we should create a logical 'zone id' that the zone name will point at. The zone id will be a string. When creating a new zone it will be auto-generated, and will not be modified afterwards. For backward compatibility, older zones will have a zone_id that matches their zone name.

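A small sketch of the name-to-id indirection described above may help. The @ZoneDirectory@ class and its methods are hypothetical; the only points taken from the text are that new zones get an auto-generated id, that legacy zones use their name as the id, and that a rename changes the name but never the id.

```python
import uuid


class ZoneDirectory:
    """Maps mutable zone names to immutable zone ids (illustrative only)."""

    def __init__(self):
        self._id_by_name = {}

    def create(self, name, legacy=False):
        if name in self._id_by_name:
            raise ValueError(f"zone {name!r} already exists")
        # Legacy zones keep zone_id == zone name; new zones get a random id.
        zone_id = name if legacy else str(uuid.uuid4())
        self._id_by_name[name] = zone_id
        return zone_id

    def rename(self, old_name, new_name):
        if new_name in self._id_by_name:
            raise ValueError(f"zone {new_name!r} already exists")
        # Only the name moves; anything keyed by the zone id is unaffected.
        self._id_by_name[new_name] = self._id_by_name.pop(old_name)

    def lookup(self, name):
        return self._id_by_name[name]
```

Because rados object names would embed the id rather than the name, a rename in this model touches only the directory entry.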
To set up a new zone, the rgw command will include the url of the master zonegroup, and keys to access it. It will also include the name of the zonegroup this zone should reside in. If this zonegroup does not exist, it will be created (if the appropriate param was passed in). The master zonegroup will create a new system user for this specific zone, and will send it back.

When a new zone starts up, we'll auto-create all the rados pools that it will use. It will first need to determine whether the pools already exist and are already assigned to a different zone. The naming scheme for the pools would be something like:

<pre>
.{zone_id}-x-{pool-name}
rgw.$zoneid.$pool
.rgw - bucket -> bucketid metadata
.users - user index
.users.swift
.users.uid
.control - contains notify object
.log - metadata log, which-buckets-have-changed log
.gc - garbage collection
.usage - sharded usage stats
.bucket-index
.bucket-data - bucket data
.bucket-data-nonec - non-ec bucket data
</pre>

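As an illustration of the second naming candidate above (@rgw.$zoneid.$pool@), the sketch below derives the per-zone pool names and reports pools that already belong to a different zone. The function names are hypothetical, and dropping the leading dot from each pool suffix is an assumption, since the blueprint does not settle the exact format.

```python
# Pool suffixes copied from the list above.
POOL_SUFFIXES = [
    ".rgw", ".users", ".users.swift", ".users.uid", ".control", ".log",
    ".gc", ".usage", ".bucket-index", ".bucket-data", ".bucket-data-nonec",
]


def zone_pool_name(zone_id, suffix):
    # Assumption: the leading dot of the suffix is dropped.
    return f"rgw.{zone_id}.{suffix.lstrip('.')}"


def zone_pool_names(zone_id):
    return [zone_pool_name(zone_id, suffix) for suffix in POOL_SUFFIXES]


def pools_owned_by_other_zone(zone_id, owner_by_pool):
    """Return this zone's pool names that exist and are assigned elsewhere.

    `owner_by_pool` maps existing pool names to the zone id that owns them.
    """
    return [
        pool for pool in zone_pool_names(zone_id)
        if pool in owner_by_pool and owner_by_pool[pool] != zone_id
    ]
```

Keying pool names on the zone id rather than the zone name is what keeps them stable across a zone rename.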
We want to allow the same gateway to be part of multiple zones; this will give us much more flexibility. Different zones will have different ports.

h2. 1.4. Dynamic zonegroup and zone changes

rgw will be able to identify changes to the zonegroupmap, and to the zone configuration. This will be done by the following:

* rgw will be able to restart itself with a new rados backend handler (RGWRados) after detecting that a configuration change has been made. It will finish handling existing requests, but restart all the frontend handlers with the new RGWRados config.
* rgw will set a specific watch/notify handler that will be used to get updates about the zonegroupmap configuration.
* Upon receiving a change, the master zonegroup's master zone will send a message to all the different zonegroups about the new configuration change.

Any synchronization activity will be dynamically reset according to the new configuration.

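The restart behaviour described in this section (finish existing requests on the old backend, serve new requests from the new one) can be sketched as a drain-and-swap pattern. This is an illustrative Python model, not the RGWRados implementation; @Backend@, @Gateway@ and all method names are hypothetical.

```python
import threading


class Backend:
    """Stand-in for a backend handler bound to one configuration epoch."""

    def __init__(self, epoch):
        self.epoch = epoch
        self.in_flight = 0    # requests currently using this backend
        self.retired = False  # True once a newer backend has replaced it
        self.closed = False   # True once retired and fully drained


class Gateway:
    def __init__(self, epoch):
        self._lock = threading.Lock()
        self._backend = Backend(epoch)

    def begin_request(self):
        """Pin the request to whichever backend is current right now."""
        with self._lock:
            backend = self._backend
            backend.in_flight += 1
            return backend

    def end_request(self, backend):
        with self._lock:
            backend.in_flight -= 1
            self._close_if_drained(backend)

    def reconfigure(self, new_epoch):
        """Swap in a new backend; the old one closes once it has drained."""
        with self._lock:
            old = self._backend
            if new_epoch <= old.epoch:
                return False  # stale or duplicate notification
            self._backend = Backend(new_epoch)
            old.retired = True
            self._close_if_drained(old)
            return True

    @staticmethod
    def _close_if_drained(backend):
        if backend.retired and backend.in_flight == 0:
            backend.closed = True
```

In this model a watch/notify callback would call @reconfigure()@ with the new epoch, and requests that started earlier keep their old backend until they finish.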
h2. 1.5. New RESTful apis

h3. 1.5.1 Get period information

<pre>
GET /admin/realm/period?[period-id=<period-id>][&epoch=<epoch>]
</pre>

period-id: optional
epoch: optional

Output:

A JSON representation of the current period, or the specified period

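Since both query parameters are optional, a client has to build the URL conditionally. A minimal sketch, assuming a hypothetical gateway endpoint and leaving out request signing entirely:

```python
from urllib.parse import urlencode


def period_url(endpoint, period_id=None, epoch=None):
    """Build the period URL; both query parameters are optional."""
    params = []
    if period_id is not None:
        params.append(("period-id", period_id))
    if epoch is not None:
        params.append(("epoch", epoch))
    url = endpoint.rstrip("/") + "/admin/realm/period"
    if params:
        url += "?" + urlencode(params)
    return url
```

With no arguments the URL refers to the current period; the endpoint value itself (for example @http://gw:8000@) is a placeholder.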
h3. 1.5.2 Request children to fetch period

<pre>
POST /admin/realm/period?[period-id=<period-id>][&epoch=<epoch>]
</pre>

Input:

period-id: optional
epoch: optional

A JSON representation of the current period, or the specified period

h3. 1.5.3. Initialize new zone

Will be sent by the config utility (probably radosgw-admin) to the master zonegroup.

<pre>
POST /admin/zonegroup?init-zone
</pre>

Input:

a JSON representation of the following:

* zonegroup name
* zone name
* list of peers (zone ids)

Output:

a JSON representation of the following:

* metadata of user to be used by zone
* new zonegroup map

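The blueprint lists what the init-zone request and response carry but not the JSON field names. The sketch below picks hypothetical keys (@zonegroup@, @zone@, @peers@, @user@, @zonegroup_map@) purely to show the shape of the exchange.

```python
import json


def init_zone_request(zonegroup, zone, peers):
    """Serialize the init-zone input; all key names are assumptions."""
    body = {
        "zonegroup": zonegroup,  # zonegroup name
        "zone": zone,            # zone name
        "peers": list(peers),    # zone ids of the peers
    }
    return json.dumps(body, sort_keys=True)


def parse_init_zone_response(text):
    """Split the output into the zone's user metadata and the new map."""
    doc = json.loads(text)
    return doc["user"], doc["zonegroup_map"]
```

The returned user metadata is what the new zone would then use to authenticate against the master zonegroup.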
h3. 1.5.4. Notify of zonegroup map change

<pre>
POST /admin/zonegroup?reconfigure
</pre>

Input:

 - new zonegroup map

h2. 1.6. New radosgw-admin, radosgw interfaces:

h3. 1.6.1 period

<pre>
$ radosgw-admin period prepare --parent=<parent> [--realm-id=<realm> --realm=<realm name>]
</pre>
Creates a new period object in the .rgw.root pool.

<pre>
$ radosgw-admin period activate <uuid>
</pre>
Switches to a new period. The new period must be a child of the current period.
The admin needs to reconfigure all the gateways; at first, the gateways will need to be restarted to use the new period. In the future they will support dynamic configuration.

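The rule that an activated period must be a child of the current period amounts to a single parent check. A minimal sketch, assuming each prepared period records the uuid of its parent in a hypothetical @parent_of@ mapping:

```python
def activate(current_uuid, candidate_uuid, parent_of):
    """Return the new current period uuid, or raise if the rule is violated.

    `parent_of` maps a period uuid to the uuid passed as --parent when the
    period was prepared.
    """
    if parent_of.get(candidate_uuid) != current_uuid:
        raise ValueError(
            f"period {candidate_uuid} is not a child of current period {current_uuid}"
        )
    return candidate_uuid
```

Under this check, periods form a chain: skipping a generation or activating an unrelated period is rejected.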
<pre>
$ radosgw-admin period pull
</pre>
Pulls the latest period map from the current period master. Requires that radosgw-admin uses the RESTful api.

<pre>
$ radosgw-admin period pull <remote> <uuid> [--url=<url>]
</pre>
Fetches info about a specific remote period.
url: optionally provide remote entry point

<pre>
$ radosgw-admin period push
</pre>
Asks all children to pull the latest epoch.

We need to create a mechanism to allow the admin to communicate with other gateways.

h3. 1.6.2 zone realm

<pre>
$ radosgw-admin realm create --realm=<name>
</pre>
Creates a new zone realm; implicitly creates the first period.

<pre>
$ radosgw-admin realm remove --realm=<name> [--realm-id=<id>] --zonegroup=<name>
</pre>
Removes a zonegroup from a realm.

<pre>
$ radosgw-admin realm delete --realm=<name>
</pre>
Deletes a realm; the realm needs to be empty.

<pre>
$ radosgw-admin realm rename --realm=<old name> --new-realm-name=<new name> [--realm-id=<id>]
</pre>
Renames a realm.

<pre>
$ radosgw-admin realm set-default --realm=dho
</pre>
Sets the realm as the default realm.

<pre>
$ radosgw-admin realm get --realm=<name> | --realm-id=<id>
</pre>
Gets realm information.

h3. 1.6.3 zonegroup

<pre>
$ radosgw-admin zonegroup create --zonegroup=<name> [--zonegroup-id=<id>] [--master | --master-url=<url> | --realm=<name>]
</pre>

When running a remote command that contacts the master zonegroup, we'll also need to provide a uid and an access key. This can be done by specifying --uid and --access-key on the command line (which is a bit of a security problem), or by setting them in ceph.conf (which is a bit of a pain).

<pre>
$ radosgw-admin zonegroup delete --zonegroup=<name> [--zonegroup-id=<id>] [--master-url=<url>]
</pre>
Removes a zonegroup; the zonegroup needs to be empty.

<pre>
$ radosgw-admin zonegroup rename --zonegroup=<old name> [--zonegroup-id=<id>] [--master-url=<url>] --zonegroup-new-name=<new name>
</pre>
Renames a zonegroup.

h3. 1.6.4 creating a new zone

<pre>
$ radosgw-admin zone create --rgw-zone=<zone_name> --zonegroup=<zonegroup_name> --url=<zone url> [--master | --master-url=<url>]
</pre>

This command will either set the initial master zone for the system, or will create a new zone. It will generate a new random zoneid (uuid).

radosgw will no longer create pools automagically when it starts up. Zone creation will always be an explicit step by the admin.

h3. 1.6.5 Modifying zone configuration:

- Connect zone to another peer (meaning these two zones will sync to/from each other)

<pre>
$ radosgw-admin zone connect [--rgw-zone=<zone name>] [--zone-id=<zone id>] --peer-zone-id=<peer id> | --peer-zone=<peer name>
</pre>

- Disconnect zone from another peer

<pre>
$ radosgw-admin zone disconnect [--rgw-zone=<zone name>] [--zone-id=<zone id>] --peer-zone-id=<peer id> | --peer-zone=<peer name>
</pre>

- Configure a zone placement target (storage policy)

<pre>
$ radosgw-admin placement modify --placement-target=<name> --zone-id=<id> ... (TBD what exactly)
</pre>

- Check zone sync status:

<pre>
$ radosgw-admin zone sync status [--rgw-zone=<zone name>]
</pre>

Will provide current markers and timestamps for the specified zone.

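The connect and disconnect commands in this section describe a symmetric relation: once two zones are connected, each syncs to and from the other. A minimal sketch of that bookkeeping, with a hypothetical @PeerMap@ class keyed by zone id:

```python
from collections import defaultdict


class PeerMap:
    """Tracks which zones sync with each other (illustrative only)."""

    def __init__(self):
        self._peers = defaultdict(set)

    def connect(self, zone_id, peer_id):
        if zone_id == peer_id:
            raise ValueError("a zone cannot peer with itself")
        # Record both directions so the relation stays symmetric.
        self._peers[zone_id].add(peer_id)
        self._peers[peer_id].add(zone_id)

    def disconnect(self, zone_id, peer_id):
        self._peers[zone_id].discard(peer_id)
        self._peers[peer_id].discard(zone_id)

    def peers(self, zone_id):
        return sorted(self._peers.get(zone_id, ()))
```

Each zone's "peers" entry in the zonegroup map would then be the result of @peers()@ for that zone.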
h3. 1.6.6 removing a zone from a zonegroup

<pre>
$ radosgw-admin zone remove --rgw-zone=<zone_name> [--zone-id=<zone id>] --zonegroup=<zonegroup_name>
</pre>

h3. 1.6.7 delete a zone

<pre>
$ radosgw-admin zone delete --rgw-zone=<zone_name> [--zone-id=<zone id>]
</pre>
Removes the zone from the system; the zone will be removed from all the zonegroups.

h3. 1.6.8 rename a zone

<pre>
$ radosgw-admin zone rename --rgw-zone=<zone_name> [--zone-id=<zone id>] --zone-new-name=<new name>
</pre>
