Rgw new multisite configuration » History » Version 1

Yehuda Sadeh, 06/16/2015 08:24 PM

+*RGW NEW MULTISITE CONFIG*+

*Summary*
As part of the new multisite scheme, we change the way the system is configured.

*Owners*
Orit Wasserman (Red Hat)
Yehuda Sadeh (Red Hat)
Name

*Interested Parties*
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
Name (Affiliation)
Name (Affiliation)
Name

*Current Status*
The design has been worked on and initial implementation work has started.

*Detailed Description*

h2. Definitions

A zone is a collection of pools and radosgw instances in the same cluster that serve the same data.

A zonegroup is a collection of zones that replicate to or from each other. The zones may (or may not) span different clusters.

A zone realm is a collection of zonegroups that share the same user and bucket namespace.

A period is a span of time during which a given zonegroup configuration is in effect. Each period references the period that preceded it and records basic metadata such as its start time. During each period there may be changes to the zone and zonegroup maps; each of these changes increments the period epoch.

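The definitions above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (@Zone@, @ZoneGroup@, @Period@, @Realm@, @predecessor@, @start_new_period@) and the starting epoch of 1 are hypothetical and are not taken from the rgw code.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Zone:
    zone_id: str
    name: str


@dataclass
class ZoneGroup:
    name: str
    zones: Dict[str, Zone] = field(default_factory=dict)  # keyed by zone id


@dataclass
class Period:
    period_id: str
    predecessor: Optional[str]  # id of the period that preceded this one
    start_time: float
    epoch: int = 1              # assumption: epochs start at 1
    zonegroups: Dict[str, ZoneGroup] = field(default_factory=dict)

    def add_zone(self, zonegroup: str, zone: Zone) -> None:
        """Change the zonegroup map and increment the period epoch."""
        group = self.zonegroups.setdefault(zonegroup, ZoneGroup(zonegroup))
        group.zones[zone.zone_id] = zone
        self.epoch += 1


@dataclass
class Realm:
    name: str
    current_period: Period

    def start_new_period(self, start_time: float) -> Period:
        """Open a new period that references the one it replaces."""
        old = self.current_period
        new = Period(str(uuid.uuid4()), old.period_id, start_time,
                     zonegroups=copy.deepcopy(old.zonegroups))
        self.current_period = new
        return new
```

The new period starts from a deep copy of the previous map, so later changes do not rewrite the history recorded in the older period.
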
h2. 1. Configuration Changes

h3. 1.1 Zonegroup map

The zonegroup map holds a map of the entire system, and certain configurables for the different realms, zonegroups and zones. It holds the relationships between the different zonegroups and other configuration:

For the period
 - which period preceded it
 - the version vector for the previous period’s metadata log
 - list of zonegroups
 - which zonegroup is the new metadata master
 - which zones belong to each zonegroup
 - which zone is master for each zonegroup

For the realm
 - which zonegroup is the master for user/bucket metadata
 - list of zonegroups in the realm
 - current period

For each zonegroup
 - access url[s] (for control/replication API)
 - existing storage policies
 - which zone is master for metadata
 - which zone(s) are master or slave for data
 - list of zones

For each zone
 - id, name
 - access url[s]
 - peers

One zone in the master zonegroup will be designated as the master. It will manage all user and bucket creation (metadata) and control the zonegroup map. To change the system configuration, a command is sent to the URL of this master, and the new configuration propagates to the rest of the system.
rgw will be able to handle dynamic changes to the zonegroup and zone configuration.

The zonegroup map will have a version epoch that increments after every change.

<pre>
.rgw.root
default_realm -> $realm
realm.$realm  -> current period
period.$realm.$uuid.$epoch -> period object
period.$realm.$uuid -> latest $epoch
zone.$zone -> $realm
</pre>

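The object layout above suggests a lookup chain from realm to period object. The following is a minimal sketch of that chain, assuming that @realm.$realm@ stores the uuid of the current period and using a plain dict to stand in for the .rgw.root pool. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def realm_key(realm: str) -> str:
    return f"realm.{realm}"


def period_key(realm: str, period_uuid: str, epoch: Optional[int] = None) -> str:
    """Without an epoch this names the 'latest epoch' pointer object."""
    base = f"period.{realm}.{period_uuid}"
    return base if epoch is None else f"{base}.{epoch}"


def zone_key(zone: str) -> str:
    return f"zone.{zone}"


def resolve_current_period(store: Dict[str, Any], realm: Optional[str] = None) -> Any:
    """Follow default_realm -> realm -> latest epoch -> period object."""
    realm = realm or store["default_realm"]
    period_uuid = store[realm_key(realm)]             # assumed to hold a uuid
    epoch = store[period_key(realm, period_uuid)]     # latest epoch pointer
    return store[period_key(realm, period_uuid, epoch)]
```
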
h4. multi site todo

* period and realm data structures
* APIs for pushing and pulling zone metadata
* rgw needs to do watch/notify or poll on the realm.$realm object, restart as needed
* gracefully drain requests on old backend instance; startup new one on epoch or period change

h4. metadata sync todo

* user instance
* save version for every metadata object
* log versions for every metadata object
* log metadata about which period we are on, which objects are dirty/stale, rollback/rollforward state

h3. 1.2 Defining a new zonegroup

Currently, in order to define a new zonegroup, we need to inject a JSON document that holds the zonegroup configuration, then update the zonegroupmap, and then distribute that zonegroupmap to all existing zonegroups and restart all rgws for it to take effect. I don't think this is a good scheme.

A zonegroup will have a zonegroup id and a zonegroup name. For backward compatibility, older zonegroups will have their zonegroup_id equal to their name.

When setting up a new zonegroup, we'll need to specify an entry point for the 'master' zonegroup. That zonegroup will be in control of the zonegroupmap, and it will distribute the zonegroupmap updates to all zones.

If the zonegroup that we set up is the first zonegroup, we'll need to specify that on the command line. We won't be able to set up secondary zonegroups if the master has not been specified.

h3. 1.3. Defining a new zone

Currently, when running an rgw it does the following:

Read the rgw_zone configurable, check the root pool for the configuration of this zone. If rgw_zone is not defined it will read the default zone name out of the
it will create the 'default' zone, and assign it as the default.

Once a zone name has been set, it cannot really be changed. The zone names are embedded in the rados object names that are created to hold the actual rgw objects.

In order to support zone renaming and more dynamic configuration, we should create a logical 'zone id' that the zone name will point at. The zone id will be a string. It will be auto-generated when a new zone is created and will not be modified afterwards. For backward compatibility, older zones will have a zone_id that matches their zone name.

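The name-to-id indirection described above could look roughly like this. It is a sketch, not the rgw implementation: @ZoneRegistry@ and its methods are hypothetical, and using a UUID for new zone ids is an assumption (the text only requires an auto-generated string).

```python
import uuid
from typing import Dict


class ZoneRegistry:
    """Maps mutable zone names to zone ids that never change."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, str] = {}

    def create(self, name: str, legacy: bool = False) -> str:
        if name in self._id_by_name:
            raise ValueError(f"zone {name!r} already exists")
        # Older zones keep zone_id == name for backward compatibility;
        # new zones get an auto-generated id (a UUID in this sketch).
        zone_id = name if legacy else str(uuid.uuid4())
        self._id_by_name[name] = zone_id
        return zone_id

    def rename(self, old_name: str, new_name: str) -> None:
        if new_name in self._id_by_name:
            raise ValueError(f"zone {new_name!r} already exists")
        # Only the name changes; the id, and anything keyed by it, stays put.
        self._id_by_name[new_name] = self._id_by_name.pop(old_name)

    def zone_id(self, name: str) -> str:
        return self._id_by_name[name]
```
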
To set up a new zone, the rgw command will include the URL of the master zonegroup and the keys to access it. It will also include the name of the zonegroup this zone should reside in. If this zonegroup does not exist, it will be created (if the appropriate param was passed in). The master zonegroup will create a new system user for this specific zone and send it back.

When a new zone starts up, we'll auto-create all the rados pools that it will use. It will first need to determine whether the pools already exist and are already assigned to a different zone. The naming scheme for the pools would be something like:

<pre>
.{zone_id}-x-{pool-name}
rgw.$zoneid.$pool
.rgw - bucket -> bucketid metadata
.users - user index
.users.swift
.users.uid
.control - contains notify object
.log - metadata log, which-buckets-have-changed log
.gc - garbage collection
.usage - sharded usage stats
.bucket-index
.bucket-data - bucket data
.bucket-data-nonec - non-ec bucket data
</pre>

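As an illustration of the naming scheme, the sketch below derives per-zone pool names using the @rgw.$zoneid.$pool@ form and checks them against pools already owned by another zone. The text leaves the final scheme open, so the choice of form, the dropped leading dots, and the @pool_owner@ mapping (pool name to owning zone id) are all assumptions.

```python
from typing import Dict, List

# Pool suffixes taken from the list above, with the leading dot dropped.
POOL_SUFFIXES = [
    "rgw", "users", "users.swift", "users.uid", "control", "log",
    "gc", "usage", "bucket-index", "bucket-data", "bucket-data-nonec",
]


def zone_pool_names(zone_id: str) -> List[str]:
    """Pool names a zone would use under the rgw.$zoneid.$pool form."""
    return [f"rgw.{zone_id}.{suffix}" for suffix in POOL_SUFFIXES]


def conflicting_pools(zone_id: str, pool_owner: Dict[str, str]) -> List[str]:
    """Pools that already exist and are assigned to a different zone."""
    return [name for name in zone_pool_names(zone_id)
            if pool_owner.get(name, zone_id) != zone_id]
```

A zone would only create its pools when @conflicting_pools@ returns an empty list.
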
We want to allow the same gateway to be part of multiple zones, which will give us much more flexibility. Different zones will have different ports.

h2. 1.4. Dynamic zonegroup and zone changes

rgw will be able to identify changes to the zonegroupmap, and to the zone configuration. This will be done by the following:

* rgw will be able to restart itself with a new rados backend handler (RGWRados) after detecting that a configuration change has been made. It will finish handling existing requests, but restart all the frontend handlers with the new RGWRados config.
* rgw will set a specific watch/notify handler that will be used to get updates about the zonegroupmap configuration.
* Upon receiving a change, the master zone of the master zonegroup will send a message to all the different zonegroups about the new configuration change.

Any synchronization activity will be dynamically reset according to the new configuration.

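The "finish existing requests, then switch to a new backend" behaviour can be sketched as follows. This is a conceptual model in Python, not the RGWRados code: @Backend@, @Gateway@ and @on_config_change@ are hypothetical names, and the notification that triggers the swap (watch/notify or polling) is left out.

```python
import threading


class Backend:
    """Stands in for one backend instance bound to a configuration epoch."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch
        self._inflight = 0
        self._cond = threading.Condition()

    def begin(self) -> None:
        with self._cond:
            self._inflight += 1

    def end(self) -> None:
        with self._cond:
            self._inflight -= 1
            self._cond.notify_all()

    def drain(self, timeout=None) -> bool:
        """Block until every request started on this backend has finished."""
        with self._cond:
            return self._cond.wait_for(lambda: self._inflight == 0, timeout)


class Gateway:
    def __init__(self, epoch: int) -> None:
        self._lock = threading.Lock()
        self._backend = Backend(epoch)

    def handle(self, work):
        # Pin the current backend under the lock so a swap cannot slip in
        # between picking the backend and registering the request on it.
        with self._lock:
            backend = self._backend
            backend.begin()
        try:
            return work(backend)
        finally:
            backend.end()

    def on_config_change(self, new_epoch: int) -> bool:
        with self._lock:
            old = self._backend
            if new_epoch <= old.epoch:
                return False          # stale or duplicate notification
            self._backend = Backend(new_epoch)
        old.drain()                   # in-flight requests finish on the old one
        return True
```

New requests are routed to the new backend as soon as the swap happens, while the old backend is only discarded once it has drained.
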
h2. 1.5. New RESTful APIs

h3. 1.5.1 Get period information

<pre>
GET /admin/realm/period?[period-id=<period-id>][&epoch=<epoch>]
</pre>

period-id: optional
epoch: optional

Output:

A JSON representation of the current period, or the specified period

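Since both query parameters are optional, a client has to build the URL conditionally. A minimal sketch, assuming a hypothetical helper @period_url@ and leaving out authentication, which this section does not specify:

```python
from typing import Optional
from urllib.parse import urlencode


def period_url(endpoint: str,
               period_id: Optional[str] = None,
               epoch: Optional[int] = None) -> str:
    """Build the GET /admin/realm/period URL with only the given parameters."""
    params = {}
    if period_id is not None:
        params["period-id"] = period_id
    if epoch is not None:
        params["epoch"] = epoch
    url = endpoint.rstrip("/") + "/admin/realm/period"
    return url + ("?" + urlencode(params) if params else "")
```

Calling it with no arguments beyond the endpoint asks for the current period; passing @period_id@ and @epoch@ selects a specific one.
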
h3. 1.5.2 Request children to fetch period:

<pre>
POST /admin/realm/period?[period-id=<period-id>][&epoch=<epoch>]
</pre>

Input:

period-id: optional
epoch: optional

A JSON representation of the current period, or the specified period

h3. 1.5.3. Initialize new zone

Will be sent by the config utility (probably radosgw-admin) to the master zonegroup.

<pre>
POST /admin/zonegroup?init-zone
</pre>

Input:

a JSON representation of the following:

* zonegroup name
* zone name
* list of peers (zone ids)

Output:

a JSON representation of the following:

* metadata of user to be used by zone
* new zonegroup map

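The request and response bodies might be encoded along these lines. The JSON field names (@zonegroup@, @zone@, @peers@, @user@, @zonegroup_map@) are hypothetical, since the blueprint lists only what the documents contain and not their schema.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def init_zone_request(zonegroup: str, zone: str, peers: Iterable[str]) -> str:
    """Encode the init-zone input: zonegroup name, zone name, peer zone ids."""
    return json.dumps(
        {"zonegroup": zonegroup, "zone": zone, "peers": list(peers)},
        sort_keys=True,
    )


def parse_init_zone_response(body: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the output into the zone's system user and the new zonegroup map."""
    doc = json.loads(body)
    return doc["user"], doc["zonegroup_map"]
```
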
h3. 1.5.4. Notify of zonegroup map change

<pre>
POST /admin/zonegroup?reconfigure
</pre>

Input:

 - new zonegroup map

h2. 1.6. New radosgw-admin, radosgw interfaces:

h3. 1.6.1 period

<pre>
$ radosgw-admin period prepare --parent=<parent> <uuid>
</pre>
Creates a new period object in the .rgw.root pool.

<pre>
$ radosgw-admin period activate <uuid>
</pre>
Switches to a new period.
The new period must be a child of the current period.
The admin needs to reconfigure all the gateways; at first, the gateways will need to be restarted to use the new period. In the future they will support dynamic configuration.

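The prepare/activate pair and the "must be a child of the current period" rule can be modelled as below. This is a sketch of the rule only: @PeriodStore@ is a hypothetical in-memory stand-in for the period objects in .rgw.root, not how radosgw-admin stores them.

```python
from typing import Dict, Optional


class PeriodStore:
    def __init__(self, root_id: str) -> None:
        # period id -> parent period id (None for the very first period)
        self.parents: Dict[str, Optional[str]] = {root_id: None}
        self.current = root_id

    def prepare(self, new_id: str, parent: str) -> None:
        """Record a new period object; it does not take effect yet."""
        if parent not in self.parents:
            raise KeyError(f"unknown parent period {parent!r}")
        if new_id in self.parents:
            raise ValueError(f"period {new_id!r} already exists")
        self.parents[new_id] = parent

    def activate(self, period_id: str) -> None:
        """Switch to a prepared period, but only if its parent is current."""
        if self.parents.get(period_id) != self.current:
            raise ValueError(f"{period_id!r} is not a child of {self.current!r}")
        self.current = period_id
```
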
<pre>
$ radosgw-admin period pull
</pre>
Pulls the latest period map from the current period master.
Requires that radosgw-admin uses the RESTful API.

<pre>
$ radosgw-admin period pull <remote> <uuid> [--url=<url>]
</pre>
url: optionally provide a remote entry point
Fetches info about a specific remote period.

<pre>
$ radosgw-admin period push
</pre>
Asks all children to pull the latest epoch.

We need to create a mechanism to allow the admin to communicate with other gateways.

h3. 1.6.2 zone realm

<pre>
$ radosgw-admin realm create --realm=<name>
</pre>
Creates a new zone realm and implicitly creates the first period.

<pre>
$ radosgw-admin realm remove --realm=<name> [--realm-id=<id>] --zonegroup=<name>
</pre>
Removes a zonegroup from a realm.

<pre>
$ radosgw-admin realm delete --realm=<name>
</pre>
Deletes a realm. The realm needs to be empty.

<pre>
$ radosgw-admin realm rename --realm=<old name> --new-realm-name=<new name> [--realm-id=<id>]
</pre>
Renames a realm.

<pre>
$ radosgw-admin realm set-default --realm=dho
</pre>
Sets the realm as the default realm.

<pre>
$ radosgw-admin realm get --realm=<name> | --realm-id=<id>
</pre>
Gets realm information.

h3. 1.6.3 zonegroup

<pre>
$ radosgw-admin zonegroup create --zonegroup=<name> [--zonegroup-id=<id>] [--master | --master-url=<url> | --realm=<name>]
</pre>

When running a remote command that contacts the master zonegroup, we'll also need to provide a uid and an access key. This can be done by specifying --uid and --access-key on the command line (which is a bit of a security problem), or by setting them in ceph.conf (which is a bit of a pain).

<pre>
$ radosgw-admin zonegroup delete --zonegroup=<name> [--zonegroup-id=<id>] [--master-url=<url>]
</pre>
Removes a zonegroup. The zonegroup needs to be empty.

<pre>
$ radosgw-admin zonegroup rename --zonegroup=<old name> [--zonegroup-id=<id>] [--master-url=<url>] --zonegroup-new-name=<new name>
</pre>
Renames a zonegroup.

h3. 1.6.4 creating a new zone

<pre>
$ radosgw-admin zone create --rgw-zone=<zone_name> --zonegroup=<zonegroup_name> --url=<zone url> [--master | --master-url=<url>]
</pre>

This command will either set the initial master zone for the system, or will create a new zone. It will generate a new random zoneid (uuid).

radosgw will no longer create pools automagically when it starts up. Zone creation will always be an explicit step by the admin.

h3. 1.6.5 Modifying zone configuration:

- Connect zone to another peer (meaning these two zones will sync to/from each other)

<pre>
$ radosgw-admin zone connect [--rgw-zone=<zone name>] [--zone-id=<zone id>] --peer-zone-id=<peer id> | --peer-zone=<peer name>
</pre>

- Disconnect zone from another peer

<pre>
$ radosgw-admin zone disconnect [--rgw-zone=<zone name>] [--zone-id=<zone id>] --peer-zone-id=<peer id> | --peer-zone=<peer name>
</pre>

- Configure a zone placement target (storage policy)

<pre>
$ radosgw-admin placement modify --placement-target=<name> --zone-id=<id> ... (TBD what exactly)
</pre>

- Check zone sync status:

<pre>
$ radosgw-admin zone sync status [--rgw-zone=<zone name>]
</pre>

Will provide the current markers and timestamps for the specified zone.

h3. 1.6.6 removing a zone from a zonegroup

<pre>
$ radosgw-admin zone remove --rgw-zone=<zone_name> [--zone-id=<zone id>] --zonegroup=<zonegroup_name>
</pre>

h3. 1.6.7 delete a zone

<pre>
$ radosgw-admin zone delete --rgw-zone=<zone_name> [--zone-id=<zone id>]
</pre>
Removes the zone from the system. The zone will be removed from all the zonegroups.

h3. 1.6.8 rename a zone

<pre>
$ radosgw-admin zone rename --rgw-zone=<zone_name> [--zone-id=<zone id>] --zone-new-name=<new name>
</pre>

h2. 1.7 single standalone zone

<pre>
$ radosgw-admin zone create --zone=foo
  rgw.foo.{users,buckets,...}
$ radosgw --zone=foo
</pre>

We allow a zone to run without being added to a realm and zonegroup.
A zone that already holds data can be added to a realm only if it is the first zone added.

h3. 1.7.1 create a replica

option 1:

<pre>
B$ radosgw-admin zone create --zone=foo-backup
B$ radosgw --zone=foo-backup

A$ radosgw-admin realm create --realm=dho   # implicitly creates an initial period
A$ radosgw-admin zonegroup create --realm=dho --zonegroup=us-west
A$ radosgw-admin zonegroup add --zonegroup=us-west --zone=foo --cluster-uuid=blah
-> these all change the period metadata .. no effect on radosgw yet!
A$ radosgw-admin zone join --zone=foo --realm=dho
A$ killall -1 radosgw
-> now radosgw knows it belongs to a realm and is watching the period
A$ radosgw-admin zonegroup add --zonegroup=us-west --zone=foo-backup --cluster-uuid=blah
A$ radosgw-admin period show
-> period references foo-backup, but foo-backup is still ignorant of all this

B$ radosgw-admin period pull http://cluster-a [perioduuid]
B$ radosgw-admin zone join --zone=foo-backup
B$ killall -1 radosgw
</pre>

option 2:

<pre>
A$ radosgw-admin realm create --realm=dho --default
A$ radosgw-admin zonegroup create --zonegroup=us-west
A$ radosgw-admin zone create --zone=us-west-1 --realm=dho
A$ radosgw-admin zonegroup add --zonegroup=us-west --realm=dho --zone=us-west-1
A$ radosgw --zone=us-west-1
B$ radosgw-admin pull http://…. --realm=dho   # now B knows the realm exists
B$ radosgw-admin realm set-default --realm=dho
B$ radosgw-admin zone create --zone=us-west-2 --realm=dho
B$ radosgw --zone=us-west-2
A$ radosgw-admin zonegroup add --zone=us-west-2
B$ radosgw-admin pull               # now B knows the new zone is part of the ZG
</pre>

Bootstrap:

<pre>
A$ radosgw-admin realm bootstrap-master --realm=dho --zonegroup=us-west --zone=us-west-1
A$ radosgw --zone=us-west-1
B$ radosgw-admin realm bootstrap --realm=dho --realm-endpoint=http://
B$ radosgw-admin zone bootstrap --realm=dho --zonegroup=us-west --zone=us-west-2
B$ radosgw --zone=us-west-2
B$ radosgw-admin zone bootstrap --realm=dho --zonegroup=us-west --zone=us-west-3
B$ radosgw --zone=us-west-3
</pre>

h2. 1.7. A usage example. Setting up two zonegroups, with two zones in each:

Zonegroup: us-west

 Zone: us-west-1 (ceph cluster 1)
  - url: http://us-west-1.example.com

 Zone: us-west-2 (ceph cluster 2)
  - url: http://us-west-2.example.com

Zonegroup: us-east

 Zone: us-east-1 (ceph cluster 2)
  - url: http://us-east-1.example.com

 Zone: us-east-2 (ceph cluster 3)
  - url: http://us-east-2.example.com

 - In ceph cluster 1:

<pre>
$ radosgw-admin zonegroup create --zonegroup=us-west --master --url=http://us-west-1.example.com
$ radosgw-admin zone create --rgw-zone=us-west-1 --zonegroup=us-west --url=http://us-west-1.example.com
$ radosgw --rgw-zone=us-west-1
</pre>

 - In ceph cluster 2:

<pre>
$ radosgw-admin zone init --rgw-zone=us-west-2 --zonegroup=us-west --url=http://us-west-2.example.com --master-url=http://us-west-1.example.com
$ radosgw --rgw-zone=us-west-2
$ radosgw-admin zonegroup init --zonegroup=us-east --url=http://us-east-1.example.com --master-url=http://us-west-1.example.com
$ radosgw-admin zone init --rgw-zone=us-east-1 --zonegroup=us-east --url=http://us-east-1.example.com --master-url=http://us-west-1.example.com
$ radosgw --rgw-zone=us-east-1
</pre>

 - In ceph cluster 3:

<pre>
$ radosgw-admin zone init --rgw-zone=us-east-2 --zonegroup=us-east --url=http://us-east-2.example.com --master-url=http://us-west-1.example.com
$ radosgw --rgw-zone=us-east-2
</pre>

Note that these commands don't include the access keys needed to access the master zone. These will also need to be set, either on the command line or via ceph.conf.

h2. 1.8. Optional simplification:

Instead of creating a zone and running radosgw, we can do it in one step via radosgw itself, e.g.:

<pre>
$ radosgw --rgw-zone=us-west-1 --zonegroup=us-west --init-zone --url=http://us-west-1.example.com
</pre>

We can do the same for the zonegroup creation, so that every zone + zonegroup creation can be squashed to a single radosgw command.

*Work items*

*Coding tasks*
Task 1
Task 2
Task 3

*Build / release tasks*
Task 1
Task 2
Task 3

*Documentation tasks*
Task 1
Task 2
Task 3

*Deprecation tasks*
Task 1
Task 2
Task 3