Jessica Mack, 06/01/2015 09:09 PM

h1. RGW sync agent architecture

h2. RGW data sync

Current scheme:

# full sync (per shard)
## list all buckets
## for each bucket in the current shard:
### read the bucket marker
### sync each object
#### if it fails, add it to a list to retry later (recorded in the replica log later)
### when done with the bucket instance, update the replica log on the destination zone with:
#### bucket name
#### bucket marker (from the start of sync)
#### list of objects to retry
## when done with all buckets in the shard:
### update the replica log on the destination zone with:
#### shard number
#### list of bucket instances to retry
# incremental sync
## look up the data log shard in the replica log to get the buckets that need retrying and the current marker
## read the data log shard, starting at the marker from the replica log, to get the changed buckets
## for each bucket:
### look up the objects that need retrying in the replica log
### read the bucket instance log for the bucket, starting at the marker from the replica log
### sync each object using the same method as full sync
### if syncing an object fails, add it to a list to retry
### update the bucket instance replica log with the last marker read and the list of objects to retry
## once the data log shard is done, update the replica log for that shard with:
### new marker
### list of bucket instances to retry
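The current scheme can be sketched roughly as follows. This is a minimal, hypothetical Python model, not code taken from RGW: plain dicts stand in for the source zone's buckets, data log and bucket instance logs and for the replica log on the destination zone; markers are assumed to be strings that sort lexicographically; `sync_object` is a caller-supplied callback; and "bucket instances to retry" is interpreted as buckets that still have failed objects. All class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory model (assumption): not an RGW API.
# Markers are strings and are assumed to sort lexicographically.
SyncFn = Callable[[str, str], bool]  # (bucket, object) -> True if the copy succeeded
Log = List[Tuple[str, str]]          # [(marker, bucket or object name), ...]


@dataclass
class BucketReplicaEntry:
    bucket: str
    marker: str  # bucket instance log marker to resume from
    retry_objects: List[str] = field(default_factory=list)


@dataclass
class ShardReplicaEntry:
    shard: int
    marker: str  # data log marker to resume from ("" = start of the log)
    retry_buckets: List[str] = field(default_factory=list)


def new_replica_log() -> dict:
    return {"buckets": {}, "shards": {}}


def full_sync_shard(shard: int, buckets: Dict[str, List[str]],
                    shard_of: Callable[[str], int], bucket_markers: Dict[str, str],
                    sync_object: SyncFn, replica_log: dict) -> None:
    retry_buckets = []
    for bucket, objects in buckets.items():   # list all buckets
        if shard_of(bucket) != shard:          # only buckets in the current shard
            continue
        marker = bucket_markers[bucket]        # bucket marker from the start of sync
        failed = [o for o in objects if not sync_object(bucket, o)]
        replica_log["buckets"][bucket] = BucketReplicaEntry(bucket, marker, failed)
        if failed:  # assumption: "bucket instances to retry" = buckets with failures
            retry_buckets.append(bucket)
    # The current scheme stores no data log marker here, so "" (start of log) is used.
    replica_log["shards"][shard] = ShardReplicaEntry(shard, "", retry_buckets)


def incremental_sync_shard(shard: int, data_log: Dict[int, Log],
                           bucket_logs: Dict[str, Log],
                           sync_object: SyncFn, replica_log: dict) -> None:
    entry = replica_log["shards"].get(shard, ShardReplicaEntry(shard, ""))
    new_entries = [(m, b) for m, b in data_log.get(shard, []) if m > entry.marker]
    # Buckets left to retry, then buckets changed since the marker (deduplicated).
    changed = list(dict.fromkeys(entry.retry_buckets + [b for _, b in new_entries]))
    retry_buckets = []
    for bucket in changed:
        b_entry = replica_log["buckets"].get(bucket, BucketReplicaEntry(bucket, ""))
        todo, last = list(b_entry.retry_objects), b_entry.marker
        for m, obj in bucket_logs.get(bucket, []):
            if m > b_entry.marker:             # bucket log entries past the marker
                todo.append(obj)
                last = max(last, m)
        failed = [o for o in dict.fromkeys(todo) if not sync_object(bucket, o)]
        replica_log["buckets"][bucket] = BucketReplicaEntry(bucket, last, failed)
        if failed:
            retry_buckets.append(bucket)
    new_marker = max([entry.marker] + [m for m, _ in new_entries])
    replica_log["shards"][shard] = ShardReplicaEntry(shard, new_marker, retry_buckets)
```

Keeping retries in the same replica log entry as the marker means a restarted agent needs only that one record to resume each bucket and shard.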

Suggested scheme:

# %{color:blue}sync initialization%
## for each shard:
### %{color:blue}read the marker and the buckets to retry from the shard’s replica log%
#### %{color:blue}if there is no marker, check the shard_id -> buckets index%
##### %{color:blue}if that doesn’t exist yet, list all buckets and create the shard_id -> buckets index%
### for each bucket %{color:blue}in the current shard%
#### %{color:blue}if the bucket doesn’t have an entry in the replica log:%
##### %{color:blue}add it to the replica log and mark it for full sync%
#### %{color:blue}if the bucket exists in the replica log, go to (4)(b)%
# incremental sync
## look up the data log shard in the replica log to get the buckets that need retrying and the current marker
## %{color:red}check the type of sync for any buckets that need to be retried, and generate the list of objects based on it%
### %{color:red}full -> list all objects in the bucket%
### %{color:red}incremental -> read the data log as usual%
## for each bucket:
### %{color:red}list the objects in the bucket that need to be synced based on the sync type, from the full listing or the data log%
### read the bucket instance log for the bucket, starting at the marker from the replica log
### sync each object using the same method as full sync
### if syncing an object fails, add it to a list to retry
### update the bucket instance replica log with the last marker read and the list of objects to retry
## once the data log shard is done, update the replica log for that shard with:
### new marker
### list of bucket instances to retry
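A similar hypothetical sketch of the suggested scheme follows, again using in-memory dicts and illustrative names rather than any RGW API. Beyond what the steps above state, it assumes that a shard with no stored marker is represented by `None`, that the shard_id -> buckets index is built for all shards in one pass, that buckets newly marked for full sync are queued through the shard's retry list (so the "check the type of sync for any buckets that need to be retried" step picks them up), and that a bucket switches to incremental sync after one full pass. The "go to (4)(b)" branch is modeled only as leaving an existing entry untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory model (assumption); names are illustrative, not RGW APIs.
FULL, INCREMENTAL = "full", "incremental"
Log = List[Tuple[str, str]]  # [(marker, bucket or object name), ...]


@dataclass
class BucketEntry:
    sync_type: str
    marker: str = ""
    retry_objects: List[str] = field(default_factory=list)


@dataclass
class ShardEntry:
    marker: Optional[str] = None  # assumption: None means "no marker yet"
    retry_buckets: List[str] = field(default_factory=list)


def new_state() -> dict:
    return {"shards": {}, "buckets": {}, "shard_index": None}


def init_shard(shard: int, state: dict, list_all_buckets: Callable[[], List[str]],
               shard_of: Callable[[str], int]) -> None:
    entry = state["shards"].setdefault(shard, ShardEntry())
    if entry.marker is None and state["shard_index"] is None:
        # No marker and no shard_id -> buckets index yet: list all buckets once.
        index: Dict[int, List[str]] = {}
        for bucket in list_all_buckets():
            index.setdefault(shard_of(bucket), []).append(bucket)
        state["shard_index"] = index
    for bucket in (state["shard_index"] or {}).get(shard, []):
        if bucket not in state["buckets"]:
            state["buckets"][bucket] = BucketEntry(FULL)
            # Assumption: queue new buckets through the shard's retry list.
            if bucket not in entry.retry_buckets:
                entry.retry_buckets.append(bucket)
        # Existing entries are left as-is (the "go to (4)(b)" branch).


def sync_shard(shard: int, state: dict, data_log: Dict[int, Log],
               bucket_logs: Dict[str, Log], list_objects: Callable[[str], List[str]],
               sync_object: Callable[[str, str], bool]) -> None:
    entry = state["shards"][shard]
    start = entry.marker or ""
    new_entries = [(m, b) for m, b in data_log.get(shard, []) if m > start]
    changed = list(dict.fromkeys(entry.retry_buckets + [b for _, b in new_entries]))
    retry_buckets = []
    for bucket in changed:
        # Assumption: buckets unknown to the replica log are treated as new (full).
        be = state["buckets"].get(bucket, BucketEntry(FULL))
        log = bucket_logs.get(bucket, [])
        if be.sync_type == FULL:
            # Full: record the current bucket log position, then list all objects.
            last = max([be.marker] + [m for m, _ in log])
            todo = list(be.retry_objects) + list_objects(bucket)
        else:
            # Incremental: read the bucket instance log past the stored marker.
            todo, last = list(be.retry_objects), be.marker
            for m, obj in log:
                if m > be.marker:
                    todo.append(obj)
                    last = max(last, m)
        failed = [o for o in dict.fromkeys(todo) if not sync_object(bucket, o)]
        # Assumption: after one full pass the bucket switches to incremental sync.
        state["buckets"][bucket] = BucketEntry(INCREMENTAL, last, failed)
        if failed:
            retry_buckets.append(bucket)
    entry.marker = max([start] + [m for m, _ in new_entries])
    entry.retry_buckets = retry_buckets
```

The point of this arrangement is that full and incremental sync share one loop; only the way the object list is produced depends on the per-bucket sync type.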

We may need to be careful about updating the replica log with an empty marker, e.g. if many objects were uploaded before data logging was enabled. In that case, perhaps use @' '@ (a single space) as the marker, since it sorts before all markers the gateway will generate.
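The reasoning behind the @' '@ sentinel can be illustrated with a small sketch. It assumes, as the note implies, that every marker the gateway generates consists of characters that sort after a space (e.g. digits or letters), and that an empty string in the replica log means "no marker recorded yet"; the marker values and function names below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch (assumption): "" in the replica log means "no marker recorded",
# and gateway markers are assumed to consist of characters that sort after " ".
NO_MARKER = ""
START_MARKER = " "  # a single space: sorts before every such marker


def needs_initialization(stored_marker: str) -> bool:
    # In the suggested scheme, a missing marker sends the shard back through
    # the shard_id -> buckets index check (and possibly a full bucket listing).
    return stored_marker == NO_MARKER


def entries_after(log: List[Tuple[str, str]], marker: str) -> List[Tuple[str, str]]:
    # Resuming from START_MARKER still returns every entry in the log.
    return [(m, item) for m, item in log if m > marker]


def marker_to_store(last_read: str) -> str:
    # If nothing was read (e.g. objects were uploaded before data logging was
    # enabled), store the start sentinel instead of an empty marker.
    return last_read or START_MARKER
```

With this, a shard that has been processed but has nothing in its data log yet stores @' '@ and is not re-initialized, while resuming from @' '@ still reads the log from its first entry.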