Jessica Mack, 06/01/2015 09:09 PM
h1. Rgw sync agent architecture

RGW Data sync

Current scheme:

# full sync (per shard)
## list all buckets
## for each bucket in current shard
### read bucket marker
### sync each object
#### if failed, add to list to retry later (put in replica log later)
### when done with bucket instance, update replica log on destination zone
#### bucket name
#### bucket marker (from start of sync)
#### list of objects to retry
## when done with all buckets in a shard,
### update replica log on destination zone with
#### shard number
#### list of bucket instances to retry
# incremental sync
## look up data log shard in replica log to get buckets that need retrying, and current marker
## read data log shard to get changed buckets, starting at marker from replica log
## for each bucket
### look up objects that need retrying in the replica log
### read the bucket instance log for the bucket starting at marker from replica log
### sync object using the same method as full sync
### if syncing an object fails, add it to a list to retry
### update bucket instance replica log with last marker read and list of objects to retry
## once data log shard is done, update replica log for that shard with
### new marker
### list of bucket instances to retry

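The bucket-level part of the current scheme can be sketched roughly as follows. This is only an in-memory illustration, not the agent's actual code: the replica log is modelled as a plain dict, and @BucketEntry@, @full_sync_bucket@, @incremental_sync_bucket@ and the @sync_object@ callback are hypothetical names introduced here. Markers are assumed to be strings that compare in log order.

```python
from dataclasses import dataclass, field


@dataclass
class BucketEntry:
    """Replica-log record for one bucket instance (hypothetical shape)."""
    marker: str = ""
    retries: list = field(default_factory=list)


def full_sync_bucket(bucket, objects, marker_at_start, sync_object, replica_log):
    """Full sync of one bucket: copy every object and remember failures."""
    failed = [obj for obj in objects if not sync_object(bucket, obj)]
    # Store the marker read before copying started, so that changes made
    # during the copy are picked up by the next incremental pass.
    replica_log[bucket] = BucketEntry(marker=marker_at_start, retries=failed)
    return not failed


def incremental_sync_bucket(bucket, bucket_log, sync_object, replica_log):
    """Incremental sync: retry earlier failures, then replay the bucket log.

    bucket_log is a list of (marker, object_name) pairs in marker order.
    """
    entry = replica_log.get(bucket, BucketEntry())
    new_entries = [(m, o) for m, o in bucket_log if m > entry.marker]
    # Retried objects first, then changed ones, without duplicates.
    todo = list(dict.fromkeys(entry.retries + [o for _, o in new_entries]))
    failed = [obj for obj in todo if not sync_object(bucket, obj)]
    last_marker = new_entries[-1][0] if new_entries else entry.marker
    replica_log[bucket] = BucketEntry(marker=last_marker, retries=failed)
    return not failed
```

The shard-level bookkeeping (shard number plus the list of bucket instances to retry) would wrap these calls in the same way, collecting every bucket for which the function returns @False@.
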

Suggested scheme:

# %{color:blue}sync initialization%
## for each shard:
### %{color:blue}read marker and buckets to retry from shard’s replica log%
#### %{color:blue}if no marker, check shard_id -> buckets index%
##### %{color:blue}if that doesn’t exist yet, list all buckets and create shard_id -> buckets index%
### for each bucket %{color:blue}in current shard%
#### %{color:blue}if bucket doesn’t have entry in replica log:%
##### %{color:blue}add to replica log, mark for full sync%
#### %{color:blue}if bucket exists in replica log, go to (4)(b)%
# incremental sync
## look up data log shard in replica log to get buckets that need retrying, and current marker
## %{color:red}check type of sync for any buckets that need to be retried, and generate the list of objects based on that%
### %{color:red}full -> list all objects in bucket%
### %{color:red}incremental -> read data log as usual%
## for each bucket
### %{color:red}list objects in bucket that need to be synced based on sync type, from full list, or data log%
### read the bucket instance log for the bucket starting at marker from replica log
### sync object using the same method as full sync
### if syncing an object fails, add it to a list to retry
### update bucket instance replica log with last marker read and list of objects to retry
## once data log shard is done, update replica log for that shard with
### new marker
### list of bucket instances to retry

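A minimal sketch of the two new pieces in the suggested scheme: sync initialization, and choosing the object list by sync type. Everything here is an assumption made for illustration. The replica log is a plain dict, the entry layout (@type@, @marker@, @retries@) is invented, and @shard_of@, @list_bucket@ and @read_bucket_log@ are hypothetical callbacks standing in for whatever the gateway provides.

```python
FULL, INCREMENTAL = "full", "incremental"


def build_shard_index(all_buckets, shard_of):
    """Create the shard_id -> buckets index from a full bucket listing.

    shard_of is a hypothetical function mapping a bucket name to its shard.
    """
    index = {}
    for bucket in all_buckets:
        index.setdefault(shard_of(bucket), []).append(bucket)
    return index


def init_shard(shard_buckets, replica_log):
    """Sync initialization for one shard.

    Buckets without a replica-log entry are added and marked for full sync;
    buckets that already have an entry are left untouched.
    """
    for bucket in shard_buckets:
        if bucket not in replica_log:
            replica_log[bucket] = {"type": FULL, "marker": "", "retries": []}


def objects_to_sync(bucket, replica_log, list_bucket, read_bucket_log):
    """Build the object list for a bucket according to its sync type."""
    entry = replica_log[bucket]
    if entry["type"] == FULL:
        # full -> list all objects in the bucket
        return list(list_bucket(bucket))
    # incremental -> earlier failures plus whatever the log reports
    changed = read_bucket_log(bucket, entry["marker"])
    return list(dict.fromkeys(entry["retries"] + list(changed)))
```

With this shape, full sync stops being a separate phase: it becomes one of two ways to produce the object list, and the per-object sync and replica-log update that follow are shared.
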

May need to be careful about updating the replica log with an empty marker, e.g. if lots of objects were uploaded before data logging was enabled. Perhaps use ' ' (a single space) as the marker in that case, since it sorts before all markers the gateway will generate.
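
The single-space idea can be illustrated with a short sketch. It assumes markers are printable ASCII strings compared lexicographically, and that an empty stored marker would otherwise be read as "nothing recorded yet"; both are assumptions, as are the names @SENTINEL_MARKER@, @marker_to_store@ and @needs_initial_sync@ and the marker value used in the comment.

```python
SENTINEL_MARKER = " "  # a single space, the lowest printable ASCII character


def marker_to_store(last_marker_read):
    """Never write an empty marker; substitute the sentinel instead."""
    return last_marker_read or SENTINEL_MARKER


def needs_initial_sync(stored_marker):
    """Assumption: a missing or empty marker means the bucket was never synced."""
    return not stored_marker


# The sentinel compares lower than any marker that starts with a printable,
# non-space ASCII character, e.g. a made-up marker such as "00000000001.1.2",
# so reading the log "after" the sentinel returns every entry.
```

A bucket synced before data logging was enabled then carries a non-empty marker and is not mistaken for one that still needs its first sync, while the first incremental pass still sees every log entry.
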