1D - RGW Geo-Replication and Disaster Recovery » History » Version 1
Jessica Mack, 06/22/2015 04:34 AM
h1. 1D - RGW Geo-Replication and Disaster Recovery

h3. Live Pad

The live pad can be found here: "pad":http://pad.ceph.com/p/RGW_Geo-Replication_and_Disaster_Recovery

h3. Summit Snapshot

Useful links:

p((. http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/RGW_Geo-Replication_and_Disaster_Recovery

Coding tasks

# enabling features (within the gateway)
** create regions
** new replication/placement attributes on buckets and objects
** additional logging of replication-enabling information
** implement the log access/management APIs
** test suite for new log access/management APIs
** COPY across regions
# replication agent (free-standing application)
** track logs to identify changes
** propagate changes to secondary sites
** truncate no-longer-interesting logs
** test suite for update detection and propagation
# replication management APIs and console (free-standing application)
** definition of regions and zones
** management of bucket replication attributes
** enabling, disabling, and monitoring of the replication agent
** monitoring and reporting of replication status
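The replication agent tasks above (track logs, propagate changes, truncate processed entries) can be sketched as a simple pull loop. This is an illustrative model only: the log/marker shapes and function names below are assumptions for the sketch, not the actual RGW interface.

```python
# Illustrative sketch of the replication agent's pull loop described above.
# The log format, markers, and callbacks are hypothetical, not the real RGW
# API: the agent tracks a per-shard marker, propagates each new change to
# the secondary site, and trims entries it has fully processed.

def run_agent_once(source_log, markers, apply_change):
    """Process new entries from each log shard of the master site.

    source_log: dict shard_id -> list of (marker, change) tuples, oldest first
    markers: dict shard_id -> last marker successfully applied
    apply_change: callable that pushes one change to the secondary site
    """
    for shard, entries in source_log.items():
        last = markers.get(shard)
        for marker, change in entries:
            if last is not None and marker <= last:
                continue  # already propagated on an earlier run
            apply_change(change)      # propagate to the secondary site
            markers[shard] = marker   # remember progress
        # truncate entries that are no longer interesting
        done = markers.get(shard)
        if done is not None:
            source_log[shard] = [(m, c) for m, c in entries if m > done]
    return markers
```

Re-running the loop with the saved markers only processes changes that arrived since the last pass, which is what makes update detection testable in isolation.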

Documentation tasks

# Document and review new RESTful APIs to access and manage the logs
# Document the relevant log entries
# Document and review replication agent management interfaces

Questions for Yehuda:

p((. Will there be support for something akin to Swift Container Synchronization, or will Container Sync be supported directly for the Swift API? (see also http://goo.gl/4IZoi)
Would you have to set up replication on a per-bucket basis, or would it be possible to sync all buckets to a second cluster for DR (and swap masters) with a single config knob?
Yehuda's reply: per-bucket/per-container granularity is part of the design, but the initial implementation will come with per-region granularity.

p((. Is there a plan to eventually refactor this in the context of a RADOS-level replication API (such that async replication could conceivably be extended to RBD, CephFS, and RADOS)?
Yehuda's/Sage's/Greg's reply: this would involve a major overhaul of RADOS internals -- "don't hold your breath"

p((. As an extension of that, what is the expected user difficulty for reversing the direction of replication in general? (This is one of the major pain points in GlusterFS geo-rep.)
-> Not too hard. Conflict resolution will be simpler than in GlusterFS because of the simplified object model; something like "newest version wins" will work.
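The "newest version wins" rule from the reply above can be sketched in a few lines. The record layout (an `mtime` plus an `etag` used as a tie-breaker) is an assumption for illustration; RGW's real object metadata is richer than this.

```python
# Minimal sketch of the "newest version wins" conflict resolution mentioned
# above. The 'mtime'/'etag' fields are assumptions for illustration, not
# RGW's actual metadata layout.

def resolve_conflict(local, remote):
    """Pick the surviving copy when both sites modified the same object.

    local/remote: dicts with an 'mtime' (seconds) and an 'etag'.
    Ties break on a stable field so both sites converge on the same winner.
    """
    if local["mtime"] != remote["mtime"]:
        return local if local["mtime"] > remote["mtime"] else remote
    # same timestamp: tie-break deterministically on the etag
    return local if local["etag"] >= remote["etag"] else remote
```

Because the tie-break is deterministic, running the rule on either site yields the same winner, which is what lets both sites converge without coordination.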

p((. How does this recover from erroneous switches and being active on multiple sites? Does that require a full resync?
-> Depends on how long the logs are and how long things are disconnected.

p((. What would be the transport used for replication? HTTP(S)?
-> it will be REST-based, so HTTP or HTTPS
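Since the transport is REST over HTTP(S), an agent would pull log entries with plain GET requests. The endpoint path and query parameters below are invented placeholders for illustration, not the actual RGW admin API; only the URL construction is shown.

```python
# Hypothetical example of the REST-based transport mentioned above: the URL
# an agent might GET to pull new log entries over HTTPS. The endpoint path
# and parameter names are invented for illustration and are NOT the real
# RGW admin API.
from urllib.parse import urlencode, urlunsplit

def log_fetch_url(host, shard, marker, max_entries=1000):
    """Build a log-pull URL for one shard, starting after `marker`."""
    query = urlencode({"shard": shard, "marker": marker,
                       "max-entries": max_entries})
    return urlunsplit(("https", host, "/admin/replica-log", query, ""))
```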

p((. Is the size of the log limited, and what happens when it overflows? (Because it's not pulled frequently enough -- a temporary sync outage, for example.)
-> Not yet decided. Options include a size limit, trimming controlled by the agents that are pulling, or a combination of both.
Similarly, is the log replicated, or does it live on only one gateway? What happens to replication if that gateway fails?
-> The log is stored in RADOS, so it is durable.
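One reading of "controlled by the agents who are pulling" is that the log may only be trimmed up to the oldest position every agent has acknowledged, so a slow or disconnected agent simply makes the log grow. A sketch of that policy, with all names assumed for illustration:

```python
# Sketch of agent-controlled log trimming, one reading of the answer above:
# keep every entry that at least one pulling agent has not yet acknowledged.
# The entry/marker shapes are illustrative, not RGW's actual interface.

def trim_log(entries, agent_markers):
    """Drop entries that every registered agent has already consumed.

    entries: list of (marker, change) tuples, oldest first
    agent_markers: dict agent_id -> highest marker that agent acknowledged
    """
    if not agent_markers:
        return entries  # no agents registered: trim nothing
    safe = min(agent_markers.values())
    return [(m, c) for m, c in entries if m > safe]
```

The trade-off is visible in the sketch: durability for laggards costs unbounded log growth, which is why the answer also mentions a size limit as an option.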

p((. Please keep geo-dispersion of erasure-coded objects in mind while designing this. Geo-dispersion of erasure-coded objects is seen as a big cost reduction. <hskinner>
-> Pretty please: what about normal replication syncing to an erasure-coded backup?

p((. Will it scale for large numbers of containers and objects?
-> For many containers: we shard the set of updated buckets.
-> For large containers: the bottleneck will be the bucket index/log; sharding the bucket index will eventually solve this and also address the large-container problem.
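The sharding answer above can be illustrated: spread the set of updated buckets across a fixed number of log shards by hashing the bucket name, so no single log object becomes a hot spot. The shard count and the use of md5 are assumptions for the sketch, not RGW's actual scheme.

```python
# Illustration of sharding the set of updated buckets, as in the answer
# above: each bucket's update record lands in one of a fixed number of log
# shards. The shard count and md5 hash are assumptions for this sketch.
import hashlib

NUM_SHARDS = 64  # illustrative; a real deployment would tune this

def shard_for_bucket(bucket_name, num_shards=NUM_SHARDS):
    """Map a bucket name deterministically onto a log shard."""
    digest = hashlib.md5(bucket_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Determinism matters here: every gateway computes the same shard for a given bucket, so writers and the pulling agents agree on where a bucket's updates are logged.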