h1. RGW Geo-Replication and Disaster Recovery

h3. Summary

Currently all Ceph data replication is synchronous, which means it must be performed over high-speed, low-latency links. This makes WAN-scale replication impractical. There are at least two pressing reasons for wanting WAN-scale replication:

1. Disaster Recovery

Regional disasters have the potential to destroy an entire facility, or to take it offline for a prolonged period of time. If data is to remain safe and available in the face of such events, it must be replicated to another location.

2. Different Geographical Locations

Geographically distributed teams and companies are increasingly common. There are price, performance, convenience, and availability reasons to serve each team from local file servers. Work done by a team in one location is often shared with teams in other locations, who would like to access that data from their own local file servers.

This blueprint describes these features and their implementation.

h3. Owners

* Yehuda Sadeh (Inktank)

h3. Interested Parties

* Greg Farnum
* Sage Weil
* Loic Dachary
* Christophe Courtaut christophe.courtaut@gmail.com
* Florian Haas
* Daniele Stroppa (ZHAW)

h3. Current Status

This feature is slated for the Dumpling release and implementation is currently underway, but additional assistance (to improve the schedule, provide more functionality, and reduce schedule risk) is welcome.

h3. Detailed Description

In a geographically distributed object storage system, sites are organized into _regions_ and _zones_.
* regions are large, distinct geographic areas; a region is made up of multiple zones
** a particular bucket is created and replicated only within a single region
** user metadata is replicated across all regions
* zones are geographically separated sites, sufficiently independent that they are unlikely to be affected by a single disaster
** a bucket can be replicated to multiple zones within its region
** each bucket has (at any given time) a designated master zone, the only zone through which that bucket can be written
** all other zones (backup zones) have read-only access to that bucket, but the master zone for a bucket can be changed at any time
** the master/backup designation applies to particular buckets: a zone that is a backup for one set of buckets can be master for others (see the sketch after this list)

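To make the bucket-to-zone mapping concrete, here is a minimal sketch of the layout rules above. All region, zone, and bucket names are invented for illustration, and this is not RGW's actual data model.

<pre><code class="python">
# Illustrative model of the region/zone layout described above.
# All names here are invented; this is not RGW's internal representation.

regions = {
    "us": {"zones": ["us-east", "us-west"]},   # one region, two zones
    "eu": {"zones": ["eu-central"]},
}

# Each bucket lives in exactly one region and has exactly one master
# zone at a time; other zones in that region hold read-only replicas.
# Note that us-east is master for one bucket and backup for the other.
buckets = {
    "team-a-data": {"region": "us", "master_zone": "us-east"},
    "team-b-data": {"region": "us", "master_zone": "us-west"},
}

def writable_zone(bucket):
    """The only zone that currently accepts writes for this bucket."""
    return buckets[bucket]["master_zone"]

def read_zones(bucket):
    """Every zone (master and backups) that serves reads for this bucket."""
    return regions[buckets[bucket]["region"]]["zones"]

print(writable_zone("team-a-data"))   # -> us-east
print(read_zones("team-a-data"))      # -> ['us-east', 'us-west']
</code></pre>
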
The basic replication model is:
* master zones maintain logs of both user-metadata and bucket-data updates
* remote sites can use (new) RESTful APIs to get information about recent updates
* backup-zone replication agents use these APIs to track changes in master zones, pull the updated information, and replay those same changes locally (a sketch of such an agent follows this list)

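The sketch below illustrates the pull-and-replay loop of such a replication agent. The RESTful APIs are still being defined, so the @/admin/log@ endpoint, its query parameters, and the log-entry fields here are all assumptions made for illustration.

<pre><code class="python">
# A minimal sketch of a pull-based replication agent. The endpoint path,
# query parameters, and log-entry fields are hypothetical placeholders,
# not the actual RGW replication API.

import time

import requests

MASTER_ZONE = "http://rgw-master.example.com"  # hypothetical master-zone URL
POLL_INTERVAL = 30                             # seconds between polls


def fetch_updates(marker):
    """Ask the master zone for log entries recorded after `marker`."""
    resp = requests.get(
        MASTER_ZONE + "/admin/log",            # assumed endpoint
        params={"type": "data", "marker": marker},
    )
    resp.raise_for_status()
    return resp.json()                         # assumed: a list of entries


def replay(entry):
    """Apply one logged change to the local backup zone."""
    # A real agent would re-fetch the object or metadata named by the
    # entry and write it locally; printing stands in for that here.
    print("replaying:", entry)


def run():
    marker = ""                                # resume point within the log
    while True:
        for entry in fetch_updates(marker):
            replay(entry)
            marker = entry["id"]               # assumed entry field
        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    run()
</code></pre>

Because the agent in this sketch only advances its marker after replaying an entry, a crash or link failure simply causes it to re-pull from the last confirmed point, which is part of what makes the pull model robust.
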
This mechanism provides _eventual consistency_: backup zones will eventually see all master-zone updates, but the delay between a master-zone operation and its backup-zone replay means that clients in the backup zones will sometimes see old data. There are, however, many benefits to asynchronous, eventual-consistency, pull-based replication:
* it is highly robust in the face of link and site failures
* it does not force master-zone updates to wait for backup zones to acknowledge (or catch up with) changes
* it can support arbitrary numbers of replicas
* it can support the creation of new mirrors at any time (long after the original data creation)
* it can be done very efficiently (compressing out multiple updates to the same object)
* while there is a replication delay, that delay can easily be tuned to be anywhere from seconds to years

(An expanded technical description can be found in the "design proposal":http://www.spinics.net/lists/ceph-devel/msg11905.html originally circulated on ceph-devel.)

h3. Work items