RGW Geo-Replication and Disaster Recovery » History » Version 1
Jessica Mack, 06/09/2015 07:32 AM
h1. RGW Geo-Replication and Disaster Recovery

h3. Summary
Currently all Ceph data replication is synchronous, which means that it must be performed over high-speed, low-latency links. This makes WAN-scale replication impractical. There are at least two pressing reasons for wanting WAN-scale replication:

1. Disaster Recovery
Regional disasters have the potential to destroy an entire facility, or take it offline for a prolonged period of time. If data is to remain safe and available in the face of such events, that data must be replicated to another location.

2. Different Geographical Locations
Geographically distributed teams and companies are increasingly common. There are price, performance, convenience, and availability reasons to try to serve each team from local file servers. Work done by a team in one location is often shared with teams in other locations, who would like to be able to access that data from their local file servers.

This blueprint describes these features and their implementation.
h3. Owners

* Yehuda Sadeh (Inktank)

h3. Interested Parties

* Greg Farnum
* Sage Weil
* Loic Dachary
* Christophe Courtaut christophe.courtaut@gmail.com
* Florian Haas
* Daniele Stroppa (ZHAW)
h3. Current Status

This feature is slated for the Dumpling release and implementation is underway, but additional assistance (to improve the schedule, provide more functionality, and reduce schedule risk) is welcome.
h3. Detailed Description

In a geographically distributed object storage system, sites will be organized into _regions_ and _zones_.
* regions are large, distinct geographic areas. A region is made up of multiple zones.
** a particular bucket is created and replicated only within a single region
** user metadata is replicated across all regions
* zones are geographically separated sites, sufficiently independent that they are unlikely to be affected by a single disaster.
** a bucket can be replicated to multiple zones within that region
** each bucket has (at any given time) a designated master zone, from which that bucket can be written
** all other (backup) zones have read-only access to that bucket ... but the master zone for a bucket can be changed at any time.
** the master/backup designation applies to particular buckets. A zone that is a backup for one set of buckets can be master for others.
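The region/zone/bucket relationships above can be sketched as follows. This is a minimal illustration of the access rules only; the names and data layout are invented for this sketch and are not the actual RGW schema or API.

```python
# Hypothetical model of one region with two zones; each bucket names its
# current master zone. Structure is illustrative only.
region = {
    "name": "us",
    "zones": ["us-east", "us-west"],
}

# Per-bucket master designation: a zone can be master for some buckets
# and a backup for others.
bucket_masters = {
    "photos": "us-east",
    "logs": "us-west",
}

def can_write(zone, bucket):
    """A bucket is writable only in its designated master zone."""
    return bucket_masters.get(bucket) == zone

def can_read(zone, bucket):
    """Backup zones retain read-only access to replicated buckets."""
    return zone in region["zones"]
```

Note that "us-west" is a backup for "photos" but the master for "logs", matching the point that master/backup is a per-bucket designation, not a per-zone one.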

The basic replication model is:
* master zones maintain logs of both user-metadata and bucket-data updates
* remote sites can use (new) RESTful APIs to get information about recent updates
* backup-zone replication agents will use these APIs to track changes in master zones, pull the updated information, and replay those same changes locally.
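One pass of such a replication agent might look like the sketch below. The @fetch_log@, @fetch_object@, @apply@, and marker helpers are hypothetical stand-ins for the new RESTful APIs and local replay machinery, not the real RGW interfaces.

```python
# Hypothetical single pass of a backup-zone pull-replication agent:
# read the master zone's update log past our last marker, pull each
# updated object, replay it locally, and persist our progress.
def sync_once(master, local):
    """Replay all master-zone log entries newer than our marker.

    Returns the number of entries replayed. Running this repeatedly
    (e.g. on a timer) converges the backup zone toward the master.
    """
    marker = local.last_marker()               # resume from last replayed entry
    entries = master.fetch_log(since=marker)   # oldest-first update log
    for entry in entries:
        data = master.fetch_object(entry["bucket"], entry["object"])
        local.apply(entry, data)               # replay the same change locally
        marker = entry["marker"]
    local.save_marker(marker)                  # persist progress across restarts
    return len(entries)
```

Because the agent pulls and records its own marker, a link or site failure simply pauses replication; the next successful pass resumes from the saved marker with no coordination from the master zone.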

This mechanism provides _eventual consistency_. Backup zones will eventually see all master-zone updates, but the delay between master-zone operations and backup-zone replay means that clients in the backup zones will sometimes see old data. But there are many benefits to asynchronous, eventual-consistency, pull replication:
* it is highly robust in the face of link and site failures
* it does not force master-zone updates to wait for backup zones to acknowledge (or catch up with) changes
* it can support arbitrary numbers of replicas
* it can support the creation of new mirrors at any time (long after the original data creation)
* it can be done very efficiently (compressing out multiple updates to the same object)
* while there is a replication delay, it can easily be tuned to be anywhere from seconds to years
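The efficiency point about compressing out multiple updates can be illustrated with a small sketch: before replaying a batch, an agent only needs the newest log entry per object. The entry format here is invented for illustration.

```python
# Hypothetical log compaction: if an object was updated several times
# within one batch, only its latest state needs to be fetched and replayed.
def compact_log(entries):
    """Keep only the newest entry per (bucket, object), in log order."""
    latest = {}
    for entry in entries:                      # entries arrive oldest-first,
        latest[(entry["bucket"], entry["object"])] = entry  # so later wins
    return sorted(latest.values(), key=lambda e: e["marker"])
```

The longer the replication delay is tuned, the more updates accumulate per batch and the more this compaction saves, which is one reason the delay can reasonably range from seconds to years.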

(An expanded technical description can be found in the "design proposal":http://www.spinics.net/lists/ceph-devel/msg11905.html originally circulated on ceph-devel.)

h3. Work items