1D - RGW Geo-Replication and Disaster Recovery » History » Version 1
Jessica Mack, 06/22/2015 04:34 AM
h1. 1D - RGW Geo-Replication and Disaster Recovery

h3. Live Pad

The live pad can be found here: "pad":http://pad.ceph.com/p/RGW_Geo-Replication_and_Disaster_Recovery

h3. Summit Snapshot

Useful links:

p((. http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/RGW_Geo-Replication_and_Disaster_Recovery

Coding tasks

# enabling features (within the gateway)
** create regions
** new replication/placement attributes on buckets and objects
** additional logging of replication-enabling information
** implement the log access/management APIs
** test suite for new log access/management APIs
** COPY across regions
# replication agent (free-standing application)
** track logs to identify changes
** propagate changes to secondary sites
** truncate no-longer-interesting logs
** test suite for update detection and propagation
# replication management APIs and console (free-standing application)
** definition of regions and zones
** management of bucket replication attributes
** enabling, disabling, and monitoring of the replication agent
** monitoring and reporting of replication status
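The replication agent tasks above (track logs, propagate changes, truncate processed entries) can be sketched as a simple pull loop. This is an illustrative model only: the log/marker shapes and function names below are assumptions for the sketch, not the actual RGW interface.

```python
# Illustrative sketch of the replication agent's pull loop described above.
# The log format, markers, and callbacks are hypothetical, not the real RGW
# API: the agent tracks a per-shard marker, propagates each new change to
# the secondary site, and trims entries it has fully processed.

def run_agent_once(source_log, markers, apply_change):
    """Process new entries from each log shard of the master site.

    source_log: dict shard_id -> list of (marker, change) tuples, oldest first
    markers: dict shard_id -> last marker successfully applied
    apply_change: callable that pushes one change to the secondary site
    """
    for shard, entries in source_log.items():
        last = markers.get(shard)
        for marker, change in entries:
            if last is not None and marker <= last:
                continue  # already propagated on an earlier run
            apply_change(change)      # propagate to the secondary site
            markers[shard] = marker   # remember progress
        # truncate entries that are no longer interesting
        done = markers.get(shard)
        if done is not None:
            source_log[shard] = [(m, c) for m, c in entries if m > done]
    return markers
```

Re-running the loop with the saved markers only processes changes that arrived since the last pass, which is what makes update detection testable in isolation.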

Documentation tasks

# Document and review new RESTful APIs to access and manage the logs
# Document the relevant log entries
# Document and review replication agent management interfaces

Questions for Yehuda:

p((. Will there be support for something akin to Swift Container Synchronization, or will Container Sync be supported directly for the Swift API? (see also http://goo.gl/4IZoi)
Would you have to set up replication on a per-bucket basis, or would it be possible to sync all buckets to a second cluster for DR (and swap masters) with a single config knob?
Yehuda's reply: per-bucket/per-container granularity is part of the design, but the initial implementation will come with per-region granularity.

p((. Is there a plan to eventually refactor this in the context of a RADOS-level replication API (such that async replication could conceivably be extended to RBD, CephFS, and RADOS)?
Yehuda's/Sage's/Greg's reply: this would involve a major overhaul of RADOS internals -- "don't hold your breath"

p((. As an extension of that, what is the expected user difficulty for reversing the direction of replication in general? (This is one of the major pain points in GlusterFS geo-rep.)
-> Not too hard. Conflict resolution will be simpler than in GlusterFS because of the simplified object model; something like "newest version wins" will work.
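The "newest version wins" rule from the reply above can be sketched in a few lines. The record layout (an `mtime` plus an `etag` used as a tie-breaker) is an assumption for illustration; RGW's real object metadata is richer than this.

```python
# Minimal sketch of the "newest version wins" conflict resolution mentioned
# above. The 'mtime'/'etag' fields are assumptions for illustration, not
# RGW's actual metadata layout.

def resolve_conflict(local, remote):
    """Pick the surviving copy when both sites modified the same object.

    local/remote: dicts with an 'mtime' (seconds) and an 'etag'.
    Ties break on a stable field so both sites converge on the same winner.
    """
    if local["mtime"] != remote["mtime"]:
        return local if local["mtime"] > remote["mtime"] else remote
    # same timestamp: tie-break deterministically on the etag
    return local if local["etag"] >= remote["etag"] else remote
```

Because the tie-break is deterministic, running the rule on either site yields the same winner, which is what lets both sites converge without coordination.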

p((. How does this recover from erroneous switches and being active on multiple sites? Does that require a full resync?
-> Depends on how long the logs are and how long things are disconnected.

p((. What would be the transport used for replication? HTTP(S)?
-> it will be REST-based, so HTTP or HTTPS
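Since the transport is REST over HTTP(S), an agent would pull log entries with plain GET requests. The endpoint path and query parameters below are invented placeholders for illustration, not the actual RGW admin API; only the URL construction is shown.

```python
# Hypothetical example of the REST-based transport mentioned above: the URL
# an agent might GET to pull new log entries over HTTPS. The endpoint path
# and parameter names are invented for illustration and are NOT the real
# RGW admin API.
from urllib.parse import urlencode, urlunsplit

def log_fetch_url(host, shard, marker, max_entries=1000):
    """Build a log-pull URL for one shard, starting after `marker`."""
    query = urlencode({"shard": shard, "marker": marker,
                       "max-entries": max_entries})
    return urlunsplit(("https", host, "/admin/replica-log", query, ""))
```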

p((. Is the size of the log limited, and what happens when it overflows? (Because it's not pulled frequently enough -- a temporary sync outage, for example.)
-> Not yet decided. Options include a size limit, trimming controlled by the agents that are pulling, or a combination of both.
Similarly, is the log replicated, or does it live on only one gateway? What happens to replication if that gateway fails?
-> The log is stored in RADOS, so it is durable.
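One reading of "controlled by the agents who are pulling" is that the log may only be trimmed up to the oldest position every agent has acknowledged, so a slow or disconnected agent simply makes the log grow. A sketch of that policy, with all names assumed for illustration:

```python
# Sketch of agent-controlled log trimming, one reading of the answer above:
# keep every entry that at least one pulling agent has not yet acknowledged.
# The entry/marker shapes are illustrative, not RGW's actual interface.

def trim_log(entries, agent_markers):
    """Drop entries that every registered agent has already consumed.

    entries: list of (marker, change) tuples, oldest first
    agent_markers: dict agent_id -> highest marker that agent acknowledged
    """
    if not agent_markers:
        return entries  # no agents registered: trim nothing
    safe = min(agent_markers.values())
    return [(m, c) for m, c in entries if m > safe]
```

The trade-off is visible in the sketch: durability for laggards costs unbounded log growth, which is why the answer also mentions a size limit as an option.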

p((. Please keep geo-dispersion of erasure-coded objects in mind while designing this. Geo-dispersion of erasure-coded objects is seen as a big cost reduction. <hskinner>
-> Pretty please: what about normal replication syncing to an erasure-coded backup?

p((. Will it scale for large numbers of containers and objects?
-> For many containers: we shard the set of updated buckets.
-> For large containers: the bottleneck will be the bucket index/log; sharding the bucket index will eventually solve this and also address the large-container problem.
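The sharding answer above can be illustrated: spread the set of updated buckets across a fixed number of log shards by hashing the bucket name, so no single log object becomes a hot spot. The shard count and the use of md5 are assumptions for the sketch, not RGW's actual scheme.

```python
# Illustration of sharding the set of updated buckets, as in the answer
# above: each bucket's update record lands in one of a fixed number of log
# shards. The shard count and md5 hash are assumptions for this sketch.
import hashlib

NUM_SHARDS = 64  # illustrative; a real deployment would tune this

def shard_for_bucket(bucket_name, num_shards=NUM_SHARDS):
    """Map a bucket name deterministically onto a log shard."""
    digest = hashlib.md5(bucket_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Determinism matters here: every gateway computes the same shard for a given bucket, so writers and the pulling agents agree on where a bucket's updates are logged.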