1D - RGW Geo-Replication and Disaster Recovery

Live Pad

The live pad can be found here: [pad]

Summit Snapshot

Useful links:

Coding tasks
  1. enabling features (within the gateway)
    • create regions
    • new replication/placement attributes on buckets and objects
    • additional logging of replication-enabling information
    • implement the log access/management APIs
    • test suite for new log access/management APIs
    • COPY across regions
  2. replication agent (free-standing application)
    • track logs to identify changes
    • propagate changes to secondary sites
    • truncate no-longer-interesting logs
    • test suite for update detection and propagation
  3. replication management APIs and console (free-standing application)
    • definition of regions and zones
    • management of bucket replication attributes
    • enabling, disabling, monitoring of replication agent
    • monitoring and reporting of replication status
Documentation tasks
  1. Document and review new RESTful APIs to access and manage the logs
  2. Documentation of the relevant log entries
  3. Document and review replication agent management interfaces
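
The replication agent tasks listed under Coding tasks (track logs, propagate changes, truncate logs that are no longer interesting) can be sketched roughly as follows. This is only an illustration under stated assumptions: ChangeLog, ReplicationAgent, and the in-memory dicts standing in for the primary and secondary sites are hypothetical names, not part of the gateway or of any agreed design.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical in-memory stand-in for a gateway change log."""
    entries: list = field(default_factory=list)  # (seq, bucket, key, op)
    next_seq: int = 1

    def append(self, bucket, key, op):
        self.entries.append((self.next_seq, bucket, key, op))
        self.next_seq += 1

    def read_after(self, marker):
        """Return entries with a sequence number greater than marker."""
        return [e for e in self.entries if e[0] > marker]

    def trim_through(self, marker):
        """Drop entries up to and including marker."""
        self.entries = [e for e in self.entries if e[0] > marker]


class ReplicationAgent:
    """Hypothetical agent: reads the log, applies changes, trims the log."""

    def __init__(self, log, primary, secondary):
        self.log = log
        self.primary = primary      # dict: (bucket, key) -> data
        self.secondary = secondary  # dict: (bucket, key) -> data
        self.marker = 0             # last sequence number applied

    def sync_once(self):
        applied = 0
        for seq, bucket, key, op in self.log.read_after(self.marker):
            if op == "put":
                # Copy the current primary state; the object may already
                # have been deleted again by a later log entry.
                if (bucket, key) in self.primary:
                    self.secondary[(bucket, key)] = self.primary[(bucket, key)]
            elif op == "delete":
                self.secondary.pop((bucket, key), None)
            self.marker = seq
            applied += 1
        # Entries already applied are no longer interesting to this agent.
        self.log.trim_through(self.marker)
        return applied
```

A real agent would talk to the gateways over the log access/management APIs rather than share memory with them, and would need to coordinate trimming with any other agents reading the same log.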

Questions for Yehuda:

Will there be support for something akin to Swift Container Synchronization, or
Will Container Sync be supported directly for the Swift API? (see also
Would you have to set up replication on a per-bucket basis, or would it be possible to sync all buckets to a second cluster for DR (and swap masters) with a single config knob?
Yehuda's reply: per-bucket/per-container granularity is part of the design, but the initial implementation will come with per-region granularity

Is there a plan to eventually refactor this in the context of a RADOS-level replication API (such that async replication could conceivably be extended to RBD, CephFS, and RADOS)?
Yehuda's/Sage's/Greg's reply: this would involve a major overhaul of RADOS internals -- "don't hold your breath"

As an extension of that, what is the expected user difficulty for reversing the direction of replication in general? (This is one of the major pain points in GlusterFS geo-rep)
-> not too hard. Conflict resolution will be simpler than in GlusterFS because of the simplified object model; something like "newest version wins" will work
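
A minimal sketch of what "newest version wins" could look like, assuming each site can report a modification time per object. ObjectVersion and resolve_newest_wins are hypothetical names for illustration; the tie-break on site name is an added assumption so that both sites reach the same decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    site: str      # name of the site holding this version (hypothetical)
    mtime: float   # modification time, seconds since the epoch
    etag: str


def resolve_newest_wins(a, b):
    """Return the version that should survive a conflict.

    The newer mtime wins. Equal mtimes are broken by site name so the
    result does not depend on argument order.
    """
    return max(a, b, key=lambda v: (v.mtime, v.site))
```

This relies on the sites having reasonably synchronized clocks; with skewed clocks the "newest" version may not be the one written last.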

How does this recover from erroneous switches and from being active on multiple sites? Does that require a full resync?
-> depends on how long the logs are and how long the sites were disconnected.
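
One way to read this answer: a returning site can catch up incrementally only if the log still holds every entry it missed. The sketch below shows that check under the assumption that log entries carry increasing sequence numbers; plan_resync and its parameters are hypothetical, and it does not cover merging writes made on both sites.

```python
def plan_resync(peer_marker, oldest_retained_seq, newest_seq):
    """Decide how a returning site can catch up.

    peer_marker: last log sequence number the site applied
    oldest_retained_seq: oldest sequence number still in the log
    newest_seq: newest sequence number in the log
    """
    if peer_marker >= newest_seq:
        return "up-to-date"
    # The site needs every entry from peer_marker + 1 onwards.
    if peer_marker + 1 >= oldest_retained_seq:
        return "incremental"
    # Some needed entries were already trimmed.
    return "full"
```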

What would be the transport used for replication? HTTP?
-> it will be REST-based, so HTTP or HTTPS

Is the size of the log limited, and what happens when it overflows? (For example, because it's not pulled frequently enough, or because of a temporary sync outage, ...)
-> not yet decided. Options are a size limit, trimming controlled by the agents that are pulling, or a combination of the two
Similarly, is the log replicated, or does it live on only one gateway? What happens to replication if that gateway fails?
-> the log is in RADOS, so it's durable
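
Since the trimming policy was not decided, the following is purely an illustration of how a size limit and agent-controlled trimming might be combined. compute_trim_point, its parameters, and the idea of flagging lagging agents are all assumptions, not part of the design discussed above.

```python
def compute_trim_point(agent_markers, newest_seq, max_entries):
    """Return (trim_through, lagging_agents).

    agent_markers: dict of agent name -> last sequence number applied
    newest_seq: newest sequence number in the log
    max_entries: size limit on the number of retained entries
    """
    # Agent-controlled part: never trim what the slowest agent still needs.
    # With no agents registered, nothing is needed and all may be trimmed.
    agent_floor = min(agent_markers.values(), default=newest_seq)
    # Size-limit part: keep at most max_entries of the newest entries.
    size_floor = max(newest_seq - max_entries, 0)
    # The size limit overrides slow agents once the log grows too large.
    trim_through = max(agent_floor, size_floor)
    # Agents behind the trim point lose entries and cannot catch up
    # from the log alone.
    lagging = sorted(n for n, m in agent_markers.items() if m < trim_through)
    return trim_through, lagging
```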

Please keep geo-dispersion of erasure-encoded objects in mind while designing this. Geo-dispersion of erasure-encoded objects is seen as a big cost reduction <hskinner>
-> pretty please. What about normal replication sync with an erasure-encoded backup?

Will it scale for large amounts of containers and objects?
-> for many containers, we shard the set of updated buckets
-> for large containers, the bottleneck will be the bucket index/log, which will eventually be addressed as part of solving the large-container problem (by sharding the bucket index)
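
As a rough illustration of sharding the set of updated buckets, the sketch below assigns each bucket name to one of a fixed number of shards by hashing. The hash choice, shard count, and function names are assumptions for the example, not the scheme the gateway uses.

```python
import hashlib


def shard_for_bucket(bucket_name, num_shards):
    """Map a bucket name to a shard number in [0, num_shards)."""
    digest = hashlib.sha256(bucket_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_updated_buckets(bucket_names, num_shards):
    """Group updated bucket names by shard so shards can be scanned separately."""
    shards = {i: [] for i in range(num_shards)}
    for name in bucket_names:
        shards[shard_for_bucket(name, num_shards)].append(name)
    return shards
```

With a stable hash, every gateway and agent maps a given bucket to the same shard, so agents can split the work by shard without coordinating on individual buckets.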