RBD - Mirroring » History » Version 1

Jessica Mack, 07/03/2015 09:36 PM

h1. RBD - Mirroring

h3. Summary

Create a metadata or data journal for RBD for async replication/DR purposes.

h3. Owners

* Josh Durgin (Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Sage Weil (Inktank)
* Haomai Wang (UnitedStack)

h3. Current Status

You can create point-in-time snapshots of an entire image and get a delta between two snapshots, but generating that delta requires a scan of all image objects. This limits its usefulness for DR purposes, since it is generally not practical to pay for that scan at frequent intervals.

h3. Detailed Description

There are a few use cases to capture here:

# full data journal
## Allow a replica image in another DC or cluster to stream updates in real time. Because the data journal would be time-ordered, the replica would also be a fully coherent point-in-time snapshot.
## Stream updates to a replica with a time delay. This would be useful for coping with operator error or failures above the block layer by providing access to data that was recently overwritten or deleted.
## [optional] On journal roll-forward/apply, generate a reverse-direction rollback journal so that the replica image could also be scrubbed backwards in time. (This requires a read before write and is likely impractical on the source/master image, but may be practical on the slave/backup image.)
# metadata journal
## Accelerate the current 'delta' API that drives the incremental diffs so that it costs O(number of writes) instead of O(number of objects).
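
To illustrate the metadata journal use case, here is a minimal Python sketch of how a delta could be derived from journal records rather than from an object scan. Everything in it is an assumption made for illustration: @MetaEntry@, its fields, and @delta_from_journal@ are hypothetical names, not part of librbd or of any agreed design. The work done is proportional to the number of journaled writes in the requested range (plus a sort), independent of image size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaEntry:
    """Hypothetical metadata journal record: which byte range a write touched."""
    seq: int      # position in the journal
    offset: int   # byte offset within the image
    length: int   # number of bytes written


def delta_from_journal(entries, from_seq, to_seq):
    """Return merged (offset, length) extents written in (from_seq, to_seq]."""
    extents = sorted(
        (e.offset, e.offset + e.length)
        for e in entries
        if from_seq < e.seq <= to_seq
    )
    merged = []
    for start, end in extents:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous extent: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]


journal = [
    MetaEntry(seq=1, offset=0, length=4096),
    MetaEntry(seq=2, offset=8192, length=4096),
    MetaEntry(seq=3, offset=4096, length=4096),
    MetaEntry(seq=4, offset=1 << 20, length=512),
]

# Extents changed after journal position 1, up to and including position 4.
changed = delta_from_journal(journal, from_seq=1, to_seq=4)
print(changed)  # [(4096, 8192), (1048576, 512)]
```

In this sketch the two adjacent 4 KB writes collapse into one 8 KB extent, which is the kind of output an incremental diff consumer would want.
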

The basic design:

* associate a journal with an RBD image
** each journal entry represents an IO operation
** include a timestamp and any other potentially useful metadata
** stripe the journal over objects using something similar to Journaler
** [optional] allow the journal to live in a different pool (e.g., one that is flash-backed)
* if the image writer understands the feature and it is enabled,
** apply every write first to the journal, then to the device
** acknowledge the write as committed either
*** after journal commit (default), or
*** after both the journal and the base image commit (in case the journal is, say, stored in a less durable but higher-performing pool)
** [optional] before applying a journaled write, copy the data we are about to overwrite to a second rollback journal
** on open, replay recent journal operations
** periodically update a journal position pointer in the rbd image header (to limit replays on open)
* on read, check the in-memory cache of in-flight (being journaled but not yet committed) writes to preserve basic read/write consistency
** (in reality this should be very rare, since no sane block user would read from a block for which a write is currently in flight)
* create a 'slave' function that watches the tail of the journal
** when there is a remote write, apply it locally
*** depending on the local image properties, this may or may not get journaled locally; leave that to the user
** [optional] add a time delay (e.g., 1 hour) between the journaled write and applying it locally
** [optional] update the source image with metadata about our replication state; the master may want to control trimming based on our progress instead of using a simple time delay
* periodically trim the journal based on time and/or size
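
The flow above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: @Entry@, @Journal@, @Image@, @journaled_write@, @replay@, and @replicate@ are hypothetical names, the journal is a plain Python list rather than objects striped over a pool, and the position pointer is updated on every apply instead of periodically. It shows journal-then-image ordering, replay on open, a time-delayed replica, and trimming driven by replica progress.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical journal record for one write."""
    seq: int
    timestamp: float
    offset: int
    data: bytes


class Journal:
    """Append-only list standing in for a journal striped over objects."""

    def __init__(self):
        self.entries = []
        self.next_seq = 1

    def append(self, timestamp, offset, data):
        entry = Entry(self.next_seq, timestamp, offset, data)
        self.entries.append(entry)
        self.next_seq += 1
        return entry

    def after(self, seq):
        """Entries with a sequence number greater than seq."""
        return [e for e in self.entries if e.seq > seq]

    def trim(self, up_to_seq):
        """Drop entries at or below up_to_seq."""
        self.entries = [e for e in self.entries if e.seq > up_to_seq]


class Image:
    """A byte buffer plus the journal position recorded in its 'header'."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.applied_seq = 0

    def apply(self, entry):
        self.data[entry.offset:entry.offset + len(entry.data)] = entry.data
        self.applied_seq = entry.seq  # simplification: updated on every apply


def journaled_write(journal, image, timestamp, offset, data):
    """Write to the journal first, then to the image."""
    entry = journal.append(timestamp, offset, data)
    image.apply(entry)
    return entry


def replay(journal, image):
    """On open, re-apply anything newer than the recorded position."""
    for entry in journal.after(image.applied_seq):
        image.apply(entry)


def replicate(journal, replica, now, delay=0.0):
    """Apply entries that are at least `delay` seconds old to the replica."""
    for entry in journal.after(replica.applied_seq):
        if now - entry.timestamp < delay:
            break  # entries are time-ordered, so the rest are newer still
        replica.apply(entry)


journal = Journal()
master = Image(16)
replica = Image(16)

journaled_write(journal, master, timestamp=0.0, offset=0, data=b"aaaa")
journaled_write(journal, master, timestamp=100.0, offset=4, data=b"bbbb")

# Simulate a crash after journal commit but before the image was updated.
journal.append(timestamp=200.0, offset=8, data=b"cccc")
replay(journal, master)

# The replica runs one hour behind: at t=3700 the third write is too recent.
replicate(journal, replica, now=3700.0, delay=3600.0)

# Trim only what the replica has already consumed.
journal.trim(replica.applied_seq)
```

After this runs, the master holds all three writes, the replica holds the first two, and the journal retains only the entry the replica has not yet applied. Trimming on replica progress rather than on age alone corresponds to the optional replication-state feedback item in the list.
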

TODO:

* settle on initial functionality
* define user interface (librbd, rbd CLI) to fully capture user stories
* translate to dev tasks

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3