RBD - Mirroring » History » Version 1
Jessica Mack, 07/03/2015 09:36 PM
h1. RBD - Mirroring

h3. Summary

Create a metadata or data journal for RBD images to support asynchronous replication and disaster recovery (DR).

h3. Owners

* Josh Durgin (Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Sage Weil (Inktank)
* Haomai Wang (UnitedStack)

h3. Current Status

You can create point-in-time snapshots of an entire image and get the delta between two snapshots, but generating that delta requires scanning every object in the image. This limits its usefulness for DR, since paying for a full scan at frequent intervals is generally impractical.
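
To make the scan cost concrete, assume a 1 TiB image striped over RBD's default 4 MiB objects (the image size is an arbitrary example). A scan-based delta must visit every object, however few of them were actually written:

```latex
N_{\text{objects}} = \frac{S_{\text{image}}}{S_{\text{object}}}
  = \frac{2^{40}\ \text{bytes}}{2^{22}\ \text{bytes}}
  = 2^{18} = 262{,}144
```

Repeating that scan every 15 minutes means touching 4 × 262,144 = 1,048,576 objects per hour, even if only a handful of writes occurred in that time.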

h3. Detailed Description

There are a few use cases to capture here:
# full data journal
## Allow a replica image in another data center (DC) or cluster to stream updates in real time. Because the data journal would be time-ordered, the replica would also be a fully coherent point-in-time snapshot.
## Stream updates to a replica with a time delay. This would be useful for coping with operator error or failures above the block layer by providing access to data that was recently overwritten or deleted.
## [optional] On journal roll-forward/apply, generate a reverse-direction rollback journal so that the replica image could also be scrubbed backwards in time. (This requires a read before every write and is likely impractical on the source/master image, but may be practical on the slave/backup image.)
# metadata journal
## Accelerate the current 'delta' API that drives incremental diffs so that it costs O(number of writes) instead of O(number of objects).
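
The metadata-journal use case can be illustrated with a small, self-contained model. It is a sketch under stated assumptions, not librbd code: per-object version numbers stand in for snapshot state, a Python list stands in for the journal (list index = journal position), and `JournalEntry`, `scan_diff`, and `journal_diff` are hypothetical names. Both functions return the set of changed object numbers, but `scan_diff` touches every object while `journal_diff` touches only the entries written between two journal positions.

```python
from dataclasses import dataclass

OBJECT_SIZE = 4 * 1024 * 1024  # assumes RBD's default 4 MiB object size


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical metadata-journal record: the extent touched by one write."""
    offset: int  # byte offset within the image
    length: int  # number of bytes written


def objects_for_extent(offset: int, length: int) -> range:
    """Object numbers covered by the byte range [offset, offset + length)."""
    first = offset // OBJECT_SIZE
    last = (offset + length - 1) // OBJECT_SIZE
    return range(first, last + 1)


def scan_diff(versions_a: list[int], versions_b: list[int]) -> set[int]:
    """Scan-based delta: compares every object, O(number of objects)."""
    return {i for i, (a, b) in enumerate(zip(versions_a, versions_b)) if a != b}


def journal_diff(journal: list[JournalEntry], start: int, end: int) -> set[int]:
    """Journal-based delta: visits only entries in [start, end), O(number of writes)."""
    changed: set[int] = set()
    for entry in journal[start:end]:
        changed.update(objects_for_extent(entry.offset, entry.length))
    return changed


# Two writes between snapshot A (journal position 0) and snapshot B (position 2);
# the second write straddles the boundary between objects 4 and 5.
journal = [JournalEntry(0, 512), JournalEntry(5 * OBJECT_SIZE - 100, 200)]
versions_a = [0] * 1000
versions_b = [1 if i in (0, 4, 5) else 0 for i in range(1000)]
print(scan_diff(versions_a, versions_b) == journal_diff(journal, 0, 2))  # prints True
```

Both approaches agree on the result; the difference is that the journal-based version's cost scales with the two entries, not the 1,000 objects.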
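
The optional rollback journal from the first use case amounts to read-before-write with an undo log. The sketch below is hypothetical (`ReplicaImage`, `apply`, and `rollback_to` are illustrative names, and the image is an in-memory `bytearray`); it shows why rollback records must be replayed newest-first to scrub the replica backwards in time.

```python
class ReplicaImage:
    """Hypothetical replica that keeps a rollback (undo) journal while applying writes."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        # Rollback journal: (offset, bytes that the write overwrote), oldest first.
        self.rollback: list[tuple[int, bytes]] = []

    def apply(self, offset: int, payload: bytes) -> None:
        """Roll forward one journaled write, saving the old bytes first."""
        end = offset + len(payload)
        if end > len(self.data):
            raise ValueError("write extends past the end of the image")
        self.rollback.append((offset, bytes(self.data[offset:end])))  # read before write
        self.data[offset:end] = payload

    def rollback_to(self, position: int) -> None:
        """Undo writes newest-first until only the first `position` writes remain applied."""
        while len(self.rollback) > position:
            offset, old = self.rollback.pop()
            self.data[offset:offset + len(old)] = old


img = ReplicaImage(8)
img.apply(0, b"AAAA")
img.apply(2, b"BBBB")
img.rollback_to(1)  # scrub back to the state after the first write
print(bytes(img.data))  # b'AAAA\x00\x00\x00\x00'
```

Because each rollback record captures the bytes as they were immediately before its write, undoing in reverse order restores each earlier state exactly; undoing in forward order would not.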

The basic design:
* associate a journal with an RBD image
** each journal entry represents an I/O operation
** include a timestamp and any other potentially useful metadata
** stripe the journal over objects using something similar to Journaler
** [optional] allow the journal to live in a different pool (e.g., one that is flash-backed)
* if the image writer understands the feature and it is enabled:
** apply every write first to the journal, then to the image
** acknowledge the write as committed either
*** after the journal commit (default), or
*** after both the journal and base image commits (in case the journal is, say, stored in a less-durable but higher-performing pool)
** [optional] before applying a journaled write, copy the data about to be overwritten to a second rollback journal
** on open, replay recent journal operations
** periodically update a journal position pointer in the RBD image header (to limit replays on open)
* on read, check the in-memory cache of in-flight (journaled but uncommitted) writes to preserve basic read/write consistency
** (in practice this should be very rare, since no sane block user would read from a block for which a write is currently in flight)
* create a 'slave' function that watches the tail of the journal
** when there is a remote write, apply it locally
*** depending on the local image properties, the write may or may not be journaled locally; leave that to the user
** [optional] add a time delay (e.g., 1 hour) between the journaled write and applying it locally
** [optional] update the source image with metadata about our replication state; the master may want to control trimming based on our progress instead of using a simple time delay
* periodically trim the journal based on time and/or size
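
A journal entry and its placement across journal objects might look like the following. The field set and the round-robin striping parameters (`splay_width`, `entries_per_object`) are assumptions made for illustration; this only loosely follows the idea of striping the journal as Journaler does and is not its actual on-disk layout.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical journal record describing one I/O operation."""
    seq: int        # position of the entry in the journal
    op: str         # e.g. "write" or "discard"
    offset: int     # byte offset within the image
    length: int     # number of bytes affected
    data: bytes = b""  # payload for a full data journal; empty for a metadata journal
    timestamp: float = field(default_factory=time.time)  # wall-clock time of the op


def object_for_entry(seq: int, splay_width: int, entries_per_object: int) -> int:
    """Return the number of the journal object that stores entry `seq`.

    Entries are distributed round-robin over `splay_width` objects (an "object
    set"); once each object in the set holds `entries_per_object` entries, the
    journal moves on to the next set.
    """
    entries_per_set = splay_width * entries_per_object
    object_set = seq // entries_per_set
    return object_set * splay_width + seq % splay_width


# With 4 objects per set and 2 entries per object, entries 0-7 fill objects 0-3
# and entry 8 starts the next set at object 4.
print([object_for_entry(s, splay_width=4, entries_per_object=2) for s in range(10)])
# [0, 1, 2, 3, 0, 1, 2, 3, 4, 5]
```

Spreading consecutive entries over several objects lets appends to different objects proceed in parallel, while moving on to a new object set keeps each object bounded in size so that whole objects can later be trimmed.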
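
The write path, the read-side check of in-flight writes, and replay on open can be modeled together. This is a single-process, in-memory sketch with hypothetical names (`JournaledImage`, `AckMode`, `flush`, `simulate_crash_and_reopen`); applying journaled writes to the image is a separate `flush()` step so that the window between journal commit and image commit is visible. It assumes every entry carries the full write data, so replaying entries that were already applied is harmless.

```python
from enum import Enum


class AckMode(Enum):
    JOURNAL = "journal"            # ack after the journal commit (default)
    JOURNAL_AND_IMAGE = "both"     # ack after both the journal and image commits


class JournaledImage:
    """Hypothetical in-memory model of a journaled RBD image."""

    def __init__(self, size: int, ack_mode: AckMode = AckMode.JOURNAL) -> None:
        self.image = bytearray(size)                  # base image contents
        self.journal: list[tuple[int, bytes]] = []    # committed entries: (offset, data)
        self.in_flight: dict[int, tuple[int, bytes]] = {}  # journaled, not yet in image
        self.applied = 0          # journal entries [0, applied) are in the image
        self.header_position = 0  # replay pointer persisted in the image header
        self.ack_mode = ack_mode

    def write(self, offset: int, data: bytes) -> None:
        """Journal first, then (depending on ack mode) the image; returning = ack."""
        seq = len(self.journal)
        self.journal.append((offset, data))
        self.in_flight[seq] = (offset, data)
        if self.ack_mode is AckMode.JOURNAL_AND_IMAGE:
            self.flush()

    def flush(self) -> None:
        """Apply journaled writes to the image in journal order."""
        while self.applied < len(self.journal):
            offset, data = self.journal[self.applied]
            self.image[offset:offset + len(data)] = data
            self.in_flight.pop(self.applied, None)
            self.applied += 1

    def read(self, offset: int, length: int) -> bytes:
        """Read the image, overlaying in-flight writes oldest-first so newer data wins."""
        buf = bytearray(self.image[offset:offset + length])
        for seq in sorted(self.in_flight):
            w_off, w_data = self.in_flight[seq]
            start = max(offset, w_off)
            end = min(offset + length, w_off + len(w_data))
            if start < end:  # the in-flight write overlaps the requested range
                buf[start - offset:end - offset] = w_data[start - w_off:end - w_off]
        return bytes(buf)

    def update_header_position(self) -> None:
        """Periodically persist how far the image is known to be up to date."""
        self.header_position = self.applied

    def simulate_crash_and_reopen(self) -> None:
        """Lose volatile state, then replay the journal from the header pointer."""
        self.in_flight.clear()
        self.applied = self.header_position
        self.flush()  # re-applying full-data writes in journal order is idempotent


img = JournaledImage(8)
img.write(0, b"abcd")  # acked after the journal commit only
print(bytes(img.image[:4]), img.read(0, 4))  # b'\x00\x00\x00\x00' b'abcd'
```

The more often `update_header_position()` runs, the shorter the replay on open; the in-flight overlay only matters in the default ack mode, where a write can be acknowledged before it reaches the image.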
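
The 'slave' function and journal trimming can be sketched as below. `Replica`, `Entry`, and `trim_point` are hypothetical names, and timestamps are plain numbers in seconds. The replica relies on the journal being time-ordered to stop at the first entry younger than its delay; `trim_point` combines the time and size limits, then caps the result at the local commit position (entries still needed for replay on open) and at the slowest replica's reported position (the optional replication-state metadata).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical journal entry: a timestamped full-data write."""
    timestamp: float  # seconds
    offset: int
    data: bytes


class Replica:
    """Hypothetical 'slave' that tails a source journal and applies entries locally."""

    def __init__(self, size: int, delay: float = 0.0) -> None:
        self.image = bytearray(size)
        self.delay = delay  # hold-back in seconds, e.g. 3600.0 for one hour
        self.position = 0   # next journal index to apply; reported to the source

    def poll(self, journal: list[Entry], now: float) -> int:
        """Apply every pending entry at least `delay` seconds old; return the count."""
        applied = 0
        while self.position < len(journal):
            entry = journal[self.position]
            if now - entry.timestamp < self.delay:
                break  # the journal is time-ordered, so all later entries are newer
            self.image[entry.offset:entry.offset + len(entry.data)] = entry.data
            self.position += 1
            applied += 1
        return applied


def trim_point(journal: list[Entry], now: float, max_age: float, max_entries: int,
               committed: int, replica_positions: list[int]) -> int:
    """Return i such that journal entries [0, i) may be discarded."""
    by_age = 0  # time limit: leading entries older than max_age
    while by_age < len(journal) and now - journal[by_age].timestamp > max_age:
        by_age += 1
    by_size = max(0, len(journal) - max_entries)  # size limit: keep at most max_entries
    # Never trim entries still needed for local replay or not yet consumed by a replica.
    return min([max(by_age, by_size), committed, *replica_positions])


journal = [Entry(float(t), t, bytes([ord("A") + t])) for t in range(5)]
replica = Replica(8, delay=2.0)
print(replica.poll(journal, now=3.0))  # 2: only entries at t=0 and t=1 are old enough
print(trim_point(journal, now=10.0, max_age=7.5, max_entries=10,
                 committed=5, replica_positions=[replica.position]))  # 2
```

In the last call the age limit alone would allow trimming three entries, but the lagging replica has only consumed two, so the master holds the rest.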

TODO:
* settle on initial functionality
* define the user interface (librbd, rbd CLI) to fully capture user stories
* translate into dev tasks

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3