Version 1 - History - RBD - Mirroring - Ceph - Ceph

Jessica Mack

h1. RBD - Mirroring

h3. Summary

Create a metadata or data journal for RBD for async replication/DR purposes.

h3. Owners

* Josh Durgin (Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Sage Weil (Inktank)
* Haomai Wang (UnitedStack)

h3. Current Status

You can create point-in-time snapshots of an entire image and get a delta between two snapshots, but generating that delta requires a scan of all image objects. This limits its usefulness for DR purposes, since it is generally not practical to pay for that scan at frequent intervals.

h3. Detailed Description

There are a few use cases to capture here:

# full data journal
## Allow a replica image in another DC or cluster to stream updates in real time. Because the data journal would be time-ordered, the replica would also be a fully coherent point-in-time snapshot.
## Stream updates to a replica with a time delay. This would be useful for coping with operator error or failures above the block layer by providing access to data that was recently overwritten or deleted.
## [optional] On journal roll-forward/apply, generate a reverse-direction rollback journal so that the replica image could also be scrubbed backwards in time. (This requires a read before write and is likely impractical on the source/master image, but may be practical on the slave/backup image.)
# metadata journal
## Accelerate the current 'delta' API that drives the incremental diffs so that it costs O(number of writes) instead of O(number of objects).
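
To illustrate the metadata-journal use case, here is a minimal in-memory sketch of how a delta could be derived from journaled write extents instead of an object scan. The names @MetaEntry@ and @delta_from_journal@ are hypothetical and not part of librbd; the sketch assumes each journal entry records a sequence number plus the byte range of one write, and that a snapshot can be mapped to a journal sequence number.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MetaEntry:
    """Hypothetical metadata-journal record: the byte range one write touched."""
    seq: int      # monotonically increasing journal position
    offset: int   # start of the write within the image, in bytes
    length: int   # number of bytes written


def delta_from_journal(entries: Iterable[MetaEntry],
                       from_seq: int, to_seq: int) -> List[Tuple[int, int]]:
    """Return merged (offset, length) extents written in (from_seq, to_seq].

    Only journal entries are visited, so the work grows with the number of
    writes in the interval rather than with the number of image objects.
    """
    spans = sorted(
        (e.offset, e.offset + e.length)
        for e in entries
        if from_seq < e.seq <= to_seq and e.length > 0
    )
    merged: List[List[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent to the previous extent: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]
```

If two snapshots were taken at journal positions 0 and 3, @delta_from_journal(entries, 0, 3)@ would list the changed extents between them. The sort makes this O(w log w) in the number of writes w, which is still independent of image size.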

The basic design:

* associate a journal with an RBD image
** each journal entry represents an IO operation
** include a timestamp and any other potentially useful metadata
** stripe the journal over objects using something similar to Journaler
** [optional] allow the journal to live in a different pool (e.g., one that is flash-backed)
* if the image writer understands the feature and it is enabled,
** apply every write first to the journal, then to the device
** acknowledge the write as committed either
*** after journal commit (default), or
*** after journal and base image commit (in case the journal is, say, stored in a less-durable but higher-performing pool)
** [optional] before applying a journaled write, copy the data we are about to overwrite to a second rollback journal
** on open, replay recent journal operations
** periodically update a journal position pointer in the rbd image header (to limit replays on open)
* on read, check the in-memory cache of in-flight (being journaled but not yet committed) writes to preserve basic read/write consistency
** (in reality this should be very rare, since no sane block user would read from a block for which a write is currently in flight)
* create a 'slave' function that watches the tail of the journal
** when there is a remote write, apply it locally
*** depending on the local image properties, this may or may not get journaled locally; leave that to the user
** [optional] add a time delay (e.g., 1 hour) between the journaled write and applying it locally
** [optional] update the source image with metadata about our replication state. The master may want to control trimming based on our progress instead of using a simple time delay.
* periodically trim the journal based on time and/or size
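
The design above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: @Journal@, @JournaledImage@ and @Replica@ are hypothetical names, the journal is a plain list rather than striped RADOS objects, and @applied_seq@ stands in for the journal position pointer that the design keeps in the rbd image header.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    """One journaled IO operation (hypothetical layout)."""
    seq: int
    timestamp: float
    offset: int
    data: bytes


class Journal:
    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.next_seq = 1

    def append(self, timestamp: float, offset: int, data: bytes) -> Entry:
        entry = Entry(self.next_seq, timestamp, offset, bytes(data))
        self.entries.append(entry)
        self.next_seq += 1
        return entry

    def after(self, seq: int) -> List[Entry]:
        return [e for e in self.entries if e.seq > seq]

    def trim(self, up_to_seq: int) -> None:
        """Drop entries that every consumer has already applied."""
        self.entries = [e for e in self.entries if e.seq > up_to_seq]


class JournaledImage:
    def __init__(self, size: int, journal: Journal) -> None:
        self.data = bytearray(size)
        self.journal = journal
        self.applied_seq = 0                  # stand-in for the header pointer
        self.in_flight: Dict[int, Entry] = {}

    def _apply(self, e: Entry) -> None:
        self.data[e.offset:e.offset + len(e.data)] = e.data
        self.applied_seq = e.seq

    def write(self, timestamp: float, offset: int, data: bytes,
              ack_after_apply: bool = False) -> int:
        """Journal first; the returned seq models the acknowledgement."""
        entry = self.journal.append(timestamp, offset, data)
        self.in_flight[entry.seq] = entry
        if ack_after_apply:                   # ack only after journal and image
            self.flush()
        return entry.seq

    def flush(self) -> None:
        for seq in sorted(self.in_flight):
            self._apply(self.in_flight.pop(seq))

    def read(self, offset: int, length: int) -> bytes:
        """Overlay in-flight writes so reads see journaled data."""
        buf = bytearray(self.data[offset:offset + length])
        for seq in sorted(self.in_flight):
            e = self.in_flight[seq]
            lo = max(offset, e.offset)
            hi = min(offset + length, e.offset + len(e.data))
            if lo < hi:
                buf[lo - offset:hi - offset] = e.data[lo - e.offset:hi - e.offset]
        return bytes(buf)

    def replay(self) -> None:
        """On open: re-apply entries past the recorded journal position."""
        for e in self.journal.after(self.applied_seq):
            self._apply(e)
        self.in_flight.clear()


class Replica:
    """Tails the journal and applies entries, optionally after a delay."""

    def __init__(self, size: int, delay: float = 0.0) -> None:
        self.data = bytearray(size)
        self.applied_seq = 0
        self.delay = delay

    def poll(self, journal: Journal, now: float) -> int:
        for e in journal.after(self.applied_seq):
            if e.timestamp + self.delay > now:
                break                         # too recent; keep journal order
            self.data[e.offset:e.offset + len(e.data)] = e.data
            self.applied_seq = e.seq
        return self.applied_seq
```

In this model, trimming based on replica progress rather than a fixed time window would be @journal.trim(min(image.applied_seq, replica.applied_seq))@. The sketch assumes journal timestamps are non-decreasing and ignores the optional rollback journal.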

TODO:

* settle on initial functionality
* define user interface (librbd, rbd CLI) to fully capture user stories
* translate to dev tasks

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3
