Version 1 - History - Rgw sync agent architecture - Ceph - Ceph

Jessica Mack

h1. Rgw sync agent architecture

RGW Data sync

Current scheme:

# full sync (per shard)
## list all buckets
## for each bucket in current shard
### read bucket marker
### sync each object
#### if failed, add to list to retry later (put in replica log later)
### when done with bucket instance, update replica log on destination zone
#### bucket name
#### bucket marker (from start of sync)
#### list of objects to retry
## when done with all buckets in a shard,
### update replica log on destination zone with
#### shard number
#### list of bucket instances to retry
# incremental sync
## look up data log shard in replica log to get buckets that need retrying, and current marker
## read data log shard to get changed buckets, starting at marker from replica log
## for each bucket
### look up objects that need retrying in the replica log
### read the bucket instance log for the bucket starting at marker from replica log
### sync object using the same method as full sync
### if syncing an object fails, add it to a list to retry
### update bucket instance replica log with last marker read and list of objects to retry
## once data log shard is done, update replica log for that shard with
### new marker
### list of bucket instances to retry
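
The current scheme above can be sketched roughly as follows. This is a minimal in-memory illustration, not the agent's actual code: @ReplicaLog@, @sync_objects@, @full_sync_shard@, @incremental_sync_shard@ and the dictionary layouts are all hypothetical names chosen for the example. It also assumes that markers are strings that compare in log order, and that a bucket is put on the shard's retry list whenever one of its objects failed to sync.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaLog:
    """Hypothetical in-memory stand-in for the destination zone's replica log."""
    shards: dict = field(default_factory=dict)   # shard_id -> {"marker", "retries"}
    buckets: dict = field(default_factory=dict)  # bucket -> {"marker", "retries"}


def sync_objects(objects, copy_object):
    """Copy each object; return the names that failed so they can be retried."""
    failed = []
    for name in objects:
        try:
            copy_object(name)
        except Exception:
            failed.append(name)
    return failed


def full_sync_shard(shard_id, source, log, copy_object):
    """source: bucket -> {"marker": str, "objects": [names]} (assumed layout)."""
    bucket_retries = []
    for bucket, state in source.items():
        marker = state["marker"]  # bucket marker as of the start of the sync
        failed = sync_objects(state["objects"], copy_object)
        log.buckets[bucket] = {"marker": marker, "retries": failed}
        if failed:  # assumption: any failed object puts the bucket on the retry list
            bucket_retries.append(bucket)
    log.shards[shard_id] = {"retries": bucket_retries}


def incremental_sync_shard(shard_id, data_log, bucket_logs, log, copy_object):
    """data_log: [(marker, bucket)]; bucket_logs: bucket -> [(marker, object)]."""
    shard = log.shards.get(shard_id, {})
    marker = shard.get("marker", "")
    new = [(m, b) for m, b in data_log if m > marker]
    # Retried buckets first, then changed ones, without duplicates.
    buckets = list(dict.fromkeys(shard.get("retries", []) + [b for _, b in new]))
    bucket_retries = []
    for bucket in buckets:
        entry = log.buckets.get(bucket, {"marker": "", "retries": []})
        changes = [(m, o) for m, o in bucket_logs.get(bucket, []) if m > entry["marker"]]
        objects = list(dict.fromkeys(entry["retries"] + [o for _, o in changes]))
        failed = sync_objects(objects, copy_object)
        last = changes[-1][0] if changes else entry["marker"]
        log.buckets[bucket] = {"marker": last, "retries": failed}
        if failed:
            bucket_retries.append(bucket)
    log.shards[shard_id] = {
        "marker": new[-1][0] if new else marker,
        "retries": bucket_retries,
    }
```

In this sketch both phases share @sync_objects@, mirroring the "sync object using the same method as full sync" step, and the replica log is only written once a bucket or shard has been fully processed.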

Suggested scheme:

# %{color:blue}sync initialization%
## for each shard:
### %{color:blue}read marker and buckets to retry from the shard’s replica log%
#### %{color:blue}if no marker, check shard_id -> buckets index%
##### %{color:blue}if that doesn’t exist yet, list all buckets and create shard_id -> buckets index%
### for each bucket %{color:blue}in current shard%
#### %{color:blue}if bucket doesn’t have entry in replica log:%
##### %{color:blue}add to replica log, mark for full sync%
#### %{color:blue}if bucket exists in replica log, go to (4)(b)%
# incremental sync
## look up data log shard in replica log to get buckets that need retrying, and current marker
## %{color:red}check type of sync for any buckets that need to be retried, and generate the list of objects based on that%
### %{color:red}full -> list all objects in bucket%
### %{color:red}incremental -> read data log as usual%
## for each bucket
### %{color:red}list objects in bucket that need to be synced based on sync type, from full list, or data log%
### read the bucket instance log for the bucket starting at marker from replica log
### sync object using the same method as full sync
### if syncing an object fails, add it to a list to retry
### update bucket instance replica log with last marker read and list of objects to retry
## once data log shard is done, update replica log for that shard with
### new marker
### list of bucket instances to retry
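
The additions in the suggested scheme (sync initialization and choosing the object list by sync type) might look something like the sketch below. All names here are hypothetical: @init_shard@, @objects_to_sync@, the @sync_type@ field, and the callables @list_all_buckets@, @shard_of@, @list_objects@ and @read_bucket_log@ stand in for whatever the agent would really use. The replica log is modelled as a plain dictionary, and an empty index dictionary is taken to mean that the shard_id -> buckets index does not exist yet.

```python
def init_shard(shard_id, replica_log, index, list_all_buckets, shard_of):
    """Prepare one shard for syncing; return the buckets that belong to it.

    replica_log: {"shards": {...}, "buckets": {...}} (assumed layout)
    index:       shard_id -> [bucket names]; empty means "not built yet"
    """
    shard = replica_log["shards"].get(shard_id, {})
    if not shard.get("marker") and not index:
        # No marker and no index yet: list all buckets once and build the index.
        for bucket in list_all_buckets():
            index.setdefault(shard_of(bucket), []).append(bucket)
    for bucket in index.get(shard_id, []):
        if bucket not in replica_log["buckets"]:
            # Unknown bucket: record it and mark it for full sync.
            replica_log["buckets"][bucket] = {
                "sync_type": "full",
                "marker": "",
                "retries": [],
            }
    return index.get(shard_id, [])


def objects_to_sync(bucket, entry, list_objects, read_bucket_log):
    """Pick the objects to sync for a bucket based on its recorded sync type."""
    if entry["sync_type"] == "full":
        return list_objects(bucket)  # full -> every object in the bucket
    # incremental -> only log entries newer than the stored marker
    return [obj for marker, obj in read_bucket_log(bucket) if marker > entry["marker"]]
```

With this shape, a bucket that already has a replica log entry is left untouched by initialization and continues from its stored marker, while a new bucket goes through a full listing the first time it is processed.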

Care may be needed when updating the replica log with an empty marker, e.g. if lots of objects were uploaded before data logging was enabled. One option is to use ' ' (a single space) as the marker in that case, since it sorts before every marker the gateway will generate.
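
To illustrate the note above about empty markers, here is a small sketch. It assumes that markers are non-empty printable ASCII strings compared lexicographically and that none of them starts with a space; @INITIAL_MARKER@, @marker_to_store@ and @entries_after@ are hypothetical names used only for this example.

```python
# Hypothetical sentinel: non-empty, yet ordered before any real marker
# (assuming real markers never start with a space).
INITIAL_MARKER = " "


def marker_to_store(last_read_marker):
    """Return the marker to write to the replica log, never an empty string."""
    return last_read_marker or INITIAL_MARKER


def entries_after(log_entries, marker):
    """Return the (marker, name) log entries strictly newer than the stored marker."""
    return [(m, name) for m, name in log_entries if m > marker]
```

Under these assumptions, a stored @' '@ can be told apart from "no marker recorded", and reading the log from it still returns every entry.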
