Cache tier improvements - hitsets proxy write » History » Version 1
Jessica Mack, 07/06/2015 09:51 PM
h1. Cache tier improvements - hitsets proxy write

h3. Summary

We should extend the existing HitSet concept to capture more than just "1 or more reads or writes during this N-hour period".

We should also merge the proxy-write patches to round out our flexibility for when to promote on the write side as well.

h3. Owners

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Interested Parties

* Sage Weil (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

Hammer will now proxy reads, which makes our promotion decisions much more flexible. Writes, however, still force a promotion, which prevents us from doing very much to avoid thrashing the cache.
HitSets cover a configurable time period (usually some number of hours), and we insert on any read or write. They only tell us whether an object was accessed at least once during the interval: we cannot distinguish 1 read from 10,000 reads, reads from writes, or sequential from random access.

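To make that limitation concrete, here is a minimal bloom-filter membership sketch (illustrative Python, not Ceph's actual HitSet implementation): once an object's bits are set, one access and ten thousand accesses are indistinguishable.

```python
# Hypothetical sketch of a bloom-filter HitSet: membership is a single
# answer per object, so hit counts are lost by construction.
import hashlib

class BloomHitSet:
    def __init__(self, nbits=1024, nhashes=4):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, key):
        # derive nhashes slot indices from salted SHA-1 digests
        for i in range(self.nhashes):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def insert(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

hs = BloomHitSet()
hs.insert("obj.A")            # one access
for _ in range(10000):
    hs.insert("obj.B")        # ten thousand accesses
print(hs.contains("obj.A"), hs.contains("obj.B"))  # True True: indistinguishable
```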
h3. Detailed Description

# Merge the write-proxy patches from Zhiqiang Wang.
# Duplicate for writes the current "recency" logic we use for reads.
# Track more/better metadata about our workload:
## how many reads? 1, 10, 100? how can we tell?
## reads vs. writes?

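The write-side recency check in item 2 could mirror the read side roughly as follows. This is a hedged sketch, not the actual OSD logic: `hit_sets` and the promotion rule are illustrative, with `recency` standing in for a config option analogous to the read-side one.

```python
# Illustrative sketch (not Ceph code): promote a write into the cache tier
# only if the object appears in each of the newest `recency` HitSets.
def should_promote_on_write(obj, hit_sets, recency):
    """hit_sets is ordered newest-first; each is a set of object names.

    recency == 0 means "always promote" (the current forced-promotion
    behavior); otherwise proxy the write unless the object looks hot.
    """
    if recency == 0:
        return True
    if recency > len(hit_sets):
        return False  # not enough history yet; proxy the write instead
    return all(obj in hs for hs in hit_sets[:recency])

# Example: object seen in the two newest intervals but not the third.
hit_sets = [{"rb.0.1"}, {"rb.0.1"}, set()]
print(should_promote_on_write("rb.0.1", hit_sets, 2))  # True
print(should_promote_on_write("rb.0.1", hit_sets, 3))  # False
```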
How do we actually do this? Unclear. We could use a counting HitSet, but that only gets us N bits of counting at a cost of N times as much memory/disk overhead, which is probably too expensive.
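The N-times cost is easy to see in a sketch: a counting bloom filter widens each 1-bit slot into an n-bit saturating counter, so counting up to 2^n - 1 costs n times the memory of the plain filter. Illustrative code, not a concrete proposal:

```python
# Hypothetical counting bloom filter: each slot holds a small saturating
# counter instead of one bit; the count estimate for a key is the minimum
# counter across its slots (collisions can only overestimate).
import hashlib

class CountingBloom:
    def __init__(self, nslots=1024, nhashes=4, counter_bits=4):
        self.nslots, self.nhashes = nslots, nhashes
        self.max_count = (1 << counter_bits) - 1
        self.counters = [0] * nslots  # counter_bits of state per slot vs. 1 bit

    def _positions(self, key):
        for i in range(self.nhashes):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nslots

    def insert(self, key):
        for pos in self._positions(key):
            self.counters[pos] = min(self.counters[pos] + 1, self.max_count)

    def estimate(self, key):
        return min(self.counters[pos] for pos in self._positions(key))

cb = CountingBloom()
cb.insert("obj.A")
for _ in range(10):
    cb.insert("obj.B")
print(cb.estimate("obj.A"), cb.estimate("obj.B"))  # roughly 1 and 10
```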
We could use a hybrid structure: a hash table plus a bloom filter.
* First, insert objects into a counting structure of bounded size.
* As entries fall off, insert them into the bloom filter.
* -> Hot objects get counts; cold objects do not.
* Maybe we also record the average count for entries that fall off?

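The bullets above could be sketched as follows. This is illustrative only: a plain Python set stands in for the bloom filter, and "falling off" is modeled as evicting the lowest-count entry when the table is full.

```python
# Sketch of the hybrid structure: a bounded exact-count table in front of
# a bloom filter. Hot objects keep exact counts; evicted entries are
# demoted to membership-only tracking.
class HybridHitSet:
    def __init__(self, max_counted):
        self.max_counted = max_counted
        self.counts = {}         # object -> exact hit count
        self.fallen_off = set()  # stand-in for the bloom filter

    def insert(self, obj):
        if obj in self.counts:
            self.counts[obj] += 1
        else:
            if len(self.counts) >= self.max_counted:
                # evict the coldest counted entry into the bloom filter
                cold = min(self.counts, key=self.counts.get)
                del self.counts[cold]
                self.fallen_off.add(cold)
            self.counts[obj] = 1

    def lookup(self, obj):
        if obj in self.counts:
            return self.counts[obj]  # exact count: hot object
        return 1 if obj in self.fallen_off else 0  # seen at least once / never

hs = HybridHitSet(max_counted=2)
for obj in ["A", "A", "A", "B", "C"]:  # C forces the colder of A/B out
    hs.insert(obj)
print(hs.lookup("A"), hs.lookup("B"), hs.lookup("C"))  # 3 1 1
```

Note the asymmetry this buys: hot object A keeps its exact count of 3, while evicted B degrades gracefully to "seen at least once" rather than being forgotten.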
We can also construct an efficient MRC (miss ratio curve) to estimate the reuse distance of the workload. This will let us decide how big such a hybrid structure should be, and/or how big our cache should be. See: https://www.usenix.org/system/files/conference/fast15/fast15-paper-waldspurger.pdf

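For reference, the naive exact version of that computation looks like the sketch below; the linked paper's contribution is getting approximately the same curve cheaply via sampling, whereas this baseline is O(trace length x distinct objects).

```python
# Illustrative sketch: compute LRU reuse distances for an access trace,
# then read off the miss ratio for any candidate cache size.
def reuse_distances(trace):
    """Reuse distance = number of distinct objects touched since the
    previous access to the same object (inf for first-time accesses)."""
    stack = []  # LRU stack, most recently used first
    dists = []
    for obj in trace:
        if obj in stack:
            d = stack.index(obj)   # distinct objects accessed since last use
            stack.remove(obj)
        else:
            d = float("inf")       # cold miss
        stack.insert(0, obj)
        dists.append(d)
    return dists

def miss_ratio(dists, cache_size):
    """An LRU cache of `cache_size` objects misses exactly when the
    reuse distance is >= cache_size."""
    return sum(d >= cache_size for d in dists) / len(dists)

trace = ["A", "B", "A", "C", "B", "A", "B", "C"]
dists = reuse_distances(trace)
for size in (1, 2, 3):
    print(size, miss_ratio(dists, size))  # miss ratio: 1.0, 0.75, 0.375
```

Sweeping `cache_size` over the distances yields the MRC, which is exactly the "how big should the cache (or hybrid structure) be" question above.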
h3. Work items

h4. Coding tasks

# test and merge the write-proxy series
# ??
# Profit