Cache tier improvements - hitsets proxy write » History » Version 1

Jessica Mack, 07/06/2015 09:51 PM

h1. Cache tier improvements - hitsets proxy write

h3. Summary
We should extend the existing HitSet concept to capture more than just "1 or more reads or writes during this N-hour period".
We should also merge the proxy-write patches to round out our flexibility for when to promote on the write side as well.
h3. Owners

* Name (Affiliation)
* Name (Affiliation)
* Name
h3. Interested Parties

* Sage Weil (Affiliation)
* Name (Affiliation)
* Name
h3. Current Status

Hammer will now proxy reads, which makes our promotion decisions much more flexible. Writes, however, still force a promotion, which prevents us from doing very much to avoid thrashing the cache.

HitSets cover a configurable time period (usually some number of hours), and we insert on any read or write. They only tell us whether an object was hit 1 or more times: we cannot distinguish between 1 read and 10,000 reads, or reads from writes, or sequential from random access.
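A plain bloom filter makes the limitation concrete: insertion is idempotent, so membership only answers "seen at least once". The sketch below is purely illustrative (parameters and class name are invented here, not Ceph's actual HitSet code):

```python
import hashlib

class BloomHitSet:
    """Toy bloom-filter HitSet: records presence only, never counts."""
    def __init__(self, nbits=1024, nhashes=4):
        self.bits = bytearray(nbits // 8)
        self.nbits = nbits
        self.nhashes = nhashes

    def _positions(self, obj):
        # Derive nhashes bit positions from salted SHA-256 digests.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{obj}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def insert(self, obj):
        for p in self._positions(obj):
            self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, obj):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(obj))

hs = BloomHitSet()
for _ in range(10000):
    hs.insert("hot_object")   # 10,000 reads...
hs.insert("cold_object")      # ...vs exactly 1 read
# Both objects are now simply "present"; the filter cannot tell hot from cold.
```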
h3. Detailed Description

# Merge the write-proxy patches from Zhiqiang Wang.
# Duplicate for writes the "recency" logic we currently use for reads.
# Track more/better metadata about our workload:
## how many reads? 1, 10, 100? how can we tell?
## reads vs. writes?
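For item 2, the read-side recency check (promote only if the object appears in each of the most recent HitSets) could be mirrored on the write path roughly as below. This is a sketch only: plain Python sets stand in for HitSets, and the function name and signature are invented for illustration.

```python
def should_promote(obj, hitsets, recency):
    """Promote only if `obj` appears in each of the `recency` newest
    HitSets, mirroring the read-side recency check.
    `hitsets` is ordered newest-first; recency == 0 always promotes."""
    if recency == 0:
        return True
    recent = hitsets[:recency]
    if len(recent) < recency:
        return False  # not enough history yet; stay conservative
    return all(obj in hs for hs in recent)

# Sets stand in for HitSets (newest first):
hitsets = [{"a", "b"}, {"a"}, {"c"}]
assert should_promote("a", hitsets, 2)      # present in both recent sets
assert not should_promote("b", hitsets, 2)  # missed the 2nd-newest set
```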
How to actually do this? Unclear. We could use a counting HitSet, but that only gets us N bits of counting at the cost of N times as much memory/disk overhead. Probably expensive.
We could use a hybrid structure: a counting hash plus a bloom filter.

* first insert things in a counting structure of bounded size
* as things fall off, insert them in the bloom filter
* result: hot things get counts, cold things do not
* maybe we also record the average count for things that fall off

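The bullets above could be sketched as follows. Everything here is an assumption for illustration (the class name, the LRU eviction policy, the capacity, and the plain set standing in for the bloom filter), not a settled design:

```python
from collections import OrderedDict

class HybridHitSet:
    """Bounded counting table backed by a presence-only fallback.
    Hot objects keep exact counts; objects evicted from the table fall
    back to presence-only tracking (a set stands in for the bloom
    filter), and we remember the average count of evicted objects."""
    def __init__(self, capacity=4):
        self.counts = OrderedDict()  # LRU order: coldest first
        self.capacity = capacity
        self.fallback = set()        # bloom-filter stand-in
        self.evicted_counts = []     # average count for fallen-off items

    def insert(self, obj):
        if obj in self.counts:
            self.counts[obj] += 1
            self.counts.move_to_end(obj)  # refresh recency
            return
        if len(self.counts) >= self.capacity:
            old, cnt = self.counts.popitem(last=False)  # evict coldest
            self.fallback.add(old)
            self.evicted_counts.append(cnt)
        self.counts[obj] = 1

    def count(self, obj):
        """Exact count if hot, an estimate if known-cold, else 0."""
        if obj in self.counts:
            return self.counts[obj]
        if obj in self.fallback:
            return sum(self.evicted_counts) / len(self.evicted_counts)
        return 0

hs = HybridHitSet(capacity=2)
hs.insert("cold1")
for _ in range(5):
    hs.insert("hot")
hs.insert("cold2")   # table full: cold1 (count 1) falls into the fallback
hs.count("hot")      # exact count: 5
hs.count("cold1")    # estimated from the evicted average: 1.0
```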
We can also construct an efficient MRC (miss ratio curve) to estimate the reuse distance of the workload. This would let us decide how big such a hybrid structure should be, and/or how big our cache should be. See: https://www.usenix.org/system/files/conference/fast15/fast15-paper-waldspurger.pdf

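For intuition, an exact (unsampled) MRC for an LRU cache can be computed from reuse distances with the classic Mattson stack algorithm; the FAST'15 paper above shows how to approximate this cheaply by sampling, which the toy version below does not attempt:

```python
def miss_ratio_curve(trace, max_size):
    """Exact MRC for an LRU cache via reuse (stack) distances.
    Returns {cache_size: miss_ratio} for sizes 1..max_size."""
    stack = []        # LRU stack, most recently used at the end
    distances = []    # reuse distance per access (None = cold miss)
    for obj in trace:
        if obj in stack:
            # distance = number of distinct objects touched since the
            # last access to obj; a cache of size > distance would hit
            distances.append(len(stack) - stack.index(obj) - 1)
            stack.remove(obj)
        else:
            distances.append(None)
        stack.append(obj)
    n = len(trace)
    return {size: sum(1 for d in distances if d is None or d >= size) / n
            for size in range(1, max_size + 1)}

trace = ["a", "b", "a", "b", "c", "a"]
mrc = miss_ratio_curve(trace, 3)
# A size-3 cache already captures every reuse in this trace.
```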
h3. Work items

h4. Coding tasks

# test and merge write-proxy series
# ??
# Profit