Rados cache pool (part 2) » History » Version 1

Jessica Mack, 06/22/2015 01:58 AM

h1. Rados cache pool (part 2)

h3. Summary

The remaining work needed to create a cache pool tier.

h3. Owners

* Sage Weil (Inktank)
* Greg Farnum (Inktank)

h3. Interested Parties

* Mike Dawson (Cloudapt)
* Yan, Zheng (Intel)
* Jiangang, Duan (Intel)
* Jian, Zhang (Intel)

h3. Current Status

About half to two-thirds of the work has been completed:
* copy-get and copy-from rados primitives
* objecter cache redirect logic (first read from cache tier, then from base pool)
* promote on read
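
The redirect and promote-on-read behaviour listed above can be pictured with a toy model. This is only a sketch: plain dicts stand in for the cache tier and the base pool, the @TieredReader@ class and its fields are hypothetical, and it collapses the objecter and OSD roles into one object rather than mirroring the real code paths.

```python
class TieredReader:
    """Toy model of a cache tier in front of a base pool (dicts stand in for pools)."""

    def __init__(self, base_pool):
        self.base_pool = base_pool   # authoritative copy of every object
        self.cache_pool = {}         # hot subset held in the cache tier
        self.promotions = 0          # how many objects were promoted so far

    def read(self, name):
        # Redirect logic: consult the cache tier first.
        if name in self.cache_pool:
            return self.cache_pool[name]
        # Miss: fall back to the base pool (raises KeyError if the object is absent).
        data = self.base_pool[name]
        # Promote on read: copy the object into the cache tier so the next read hits.
        self.cache_pool[name] = data
        self.promotions += 1
        return data


reader = TieredReader({"obj_a": b"alpha", "obj_b": b"beta"})
first = reader.read("obj_a")    # miss, promoted from the base pool
second = reader.read("obj_a")   # hit, served from the cache tier
```

Only the first read of @obj_a@ touches the base pool; @obj_b@ stays out of the cache tier until something reads it.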

Much of the logic is written but not yet merged:
* dirty, whiteout metadata
* flush
* evict
* HitSet bloom filter (or explicit enumeration) tracking of I/Os
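
To make the two HitSet flavours concrete, here is a minimal sketch of a bloom-filter hit set next to an explicit-enumeration one. The class names, the bit and hash counts, and the use of SHA-256 are illustrative assumptions, not the encoding used by the actual HitSet code.

```python
import hashlib


class BloomHitSet:
    """Approximate record of accessed objects: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name):
        # Derive num_hashes bit positions by salting the name with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, name):
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, name):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(name))


class ExplicitHitSet:
    """Exact record of accessed objects; memory grows with the number of objects."""

    def __init__(self):
        self.names = set()

    def insert(self, name):
        self.names.add(name)

    def contains(self, name):
        return name in self.names


bloom, explicit = BloomHitSet(), ExplicitHitSet()
for obj in ("obj_1", "obj_2"):
    bloom.insert(obj)
    explicit.insert(obj)
```

The bloom filter has a fixed size regardless of how many objects are touched, at the cost of occasionally reporting an object that was never accessed; the explicit set is exact but unbounded.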

Remaining work:
* hitset expiration
* recover hitsets when a PG is recovered or migrated
* [optional] preserve in-memory hitset across peering intervals
* stress tests that specifically exercise and validate dirty, whiteout, evict, flush, hitsets
* policy metadata for when to flush/evict from cache
* agent (process or thread) that evicts from the cache when it approaches the high-water mark

h3. Detailed Description

hitset expiration
* OSD logic to delete hitsets (and replicate that deletion) once they are too old or exceed the max count, or when the pool max values are adjusted
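
The expiration rule above can be sketched as a pure function. The @HitSetRecord@ type, the @expire_hitsets@ name, and the @max_age@ / @max_count@ parameters are hypothetical stand-ins for the pool max values; replication of the deletion is not modelled.

```python
from collections import namedtuple

# Hypothetical record: the time interval covered by one stored hitset.
HitSetRecord = namedtuple("HitSetRecord", ["start", "end"])


def expire_hitsets(records, now, max_count, max_age):
    """Split records (ordered oldest first) into (kept, expired)."""
    # Rule 1: anything that ended more than max_age ago is expired.
    kept = [r for r in records if now - r.end <= max_age]
    expired = [r for r in records if now - r.end > max_age]
    # Rule 2: if too many remain, drop the oldest until max_count is respected.
    overflow = len(kept) - max_count
    if overflow > 0:
        expired.extend(kept[:overflow])
        kept = kept[overflow:]
    return kept, expired


records = [HitSetRecord(0, 10), HitSetRecord(10, 20),
           HitSetRecord(20, 30), HitSetRecord(30, 40)]
kept, expired = expire_hitsets(records, now=45, max_count=2, max_age=30)
# Lowering the pool maximum later simply re-runs the same rule.
kept_after_adjust, _ = expire_hitsets(kept, now=45, max_count=1, max_age=30)
```

Because the function is re-evaluated against the current limits, adjusting the pool max values needs no special case: the next pass trims whatever no longer fits.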

policy metadata for flush/evict from cache
* add pg_pool_t properties to control when we should
** flush dirty metadata
** evict old items because the pool is getting full
** evict any item because it is older than X
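
One way to picture these properties is as a small policy record checked against pool statistics. Every field and function name below is an assumption made for illustration; the source only states that such properties would be added to pg_pool_t, not what they are called or how thresholds are expressed.

```python
from dataclasses import dataclass


@dataclass
class CachePolicy:
    # Hypothetical fields, one per condition listed above.
    flush_dirty_ratio: float   # flush once dirty objects exceed this fraction of capacity
    evict_full_ratio: float    # evict old items once the pool is fuller than this fraction
    max_age_seconds: float     # "X": evict any item older than this


@dataclass
class PoolStats:
    capacity_objects: int      # assumed to be greater than zero
    num_objects: int
    num_dirty: int


def should_flush(policy, stats):
    return stats.num_dirty / stats.capacity_objects > policy.flush_dirty_ratio


def should_evict_for_space(policy, stats):
    return stats.num_objects / stats.capacity_objects > policy.evict_full_ratio


def should_evict_for_age(policy, age_seconds):
    return age_seconds > policy.max_age_seconds


policy = CachePolicy(flush_dirty_ratio=0.4, evict_full_ratio=0.8, max_age_seconds=3600.0)
stats = PoolStats(capacity_objects=1000, num_objects=850, num_dirty=300)
```

With these sample numbers the pool is 85% full, so space-driven eviction applies, while the dirty fraction of 30% stays under the flush threshold.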

cache agent
* this might be a thread, a Python client, or a separate daemon (to be discussed)
* periodically check pool metadata (stats) vs policy
* start at a random point in the pool and iterate over objects
** pull hitset history for current position
** estimate idle time for each object
** if an object meets some criteria, flush or evict it
** move to next object; pull new hitset metadata as needed
* include some mechanism to throttle
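
The agent loop described above might look roughly like the following single pass. This is a sketch under several assumptions: hitsets are plain Python sets ordered newest first, idle time is counted in whole hitset periods, dirty objects are flushed and clean ones evicted, and the throttle is a simple cap on actions per pass. All names (@agent_pass@, @estimate_idle_periods@, @min_idle_periods@, @max_ops@) are hypothetical.

```python
import random


def estimate_idle_periods(name, hitset_history):
    """Count the most recent hitset periods in which `name` was not accessed.

    hitset_history is ordered newest first; each entry is a set of object names.
    """
    idle = 0
    for hitset in hitset_history:
        if name in hitset:
            break
        idle += 1
    return idle


def agent_pass(objects, dirty, hitset_history, min_idle_periods, max_ops, rng=random):
    """Plan one throttled pass over the pool; returns a list of (action, name)."""
    names = sorted(objects)
    actions = []
    if not names:
        return actions
    start = rng.randrange(len(names))          # start at a random point in the pool
    for offset in range(len(names)):
        if len(actions) >= max_ops:            # throttle: cap work done per pass
            break
        name = names[(start + offset) % len(names)]
        if estimate_idle_periods(name, hitset_history) < min_idle_periods:
            continue                           # recently used, leave it alone
        # Assumption: dirty objects are flushed, clean ones are evicted.
        actions.append(("flush" if name in dirty else "evict", name))
    return actions


objects = {"a", "b", "c", "d"}
dirty = {"b"}
history = [{"a"}, {"a", "c"}, {"d"}]           # newest period first
plan = agent_pass(objects, dirty, history, min_idle_periods=2, max_ops=10,
                  rng=random.Random(0))
throttled = agent_pass(objects, dirty, history, min_idle_periods=2, max_ops=1,
                       rng=random.Random(0))
```

The sketch only plans actions and takes the whole hitset history up front; a real agent would pull hitset metadata incrementally as it moves through the pool and would re-check pool stats against policy between passes.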

cachemode_invalidate_forward
* implement policy
* build a test that adds a cache, populates it, drains it, and disables the cache
** add tests to the suite that do this in parallel with a running workload?

stress tests
* extend rados model to simply exercise flush and evict
* some sort of test to stress the hitset tracking code
* stress workload that promotes new data and forces eviction of old data (i.e. degenerate streaming workload)
* expand qa suite with cache pool tests
** explicit stress tests (above)
** enable/populate/drain/disable cache pool (and loop) in parallel with other workloads

h3. Work items

h4. Coding tasks

# hitset expiration
# policy metadata
# cache agent
# stress tests

h4. Documentation tasks

# document tiering framework
# document cache configuration, usage
## include limitations (e.g., PGLS results not cache coherent)