Jessica Mack, 06/22/2015 01:58 AM
h1. Rados cache pool (part 2)

h3. Summary

Balance of the work to create a cache pool tier.

h3. Owners

* Sage Weil (Inktank)
* Greg Farnum (Inktank)

h3. Interested Parties

* Mike Dawson (Cloudapt)
* Yan, Zheng (Intel)
* Jiangang, Duan (Intel)
* Jian, Zhang (Intel)
h3. Current Status

About half to two-thirds of the work has been completed:

* copy-get and copy-from RADOS primitives
* Objecter cache redirect logic (read from the cache tier first, then from the base pool)
* promote on read
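The completed redirect-and-promote read path can be illustrated with a minimal Python sketch (dict-backed stand-ins for the two pools; the @read@ helper is a hypothetical illustration, not the actual Objecter code):

```python
def read(obj, cache_pool, base_pool):
    """Redirect logic: try the cache tier first; on a miss, read from
    the base pool and promote the object into the cache."""
    if obj in cache_pool:
        return cache_pool[obj]   # cache hit: served by the cache tier
    data = base_pool[obj]        # miss: fall back to the base pool
    cache_pool[obj] = data       # promote on read
    return data
```

Subsequent reads of the same object are then served entirely from the cache tier.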
Much of the remaining logic is written but not yet merged:

* dirty and whiteout metadata
* flush
* evict
* HitSet (bloom filter or explicit enumeration) tracking of IOs

Balance of effort:

* HitSet expiration
* recover the HitSet when a PG is recovered or migrated
* [optional] preserve the in-memory HitSet across peering intervals
* stress tests that specifically exercise and validate dirty, whiteout, evict, flush, and HitSets
* policy metadata governing when to flush or evict from the cache
* an agent (process or thread) that evicts from the cache when it approaches the high-water mark
h3. Detailed Description

HitSet expiration

* OSD logic to delete old HitSets (and replicate that deletion) once they exceed the maximum age or count, or when the pool's maximum values are adjusted.
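A minimal Python sketch of that expiration rule (the function and data layout are hypothetical; the real logic lives in the OSD and must also replicate the deletions):

```python
import time

def expire_hitsets(hitsets, max_count, max_age_s, now=None):
    """Split `hitsets` (a list of (start_time, hitset) tuples, newest
    last) into those to keep and those to delete, applying the pool's
    maximum age and then its maximum count."""
    now = time.time() if now is None else now
    kept = [(ts, hs) for ts, hs in hitsets if now - ts <= max_age_s]
    if len(kept) > max_count:
        kept = kept[-max_count:]    # keep only the newest entries
    expired = [e for e in hitsets if e not in kept]
    return kept, expired
```

Re-running the same check after the pool's maximum values are adjusted covers the third trigger in the bullet above.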
Policy metadata for flush/evict from cache

* add pg_pool_t properties to control when we should:
** flush dirty metadata
** evict old items because the pool is getting full
** evict any item because it is older than X
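The properties and the decisions they drive might look like the following Python sketch (all names are hypothetical placeholders for the eventual pg_pool_t fields, not existing Ceph settings):

```python
from dataclasses import dataclass

@dataclass
class CachePoolPolicy:
    flush_dirty_ratio: float  # flush once dirty/total exceeds this
    evict_full_ratio: float   # evict once pool utilization exceeds this
    evict_max_age_s: float    # evict any object idle longer than this

def decide(policy, dirty_ratio, full_ratio, object_age_s):
    """Return the set of actions the cache should take for one object."""
    actions = set()
    if dirty_ratio > policy.flush_dirty_ratio:
        actions.add("flush")
    if full_ratio > policy.evict_full_ratio or \
            object_age_s > policy.evict_max_age_s:
        actions.add("evict")
    return actions
```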
Cache agent

* this might be a thread, a Python client, or a separate daemon (to be discussed)
* periodically check pool metadata (stats) against the policy
* start at a random point in the pool and iterate over objects:
** pull the HitSet history for the current position
** estimate the idle time for each object
** if an object meets some criteria, flush or evict it
** move to the next object; pull new HitSet metadata as needed
* include some mechanism to throttle the agent
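One pass of such an agent could be sketched in Python as follows (all names are hypothetical illustrations; throttling is omitted for brevity):

```python
import random

def estimate_idle_time(obj, hitset_history, now):
    """Idle time = now minus the start of the newest HitSet containing
    the object; objects in no HitSet are treated as maximally idle."""
    for start, hitset in sorted(hitset_history, key=lambda e: e[0],
                                reverse=True):
        if obj in hitset:
            return now - start
    return float("inf")

def agent_pass(objects, hitset_history, now, idle_threshold_s,
               rng=random):
    """Start at a random point in the pool, iterate over every object,
    and yield those idle long enough to flush or evict."""
    if not objects:
        return
    start = rng.randrange(len(objects))
    for i in range(len(objects)):
        obj = objects[(start + i) % len(objects)]
        if estimate_idle_time(obj, hitset_history, now) > idle_threshold_s:
            yield obj
```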
cachemode_invalidate_forward

* implement the policy
* build a test that adds a cache, populates it, drains it, and disables the cache
** add tests to the suite that do this in parallel with a running workload?
Stress tests

* extend the rados test model to exercise flush and evict
* some sort of test to stress the HitSet tracking code
* a stress workload that promotes new data and forces eviction of old data (i.e., a degenerate streaming workload)
* expand the QA suite with cache pool tests:
** explicit stress tests (above)
** enable/populate/drain/disable the cache pool (in a loop) in parallel with other workloads
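The degenerate streaming workload is worth pinning down, since it maximizes promote/evict churn. A toy Python model of it (an LRU-ordered dict as a stand-in for the cache pool; names are illustrative only):

```python
from collections import OrderedDict

def streaming_workload(cache_size, n_objects):
    """Read n_objects brand-new objects in sequence: every read causes
    a promotion, and once the cache passes its high-water mark each
    promotion forces an eviction. Returns the eviction count."""
    cache = OrderedDict()
    evictions = 0
    for i in range(n_objects):
        cache[f"obj{i}"] = True        # promote on read
        if len(cache) > cache_size:    # past the high-water mark
            cache.popitem(last=False)  # evict the coldest object
            evictions += 1
    return evictions
```

With a stream larger than the cache, every promotion after the cache fills must evict something, which is exactly the pressure the stress test should apply.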
h3. Work items

h4. Coding tasks

# HitSet expiration
# policy metadata
# cache agent
# stress tests

h4. Documentation tasks

# document the tiering framework
# document cache configuration and usage
## include limitations (e.g., PGLS results are not cache coherent)