Jessica Mack, 06/22/2015 01:58 AM
h1. Rados cache pool (part 2)

h3. Summary

The remaining work needed to complete the cache pool tier.

h3. Owners

* Sage Weil (Inktank)
* Greg Farnum (Inktank)

h3. Interested Parties

* Mike Dawson (Cloudapt)
* Yan, Zheng (Intel)
* Jiangang, Duan (Intel)
* Jian, Zhang (Intel)

h3. Current Status

About half to two-thirds of the work has been completed:
* copy-get and copy-from rados primitives
* objecter cache redirect logic (read from the cache tier first, then from the base pool)
* promote on read
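
A minimal sketch of how the redirect and promote-on-read pieces fit together. It assumes a toy model in which each pool is a Python dict and reads are whole-object. The client tries the cache tier first, falls back to the base pool on a miss, and then promotes the object into the cache. `TieredPool`, `read`, and `promotions` are hypothetical names and are not the Objecter or OSD API. In RADOS, the promotion copy is presumably the job of the copy-get/copy-from primitives listed above.

```python
from typing import Dict, Optional


class TieredPool:
    """Toy model of a cache tier layered over a base pool (hypothetical)."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}  # cache tier contents
        self.base: Dict[str, bytes] = {}   # base pool contents
        self.promotions = 0                # number of promote-on-read copies

    def read(self, oid: str) -> Optional[bytes]:
        # 1. Try the cache tier first.
        if oid in self.cache:
            return self.cache[oid]
        # 2. On a miss, fall back to the base pool.
        data = self.base.get(oid)
        if data is None:
            return None  # the object exists in neither tier
        # 3. Promote on read: copy the object into the cache tier so the
        #    next read is served from the cache.
        self.cache[oid] = data
        self.promotions += 1
        return data
```

In this example, the first read of an object that exists only in the base pool promotes it, and later reads of that object are served from the cache without another copy.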

Much of the logic is written but not yet merged:
* dirty and whiteout metadata
* flush
* evict
* HitSet tracking of I/Os using a Bloom filter (or explicit enumeration)
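
A HitSet records which objects were accessed during an interval, which is what later lets the agent judge whether an object is idle. Below is a minimal sketch of the Bloom-filter variant. It assumes a fixed-size bit array and derives the hash positions from salted SHA-256. `ToyBloomHitSet` and its parameters are hypothetical stand-ins and are not Ceph's own HitSet implementation.

```python
import hashlib


class ToyBloomHitSet:
    """Toy Bloom-filter HitSet: records object names seen during one interval."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at zero

    def _positions(self, oid: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{oid}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, oid: str) -> None:
        # Set every bit position for this object.
        for pos in self._positions(oid):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, oid: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(oid))
```

A Bloom filter can make an idle object look recently used (a false positive), but it never misses a real access. Explicit enumeration, which stores the names in a set, gives exact answers, but its memory grows with the number of distinct objects touched.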

Remaining work:
* hitset expiration
* recover hitsets when a PG is recovered, migrated, etc.
* [optional] preserve the in-memory hitset across peering intervals
* stress tests that specifically exercise and validate dirty, whiteout, evict, flush, and hitsets
* policy metadata for when to flush/evict from the cache
* an agent (process, thread, or similar) that evicts from the cache as it approaches the high-water mark

h3. Detailed Description

hitset expiration
* OSD logic to delete old hitsets (and replicate that deletion) once they are too old or their number reaches the maximum count, or when the pool's maximum values are adjusted
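
A minimal sketch of the expiration rule. It assumes each hitset records when its interval closed and that the pool supplies a maximum count and a maximum age. `HitSetInfo`, `trim_hitsets`, `max_count`, and `max_age` are hypothetical names. Replicating the deletion to the other OSDs in the PG is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HitSetInfo:
    end_time: float  # when this hitset's interval closed (seconds)
    name: str


def trim_hitsets(hitsets: List[HitSetInfo], now: float,
                 max_count: int, max_age: float) -> List[HitSetInfo]:
    """Return the hitsets to delete, oldest first (hypothetical policy)."""
    ordered = sorted(hitsets, key=lambda h: h.end_time)
    # Anything older than max_age expires regardless of the count.
    expired = [h for h in ordered if now - h.end_time > max_age]
    survivors = [h for h in ordered if now - h.end_time <= max_age]
    # If too many remain, drop the oldest until only max_count are left.
    excess = max(0, len(survivors) - max_count)
    return expired + survivors[:excess]
```

If the pool's maximums are lowered, running the same function again with the new limits returns the additional victims. This covers the "pool max values are adjusted" case.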

policy metadata for flush/evict from cache
* add pg_pool_t properties to control when we should:
** flush dirty metadata
** evict old items because the pool is getting full
** evict any item because it is older than X
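
A minimal sketch of what such properties might look like. It is written as a Python dataclass rather than as new `pg_pool_t` fields. Every field name and default below is hypothetical, and `evict_age_seconds` plays the role of X.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachePolicy:
    # Hypothetical stand-ins for new pg_pool_t properties.
    target_bytes: int                  # nominal capacity of the cache pool
    dirty_ratio: float = 0.4           # flush once dirty data exceeds this fraction
    full_ratio: float = 0.8            # evict old items once usage exceeds this fraction
    evict_age_seconds: float = 3600.0  # "X": evict any item older than this


def pool_actions(policy: CachePolicy, used_bytes: int, dirty_bytes: int) -> List[str]:
    """Pool-wide triggers: decide whether the agent should flush and/or evict."""
    actions = []
    if dirty_bytes > policy.dirty_ratio * policy.target_bytes:
        actions.append("flush")
    if used_bytes > policy.full_ratio * policy.target_bytes:
        actions.append("evict")
    return actions


def too_old(policy: CachePolicy, age_seconds: float) -> bool:
    """Per-object trigger: evict regardless of fullness once older than X."""
    return age_seconds > policy.evict_age_seconds
```

The sketch keeps the pool-wide triggers (fullness, dirtiness) separate from the per-object age check. The pool-wide triggers decide whether a pass is needed at all, and the age check is evaluated for each object the agent visits.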

cache agent
* this might be a thread, a Python client, or a separate daemon (to be discussed)
* periodically compare pool metadata (stats) against the policy
* start at a random point in the pool and iterate over objects:
** pull the hitset history for the current position
** estimate the idle time for each object
** if an object meets the criteria, flush or evict it
** move to the next object; pull new hitset metadata as needed
* include a throttling mechanism
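
A minimal sketch of one agent pass, under simplifying assumptions:

* The pool's objects are given as a list of names.
* A single hitset history covers all of them, rather than pulling per-position hitsets as described above. The history is ordered newest first, and plain sets stand in for Bloom filters.
* Idle time is estimated as the number of consecutive recent hitsets that miss the object, multiplied by the hitset period.
* Throttling is a cap on operations per pass.

`estimate_idle`, `agent_pass`, and their parameters are hypothetical.

```python
import random
from typing import Callable, List, Set


def estimate_idle(oid: str, history: List[Set[str]], period: float) -> float:
    """Idle time ~ number of newest hitsets missing the object, times the period."""
    idle_intervals = 0
    for hitset in history:  # newest first
        if oid in hitset:
            break
        idle_intervals += 1
    return idle_intervals * period


def agent_pass(objects: List[str], history: List[Set[str]], period: float,
               idle_threshold: float, max_ops: int,
               act: Callable[[str], None], rng: random.Random) -> int:
    """Visit objects from a random starting point; act on idle ones, up to max_ops."""
    if not objects:
        return 0
    start = rng.randrange(len(objects))
    ops = 0
    for i in range(len(objects)):
        oid = objects[(start + i) % len(objects)]  # wrap around the pool
        if estimate_idle(oid, history, period) >= idle_threshold:
            act(oid)  # flush or evict, depending on the policy
            ops += 1
            if ops >= max_ops:  # throttle: stop this pass early
                break
    return ops
```

Because each pass starts at a random offset and wraps around, repeated throttled passes do not always revisit the same objects first.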

cachemode_invalidate_forward
* implement the policy
* build a test that adds a cache, populates it, drains it, and disables the cache
** add tests to the suite that do this in parallel with a running workload?
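
A minimal sketch of the add/populate/drain/disable sequence. It assumes that in invalidate+forward mode, new writes bypass the cache (invalidating any cached copy) and go straight to the base pool, while draining flushes dirty objects and evicts the rest. Writes are whole-object. `CacheTier`, `forward`, and `drain` are hypothetical names, not Ceph's implementation.

```python
from typing import Dict, Set


class CacheTier:
    """Toy writeback cache over a base pool, with a forward mode (hypothetical)."""

    def __init__(self, base: Dict[str, bytes]) -> None:
        self.base = base
        self.cache: Dict[str, bytes] = {}
        self.dirty: Set[str] = set()
        self.forward = False  # set to True once the cache is being drained

    def write(self, oid: str, data: bytes) -> None:
        if self.forward:
            # Invalidate any cached copy and forward the write to the base pool.
            self.cache.pop(oid, None)
            self.dirty.discard(oid)
            self.base[oid] = data
        else:
            # Writeback: the base pool is only updated on flush.
            self.cache[oid] = data
            self.dirty.add(oid)

    def read(self, oid: str) -> bytes:
        # Promotion is omitted in this sketch; misses are served from the base pool.
        if oid in self.cache:
            return self.cache[oid]
        return self.base[oid]

    def drain(self) -> None:
        # Flush dirty objects to the base pool, then evict everything.
        for oid in sorted(self.dirty):
            self.base[oid] = self.cache[oid]
        self.dirty.clear()
        self.cache.clear()
```

The invariant worth asserting in such a test is that once the drain completes, the cache is empty and the base pool holds the latest version of every object. At that point the tier can be disabled safely.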

stress tests
* extend the rados model to exercise flush and evict in simple cases
* a test that stresses the hitset tracking code
* a stress workload that promotes new data and forces eviction of old data (i.e., a degenerate streaming workload)
* expand the QA suite with cache pool tests:
** explicit stress tests (above)
** enable/populate/drain/disable the cache pool (in a loop) in parallel with other workloads
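
A degenerate streaming workload reads every object exactly once. Each read therefore promotes a new object, and once the cache is full, each promotion forces an eviction. Below is a minimal sketch that assumes an LRU cache with a fixed object capacity as a stand-in for the agent's eviction policy. `StreamingCache`, `streaming_workload`, and `capacity` are hypothetical names.

```python
from collections import OrderedDict


class StreamingCache:
    """Toy LRU cache tier with a fixed object capacity (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.objects: "OrderedDict[str, bytes]" = OrderedDict()
        self.promotions = 0
        self.evictions = 0

    def read(self, oid: str, base: dict) -> bytes:
        if oid in self.objects:
            self.objects.move_to_end(oid)  # mark as most recently used
            return self.objects[oid]
        # Miss: promote from the base pool, evicting the LRU object if full.
        if len(self.objects) >= self.capacity:
            self.objects.popitem(last=False)
            self.evictions += 1
        self.objects[oid] = base[oid]
        self.promotions += 1
        return self.objects[oid]


def streaming_workload(num_objects: int, capacity: int) -> StreamingCache:
    base = {f"obj{i}": bytes([i % 256]) for i in range(num_objects)}
    cache = StreamingCache(capacity)
    for i in range(num_objects):  # each object is read exactly once
        cache.read(f"obj{i}", base)
        assert len(cache.objects) <= capacity  # the cache must never overflow
    return cache
```

With 100 objects and room for 10, every read is a miss, and every promotion after the first 10 evicts one object. A stress test in this style checks that eviction keeps pace with promotion.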

h3. Work items

h4. Coding tasks

# hitset expiration
# policy metadata
# cache agent
# stress tests

h4. Documentation tasks

# document the tiering framework
# document cache configuration and usage
## include limitations (e.g., PGLS results are not cache coherent)