Cache Tiering - Improve efficiency of read-miss operations » History » Version 1
Narendra Narang, 06/11/2015 01:42 AM
h1. Cache Tiering - Improve efficiency of read-miss operations

Summary
Suggested changes to the way read misses are fulfilled from the cache tier, to improve efficiency.

Owners
Narendra Narang (Red Hat)

Interested Parties
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
Name (Affiliation)
Name (Affiliation)
Name

Current Status
Write operations to a cache tier are 3x replicated for durability. A read operation whose object is not in the cache tier (a "read miss") is handled the same way: the data is fetched from the backing tier and three copies of it are stored in the cache tier before the read is served.

Detailed Description
A cache tier would typically be configured as a 3x replicated pool. Writes to a cache tier follow the usual rules for durability and immediate consistency: CRUSH maps the object to a primary, secondary, and tertiary OSD, and all three write ops must be committed to the respective OSD journals before an acknowledgement of the commit is sent back to the client. At some later point, based on the LRU algorithm, the writes are aged out to a backing tier (most likely configured as an erasure-coded pool).
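As a rough sketch of the write path described above, the client acknowledgement waits on all three journal commits. The OSD names and the commit function here are illustrative stand-ins, not Ceph's actual internals:

```python
from concurrent.futures import ThreadPoolExecutor

def commit_to_journal(osd: str, data: bytes) -> str:
    # Stand-in for a journal write on one OSD replica (hypothetical).
    return f"{osd}:committed:{len(data)}"

def replicated_write(data: bytes,
                     osds=("osd.primary", "osd.secondary", "osd.tertiary")) -> str:
    # The write is dispatched to all three OSDs in parallel, but the client
    # acknowledgement is only returned once *every* journal commit finishes.
    with ThreadPoolExecutor(max_workers=len(osds)) as pool:
        results = list(pool.map(lambda o: commit_to_journal(o, data), osds))
    assert len(results) == len(osds)  # all replicas durable before ack
    return "ack"

print(replicated_write(b"object-data"))  # prints "ack"
```

The point of the sketch is only that the slowest of the three journal commits bounds the write latency; durability is never traded for response time on the write path.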

However, in the case of reads, there are two possibilities:
* First, a read operation that is served directly from the cache tier. This "cache hit" scenario is ideal because there is no additional operation to locate and read/promote the data from the backing tier.
* Second, and less ideal, is a read "cache miss", in which the read I/O request cannot be fulfilled from the cache tier, so the data has to be fetched and promoted from the backing tier to the cache tier. Specifically, Ceph first promotes the data from the backing tier into the primary OSD of the cache pool, and then copies this data over the network to make two more copies elsewhere in the cache pool. In effect, it promotes, copies, and stores multiple (3) copies across the cluster's cache pool before it responds to and fulfills the read I/O request.

The read-miss behavior is expensive for the following reasons:
* It waits to serve the read I/O request until 3x copies are stored in the cache tier, which increases response time.
* It has to copy this redundant data over the network, which adds traffic overhead.
* It populates copies of this data unnecessarily on expensive SSDs, which reduces the cost/performance efficiency of the fast tier.
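The response-time cost in the first bullet can be put into a back-of-the-envelope model. The timings below are made-up placeholders, not measurements, and the replica copies are assumed to proceed in parallel after the promotion:

```python
# Hypothetical, illustrative timings (milliseconds) -- not measured values.
BACKING_TIER_FETCH_MS = 10.0   # read + promote from the erasure-coded tier
CACHE_REPLICA_COPY_MS = 2.0    # network copy to one additional cache OSD

def read_miss_latency_ms(replicas_before_ack: int) -> float:
    """Latency until the client read is served, if the cache tier must
    hold `replicas_before_ack` copies before responding."""
    extra_copies = replicas_before_ack - 1  # copies beyond the primary
    # Assume the copies to the secondaries overlap completely, so the
    # replication step adds one copy's latency regardless of fan-out.
    return BACKING_TIER_FETCH_MS + (CACHE_REPLICA_COPY_MS if extra_copies else 0.0)

current = read_miss_latency_ms(3)   # serve only after 3 cache copies exist
proposed = read_miss_latency_ms(1)  # serve as soon as the primary has the data
print(current, proposed)            # prints "12.0 10.0"
```

Even under this optimistic parallel-copy assumption, the current behavior adds a full network round trip to every read miss; the traffic and SSD-capacity costs in the other two bullets come on top of that.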

For a write, storing 3x copies in the cache tier is desirable for durability. The same behavior is not ideal for read-miss operations, however, since the read request is directed to the primary OSD anyway: a single promoted copy would suffice to serve it. In the event of a failure of either the primary OSD or the primary OSD's node, Ceph could locate and promote the data again from the alternate OSDs of the backing tier.
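The proposed read-miss path can be sketched as follows. All class and method names here are hypothetical pseudocode for the blueprint's idea, not Ceph APIs: promote a single copy to the primary cache OSD, serve the read immediately, and fall back to the backing tier if the primary copy is lost.

```python
# Hypothetical sketch of single-copy read-miss promotion (not Ceph code).

class BackingTier:
    """Durable (e.g. erasure-coded) tier; always holds the object."""
    def __init__(self):
        self.objects = {"obj1": b"payload"}

    def fetch(self, name: str) -> bytes:
        return self.objects[name]

class CacheTier:
    """Fast tier; on a miss, promote ONE copy to the primary OSD only."""
    def __init__(self, backing: BackingTier):
        self.backing = backing
        self.primary = {}  # objects held by the primary cache OSD

    def read(self, name: str) -> bytes:
        if name in self.primary:          # cache hit: serve directly
            return self.primary[name]
        data = self.backing.fetch(name)   # cache miss: fetch from backing tier
        self.primary[name] = data         # promote a single copy (no 3x fan-out)
        return data                       # serve the read immediately

    def primary_failed(self):
        # Losing the primary loses the only cached copy, but durability is
        # unaffected: the backing tier still has the data, so the next
        # read simply re-promotes it.
        self.primary.clear()

cache = CacheTier(BackingTier())
print(cache.read("obj1"))       # miss: promoted once, then served
cache.primary_failed()
print(cache.read("obj1"))       # re-promoted from the backing tier
```

The design choice this illustrates: read-miss data is never the only copy in the cluster (the backing tier still holds it), so the 3x replication that protects writes is not needed to protect promoted read data.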

Work items
This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

Coding tasks
Task 1
Task 2
Task 3

Build / release tasks
Task 1
Task 2
Task 3

Documentation tasks
Task 1
Task 2
Task 3

Deprecation tasks
Task 1
Task 2
Task 3