Cache Tiering - Improve efficiency of read-miss operations » History » Version 1

Narendra Narang, 06/11/2015 01:42 AM

h1. Cache Tiering - Improve efficiency of read-miss operations

Summary
This blueprint suggests changes to the way read misses are fulfilled in the cache tier, with the aim of improving efficiency.

Owners
Narendra Narang (Red Hat)

Interested Parties
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
Name (Affiliation)
Name (Affiliation)
Name

Current Status
Write operations to a cache tier are 3x replicated for durability. The data fetched to fulfill a read of an object that is not in the cache tier (a read-miss operation) is also 3x replicated, i.e. the data is fetched from the backing tier and 3 copies of it are stored in the cache tier.

Detailed Description
A "cache" tier would typically be configured as a 3x replicated pool. Writes to a cache tier follow the same rules for durability and immediate consistency, i.e. CRUSH -> primary OSD, secondary OSD, tertiary OSD. All 3 write ops to the respective OSD journals need to be committed before an acknowledgement of the commit is sent back to the client. Then at some point, based on the LRU algorithm, the writes are aged out to a backing tier, which is most likely configured as an erasure-coded pool.

However, in the case of reads, there are 2 possibilities:
* First, a read "cache hit": the read I/O request is served directly from the cache tier. This scenario is ideal because no additional operation is needed to locate and read/promote the data from the backing tier.
* Second, and less ideal, a read "cache miss": the read I/O request cannot be fulfilled from the cache tier, so the data has to be fetched and promoted from the backing tier to the cache tier. Ceph first promotes the data from the backing tier to the primary OSD in the cache pool and then also copies this data, over the network, to make 2 more copies elsewhere in the cache pool. In short, it promotes, copies and stores 3 copies across the cluster's cache pool before it responds to and fulfills the read I/O request.
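The two read paths above can be illustrated with a small toy model. This is only a sketch of the behavior as described here, not Ceph code: the @CacheTier@ class, its @read@ method and the use of plain dictionaries as OSDs are all hypothetical names chosen for illustration.

```python
class CacheTier:
    """Toy model of a replicated cache tier in front of a backing tier."""

    def __init__(self, backing, replicas=3):
        self.backing = backing                          # dict: object name -> data
        self.osds = [dict() for _ in range(replicas)]   # osds[0] plays the primary

    def read(self, name):
        """Return (data, hit, copies_written) for one read request."""
        primary = self.osds[0]
        if name in primary:
            # Cache hit: served directly from the cache tier.
            return primary[name], True, 0
        # Cache miss: fetch the object from the backing tier and promote it.
        data = self.backing[name]
        for osd in self.osds:
            # Every replica is stored before the read is answered.
            osd[name] = data
        return data, False, len(self.osds)


tier = CacheTier(backing={"obj1": b"payload"})
first = tier.read("obj1")    # miss: promoted and stored 3 times
second = tier.read("obj1")   # hit: no extra copies written
print(first)
print(second)
```

In this model the first read reports 3 copies written before the data is returned, while the second read is a hit and writes nothing.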

The read miss behavior is expensive for the following reasons:
* It delays serving the read I/O request until 3 copies are stored in the cache tier, which increases response time.
* It copies this "redundant" data over the network, which adds traffic overhead.
* It "populates" copies of this data unnecessarily on expensive SSDs, which reduces the efficiency (cost/performance) of this fast tier.
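A rough back-of-the-envelope calculation makes the network and SSD overhead concrete. The sketch below assumes that the primary OSD fetches the object once from the backing tier and then forwards one full copy to each remaining replica; the function name @read_miss_overhead@ and the 4 MiB object size are illustrative assumptions, not values taken from Ceph.

```python
def read_miss_overhead(object_bytes, replicas=3):
    """Estimate the cache-tier cost of promoting one object on a read miss.

    Assumes the primary OSD fetches the object once from the backing tier
    and then forwards one full copy to each of the other replicas.
    Returns (bytes written to the cache tier, bytes of replication traffic).
    """
    ssd_bytes_written = replicas * object_bytes
    replication_traffic = (replicas - 1) * object_bytes
    return ssd_bytes_written, replication_traffic


obj = 4 * 1024 * 1024  # hypothetical 4 MiB object
current = read_miss_overhead(obj, replicas=3)
single = read_miss_overhead(obj, replicas=1)
print(current)  # (12582912, 8388608)
print(single)   # (4194304, 0)
```

Under these assumptions, a 3x replicated promotion writes three times the object size to the fast tier and sends two extra copies over the network, whereas a single-copy promotion would write the object once and generate no replication traffic inside the cache tier.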

For a write, storing 3x copies in the cache tier is desirable for durability. However, the same behavior is not ideal for read (miss) operations, since the read request is directed to the primary OSD anyway. Because the promoted data is a copy of an object that still exists in the backing tier, a single copy in the cache tier could be sufficient for read misses. In the event of a failure of either the primary OSD or the primary OSD's node, Ceph could locate the data on alternate OSDs and promote it again.
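One way to picture the behavior suggested here is a toy model in which a read miss promotes only one copy, and a failure of the primary simply causes the next read to promote the object again. This is a sketch under stated assumptions, not a proposed implementation: @SingleCopyCacheTier@, @fail_primary@ and the rule that the next OSD in the list becomes the new primary are hypothetical simplifications.

```python
class SingleCopyCacheTier:
    """Toy model: read misses promote one copy, onto the primary only."""

    def __init__(self, backing, osd_count=3):
        self.backing = backing                          # dict: object name -> data
        self.osds = [dict() for _ in range(osd_count)]
        self.primary = 0                                # index of the acting primary
        self.promotions = 0

    def fail_primary(self):
        """Simulate losing the primary OSD; the next OSD takes over."""
        self.osds[self.primary].clear()
        self.primary = (self.primary + 1) % len(self.osds)

    def read(self, name):
        osd = self.osds[self.primary]
        if name not in osd:
            # Read miss: promote exactly one copy from the backing tier.
            osd[name] = self.backing[name]
            self.promotions += 1
        return osd[name]


tier = SingleCopyCacheTier(backing={"obj1": b"payload"})
tier.read("obj1")          # miss: one copy promoted
tier.read("obj1")          # hit
tier.fail_primary()        # the only cached copy is lost
data = tier.read("obj1")   # miss on the new primary: promoted again
copies = sum("obj1" in osd for osd in tier.osds)
print(tier.promotions, copies)
```

The data remains readable after the simulated failure because the backing tier still holds it; the cost of the failure is one additional promotion rather than lost data.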

Work items
This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

Coding tasks
Task 1
Task 2
Task 3

Build / release tasks
Task 1
Task 2
Task 3

Documentation tasks
Task 1
Task 2
Task 3

Deprecation tasks
Task 1
Task 2
Task 3