Osd - Faster Peering » History » Version 2

Jessica Mack, 07/10/2015 09:58 PM

h1. Osd - Faster Peering

h3. Summary

For correctness reasons, peering requires a series of serial message transmissions and filestore syncs before it can complete. This sets a lower bound on the latency that client IO suffers whenever the cluster changes.

h3. Owners

* Sam Just (Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Guang Yang (Yahoo!)
* Name (Affiliation)
* Name

h3. Current Status

h3. Detailed Description

!{width:40%}graph.png!

The above is the peering state chart generated from the source. GetInfo->GetLog->GetMissing requires three round trips to replicas. First, we get pg infos from every osd in the prior set, acting set, and up set in order to choose an authoritative log. Second, we fetch the authoritative log. Last, we fetch missing sets from each acting set replica for use during recovery.
1) Can we preemptively request the log+missing for osds in the most recent prior set interval to hopefully skip the GetLog step?
2) Can we preemptively request the log+missing for acting and up osds in the GetInfo set to hopefully skip the GetMissing step?
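
The effect of the two questions above on the number of serial round trips can be illustrated with a toy model. This is only a sketch under stated assumptions: @round_trips@ and its parameters are hypothetical names, not Ceph code, and the model assumes a prefetched log or missing set is always usable once it arrives, which the real peering logic would have to verify.

```python
def round_trips(auth_log_osd, acting_replicas, prefetched=frozenset()):
    """Count serial round trips to replicas in one toy peering attempt.

    All names are hypothetical and exist only for this illustration.
    auth_log_osd    -- osd found (after GetInfo) to hold the authoritative log
    acting_replicas -- replicas whose missing sets are needed for recovery
    prefetched      -- osds speculatively asked for log+missing during GetInfo
    """
    trips = 1  # GetInfo: always needed to choose the authoritative log
    if auth_log_osd not in prefetched:
        trips += 1  # GetLog: the authoritative log was not prefetched
    if not set(acting_replicas) <= set(prefetched):
        trips += 1  # GetMissing: at least one replica was not prefetched
    return trips


acting = {1, 2, 3}
print(round_trips(2, acting))                     # 3: the current serial sequence
print(round_trips(2, acting, prefetched={2}))     # 2: GetLog skipped
print(round_trips(2, acting, prefetched=acting))  # 1: GetLog and GetMissing skipped
```
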
 
Another wrinkle is that, until the previous acting interval has been flushed, replicas do not send the info requested in GetInfo and the primary cannot start peering.
1) We might be able to relax this to waiting for a commit (journal only) if we track unstable objects across intervals. We need to track unstable objects for replicas going forward anyway to get replica reads right, so this might not be so bad.
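
The difference between waiting for a flush and waiting for a commit can be sketched as follows. This is a minimal illustration rather than a proposed implementation: @UnstableTracker@ and its methods are hypothetical names, "commit" is modeled as a write reaching the journal, "flush" as every journaled write also having been applied, and writes are assumed to be journaled before they are applied.

```python
class UnstableTracker:
    """Toy bookkeeping for writes that are journaled but not yet applied.

    Hypothetical helper for illustration only; not part of Ceph.
    """

    def __init__(self):
        # op_id -> (object name, "submitted" or "journaled")
        self._state = {}

    def submit(self, op_id, obj):
        self._state[op_id] = (obj, "submitted")

    def journaled(self, op_id):
        obj, _ = self._state[op_id]
        self._state[op_id] = (obj, "journaled")

    def applied(self, op_id):
        # Once applied, the write is stable and no longer tracked.
        self._state.pop(op_id, None)

    def flushed(self):
        """Current requirement: nothing is outstanding at all."""
        return not self._state

    def all_journaled(self):
        """Relaxed requirement: every outstanding write is in the journal."""
        return all(state == "journaled" for _, state in self._state.values())

    def unstable_objects(self):
        """Objects that must be carried into the next interval as unstable."""
        return {obj for obj, state in self._state.values() if state == "journaled"}


tracker = UnstableTracker()
tracker.submit(1, "obj_a")
tracker.journaled(1)
print(tracker.flushed())           # False: a flush-based wait would still block
print(tracker.all_journaled())     # True: a commit-based wait could proceed
print(tracker.unstable_objects())  # {'obj_a'}
```
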

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3