Fixed memory layout for MessageOp passing » History » Version 3

Jessica Mack, 07/03/2015 06:54 PM

h1. Fixed memory layout for MessageOp passing

h3. Summary

Encoding ObjectStore transactions currently consumes too much time in the PG thread and accounts for 20%-30% of the total OSD op latency. An ObjectStore transaction is the central object that collects the ObjectStore ops corresponding to a client op.
To balance a flexible ObjectStore API design against performance, we want to introduce a fixed-position memory layout for op structures, which avoids the encode/decode work.
Furthermore, if we have a good implementation of the ObjectStore transaction with a fixed member layout, we can extend it to the Message implementation, which has the same problem. If both are applied, the encode/decode costs for client ops, messages, PG ops, and subops can be eliminated.

h3. Owners

* Haomai Wang (UnitedStack)
* Dong Yuan (UnitedStack)
* Name

h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

Sage has a blueprint for the ObjectStore transaction design (https://wiki.ceph.com/Planning/Blueprints/Hammer/osd%3A_update_Transaction_encoding) and a corresponding branch (https://github.com/ceph/ceph/tree/wip-transaction).

h3. Detailed Description

In general, there are two ways to achieve the goal:
1. Implement a new uniform memory layout model that can hold the different ObjectStore ops. A simple model looks like this:
<pre>
| op_num | op1 (type, offset and len) | op2 (type, offset and len) | op3 (type, offset and len) | ... | op1 arguments (opaque) | op2 arguments (opaque) | op3 arguments (opaque) | bufferlist ... |
</pre>
 
The core of this method, I think, is weeding STL and other complex structures out of the ObjectStore API: op arguments should be simple elements so that the memory layout can be fixed. If we mainly use simple elements, Message can follow the same approach and be refactored in the same way.
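As a rough illustration of method 1, the sketch below packs ops into one contiguous buffer that follows the layout above: an op count, a table of fixed-size descriptors (type, offset, len), then the argument bytes. All names here (@FixedTxn@, @OpDesc@, @TouchArgs@, @WriteArgs@, the op codes) are hypothetical and are not part of Ceph; the trailing bufferlist payload region is left out for brevity. This is a sketch under those assumptions, not a proposed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical op codes for illustration; not Ceph's real op numbering.
enum OpType : uint32_t { OP_TOUCH = 1, OP_WRITE = 2 };

// One fixed-size descriptor per op, stored right after op_num.
struct OpDesc {
  uint32_t type;    // an OpType value
  uint32_t offset;  // byte offset of the op's arguments from the buffer start
  uint32_t len;     // byte length of the op's arguments
};

// Argument structs hold only simple elements (no STL members).
struct TouchArgs { uint64_t object_id; };
struct WriteArgs { uint64_t object_id; uint64_t off; uint64_t len; };
static_assert(std::is_trivially_copyable<OpDesc>::value, "must be memcpy-safe");

// Buffer layout: | op_num | desc[0..max_ops) | op arguments ... |
class FixedTxn {
 public:
  explicit FixedTxn(uint32_t max_ops)
      : max_ops_(max_ops),
        buf_(sizeof(uint32_t) + max_ops * sizeof(OpDesc), 0) {}

  uint32_t op_num() const {
    uint32_t n;
    std::memcpy(&n, buf_.data(), sizeof(n));
    return n;
  }

  // Append one op by copying its argument struct; returns false when full.
  template <typename Args>
  bool add_op(OpType type, const Args& args) {
    static_assert(std::is_trivially_copyable<Args>::value,
                  "op arguments must be simple elements");
    uint32_t n = op_num();
    if (n == max_ops_) return false;
    OpDesc d{type, static_cast<uint32_t>(buf_.size()),
             static_cast<uint32_t>(sizeof(Args))};
    buf_.resize(buf_.size() + sizeof(Args));
    std::memcpy(buf_.data() + d.offset, &args, sizeof(Args));
    std::memcpy(buf_.data() + desc_pos(n), &d, sizeof(d));
    ++n;
    std::memcpy(buf_.data(), &n, sizeof(n));
    return true;
  }

  OpDesc desc(uint32_t i) const {
    assert(i < op_num());
    OpDesc d;
    std::memcpy(&d, buf_.data() + desc_pos(i), sizeof(d));
    return d;
  }

  // Read the arguments back with a plain copy; there is no decode step.
  template <typename Args>
  Args args(uint32_t i) const {
    OpDesc d = desc(i);
    assert(d.len == sizeof(Args));
    Args a;
    std::memcpy(&a, buf_.data() + d.offset, sizeof(Args));
    return a;
  }

  const std::vector<uint8_t>& bytes() const { return buf_; }

 private:
  static std::size_t desc_pos(uint32_t i) {
    return sizeof(uint32_t) + i * sizeof(OpDesc);
  }
  uint32_t max_ops_;
  std::vector<uint8_t> buf_;
};

// Usage: build a two-op transaction (touch, then write) for object 42.
FixedTxn build_example() {
  FixedTxn t(4);
  t.add_op(OP_TOUCH, TouchArgs{42});
  t.add_op(OP_WRITE, WriteArgs{42, 0, 4096});
  return t;
}
```

Because every argument struct is trivially copyable, the buffer returned by @bytes()@ could in principle be handed to the next stage as-is. The sketch ignores endianness and struct padding, which a real wire format would have to pin down.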
 
2. Discard the ObjectStore transaction entirely and let the implementations handle the ops. For example, ObjectStore's write method would call FileStore::_write directly, and FileStore would be responsible for buffering the op and deciding how to execute it. We can then pass the existing STL vectors and maps directly to FileStore, which can reference them without any encode/decode.
 
This method also needs additional information in the subop message, because the replica OSD has to calculate the ops again.
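To make method 2 more concrete, here is a minimal sketch of a caller handing its existing STL containers straight to a backend by const reference, with no intermediate transaction to encode. @ToyStore@ and its methods are hypothetical stand-ins and do not match the real FileStore signatures. A replica would have to make the same calls itself, which is why the subop message would need to carry enough information to recompute them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend such as FileStore. Method names and
// signatures are illustrative only and do not match Ceph's real FileStore.
class ToyStore {
 public:
  // The caller's vector is taken by const reference: nothing is serialized
  // on the way in, and the backend decides how to buffer or apply the op.
  int write(const std::string& oid, uint64_t off,
            const std::vector<uint8_t>& data) {
    std::vector<uint8_t>& obj = objects_[oid];
    if (obj.size() < off + data.size()) obj.resize(off + data.size());
    std::copy(data.begin(), data.end(),
              obj.begin() + static_cast<std::ptrdiff_t>(off));
    return 0;
  }

  // The attribute map is likewise referenced directly, not encoded.
  int setattrs(const std::string& oid,
               const std::map<std::string, std::string>& attrs) {
    for (const auto& kv : attrs) attrs_[oid][kv.first] = kv.second;
    return 0;
  }

  std::size_t size(const std::string& oid) const {
    auto it = objects_.find(oid);
    return it == objects_.end() ? 0 : it->second.size();
  }

  std::string getattr(const std::string& oid, const std::string& key) const {
    auto o = attrs_.find(oid);
    if (o == attrs_.end()) return "";
    auto k = o->second.find(key);
    return k == o->second.end() ? "" : k->second;
  }

 private:
  std::map<std::string, std::vector<uint8_t>> objects_;
  std::map<std::string, std::map<std::string, std::string>> attrs_;
};

// Usage: the caller passes its own containers; no transaction object is built.
bool demo_direct_call() {
  ToyStore store;
  std::vector<uint8_t> payload = {1, 2, 3};
  std::map<std::string, std::string> attrs = {{"k", "v"}};
  int rc = store.write("obj", 2, payload);
  assert(rc == 0);
  (void)rc;
  store.setattrs("obj", attrs);
  return store.size("obj") == 5 && store.getattr("obj", "k") == "v";
}
```

The trade-off is that the backend now owns batching and ordering decisions that the transaction object used to capture.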
 
In short, method 1 requires us to redesign the ObjectStore API and use simple elements for op arguments. The problem with method 2 is that the replica OSD needs to calculate the ops again, which may break some rules.

h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3