Fixed memory layout for MessageOp passing » History » Version 2
Jessica Mack, 07/03/2015 06:54 PM
1 | 1 | Jessica Mack | h1. Fixed memory layout for MessageOp passing |
---|---|---|---|
2 | |||
3 | h3. Summary |
||
4 | |||
5 | Currently, encoding an ObjectStore transaction consumes too much time in the PG thread and accounts for 20%-30% of the total OSD op latency. An ObjectStore transaction is the central object that collects the ObjectStore ops corresponding to a client op. |
||
6 | To balance a flexible ObjectStore API against performance, this blueprint proposes a fixed-position memory layout for the op structures, which avoids the encode/decode work. |
||
7 | Furthermore, once we have a good fixed-layout implementation of the ObjectStore transaction, we can extend it to the Message implementation, which has the same problem. If both are applied, the encode/decode costs for client ops, messages, PG ops, and subops can be eliminated. |
||
8 | |||
9 | h3. Owners |
||
10 | |||
11 | * Haomai Wang (UnitedStack) |
||
12 | * Dong Yuan (UnitedStack) |
||
13 | * Name |
||
14 | |||
15 | h3. Interested Parties |
||
16 | |||
17 | * Name (Affiliation) |
||
18 | * Name (Affiliation) |
||
19 | * Name |
||
20 | |||
21 | h3. Current Status |
||
22 | |||
23 | Sage has a blueprint for the ObjectStore transaction design (https://wiki.ceph.com/Planning/Blueprints/Hammer/osd%3A_update_Transaction_encoding) and a corresponding branch (https://github.com/ceph/ceph/tree/wip-transaction). |
||
24 | |||
25 | h3. Detailed Description |
||
26 | |||
27 | In general, there are two ways to achieve this goal: |
||
28 | 2 | Jessica Mack | # Implement a new uniform memory layout model that can hold different ObjectStore ops. For example, see the simple model below: |
29 | 1 | Jessica Mack | <pre> |
30 | | op_num | op1 (type, offset and len) | op2 (type, offset and len) | op3 (type, offset and len) | ... | op1 arguments (opaque) | op2 arguments (opaque) | op3 arguments (opaque) | bufferlist ......| |
||
31 | </pre> |
||
32 | |||
33 | The core of this method, I think, is removing STL and other complex structures from ObjectStore's API: op arguments should be simple elements so that the memory layout can be fixed. If we mainly use simple elements, Message can follow the same model and be refactored in the same way. |
||
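The layout in method 1 can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the design from the blueprint or the wip-transaction branch: the names @FixedTransaction@, @OpHeader@, @WriteArgs@, @TouchArgs@ and the helper functions are all hypothetical, the bufferlist payload is left out, and byte order and alignment across hosts are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// All names here are hypothetical; none of them are real Ceph types.
enum OpType : uint32_t { OP_TOUCH = 1, OP_WRITE = 2 };

// Fixed-size header, one per op, stored right after op_num.
struct OpHeader {
  uint32_t type;    // an OpType value
  uint32_t offset;  // byte offset of the op's arguments from the buffer start
  uint32_t len;     // length of the op's arguments in bytes
};

// Plain-data argument structs: no STL members, so raw bytes can be copied.
struct TouchArgs { uint64_t object_id; };
struct WriteArgs { uint64_t object_id; uint64_t off; uint64_t len; };

class FixedTransaction {
 public:
  // Append one op; its arguments are copied byte-for-byte, not encoded.
  template <typename Args>
  void add_op(OpType type, const Args& args) {
    static_assert(std::is_trivially_copyable<Args>::value,
                  "op arguments must be simple elements");
    OpHeader h{static_cast<uint32_t>(type),
               static_cast<uint32_t>(args_.size()),  // relative for now
               static_cast<uint32_t>(sizeof(Args))};
    headers_.push_back(h);
    const char* p = reinterpret_cast<const char*>(&args);
    args_.insert(args_.end(), p, p + sizeof(Args));
  }

  // Build | op_num | headers ... | arguments ... | as one contiguous buffer.
  std::vector<char> finalize() const {
    const uint32_t op_num = static_cast<uint32_t>(headers_.size());
    const size_t args_base = sizeof(op_num) + headers_.size() * sizeof(OpHeader);
    std::vector<char> buf(args_base + args_.size());
    std::memcpy(buf.data(), &op_num, sizeof(op_num));
    char* hp = buf.data() + sizeof(op_num);
    for (OpHeader h : headers_) {
      h.offset += static_cast<uint32_t>(args_base);  // now from buffer start
      std::memcpy(hp, &h, sizeof(h));
      hp += sizeof(h);
    }
    if (!args_.empty())
      std::memcpy(buf.data() + args_base, args_.data(), args_.size());
    return buf;
  }

 private:
  std::vector<OpHeader> headers_;
  std::vector<char> args_;
};

// Reader side: every field sits at a computable position, so no decode pass.
inline uint32_t op_count(const std::vector<char>& buf) {
  uint32_t n;
  std::memcpy(&n, buf.data(), sizeof(n));
  return n;
}

inline OpHeader op_header(const std::vector<char>& buf, uint32_t i) {
  OpHeader h;
  std::memcpy(&h, buf.data() + sizeof(uint32_t) + i * sizeof(OpHeader), sizeof(h));
  return h;
}

template <typename Args>
Args op_args(const std::vector<char>& buf, const OpHeader& h) {
  assert(h.len == sizeof(Args) && h.offset + h.len <= buf.size());
  Args a;
  std::memcpy(&a, buf.data() + h.offset, sizeof(a));
  return a;
}
```

A caller would add ops with @add_op@, call @finalize()@ once, and hand the resulting buffer to the consumer, which reads the i-th op with @op_header@ and @op_args@. The @static_assert@ is where the "simple elements only" rule from the paragraph above is enforced at compile time.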
34 | |||
35 | 2 | Jessica Mack | # Discard ObjectStore's transaction entirely and let its successors handle it. For example, ObjectStore's write method would directly call FileStore::_write, and FileStore would be responsible for buffering the op and deciding how to execute it. We can then pass the existing STL vector and map directly to FileStore, which can reference them without encode/decode. |
36 | 1 | Jessica Mack | |
37 | This method also requires adding extra information to the subop message, because the replica OSD has to compute the ops again. |
||
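A minimal sketch of method 2 follows, assuming a much simplified interface. @ObjectStoreSketch@ and @FileStoreSketch@ are hypothetical stand-ins, not the real Ceph classes, and the subop and replication side is not modeled. The caller hands its STL containers to the backend, which takes ownership by move and buffers the op itself, so no encode/decode step appears anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not the real Ceph interfaces.
class ObjectStoreSketch {
 public:
  virtual ~ObjectStoreSketch() = default;
  // Containers are taken by value so callers can move them in.
  virtual void write(const std::string& oid, uint64_t off,
                     std::vector<char> data) = 0;
  virtual void setattrs(const std::string& oid,
                        std::map<std::string, std::string> attrs) = 0;
};

class FileStoreSketch : public ObjectStoreSketch {
 public:
  struct PendingWrite {
    std::string oid;
    uint64_t off;
    std::vector<char> data;
  };
  struct PendingAttrs {
    std::string oid;
    std::map<std::string, std::string> attrs;
  };

  // The backend buffers the op itself; the payload is moved, not encoded.
  void write(const std::string& oid, uint64_t off,
             std::vector<char> data) override {
    assert(!oid.empty());
    writes_.push_back(PendingWrite{oid, off, std::move(data)});
  }

  void setattrs(const std::string& oid,
                std::map<std::string, std::string> attrs) override {
    assert(!oid.empty());
    attrs_.push_back(PendingAttrs{oid, std::move(attrs)});
  }

  const std::vector<PendingWrite>& pending_writes() const { return writes_; }
  const std::vector<PendingAttrs>& pending_attrs() const { return attrs_; }

  // The backend decides how and when to apply the buffered ops. Here it
  // only counts the payload bytes and drops the ops.
  std::size_t flush() {
    std::size_t bytes = 0;
    for (const PendingWrite& w : writes_) bytes += w.data.size();
    writes_.clear();
    attrs_.clear();
    return bytes;
  }

 private:
  std::vector<PendingWrite> writes_;
  std::vector<PendingAttrs> attrs_;
};
```

A call such as @store.write("obj1", 0, std::move(data))@ leaves the payload in the same heap allocation the caller filled. The cost noted above still applies: nothing in this sketch is a wire format, so a replica would have to rebuild the ops from the information carried in the subop message.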
38 | |||
39 | In short, method 1 requires redesigning ObjectStore's API to use simple elements for op arguments. Method 2's problem is that replica OSDs need to compute the ops again, which may break some existing rules. |
||
40 | |||
41 | h3. Work items |
||
42 | |||
43 | h4. Coding tasks |
||
44 | |||
45 | # Task 1 |
||
46 | # Task 2 |
||
47 | # Task 3 |
||
48 | |||
49 | h4. Build / release tasks |
||
50 | |||
51 | # Task 1 |
||
52 | # Task 2 |
||
53 | # Task 3 |
||
54 | |||
55 | h4. Documentation tasks |
||
56 | |||
57 | # Task 1 |
||
58 | # Task 2 |
||
59 | # Task 3 |
||
60 | |||
61 | h4. Deprecation tasks |
||
62 | |||
63 | # Task 1 |
||
64 | # Task 2 |
||
65 | # Task 3 |