Optimize Newstore for massive small object storage » History » Version 2
Xiaoxi Chen, 06/12/2015 06:30 AM
h3. *Optimize Newstore for massive small object storage*

*Summary*
More and more companies are adopting Ceph as their storage solution. Ceph does extremely well with RBD and large-object storage, but results from both Intel and other users clearly show that it struggles in the "Lots Of Small Files" (LOSF) case.

In the LOSF case, the average object size is tens to hundreds of KB, which is typically the size of a compressed image, HTML page, text file, or PDF. In the current approach, each object lives on the file system as an individual file, which usually means millions of files per file system. This overloads the file system and introduces large read/write amplification, since every I/O has to traverse the whole directory tree.

Newstore introduced fragment_list, which decouples the logical object from its physical location, and it can use open_by_handle to reduce the cost of tree traversal. The initial design already allows one object to have multiple fragments. We would now like to extend the object->fragment mapping from 1:N to N:M, that is, to let multiple objects share one fragment.
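
One way to picture the proposed N:M mapping is a packer that appends small objects back to back into a shared backing file and records an (offset, length, fid) entry per object. The following is only a minimal sketch under stated assumptions: @SharedFilePacker@, @sketch_fragment_t@, @sketch_fid_t@, and the size cap are hypothetical names invented for illustration, not part of Newstore.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Newstore's fid_t (assumption for this sketch).
typedef uint64_t sketch_fid_t;

// Mirrors the three fields quoted in this page; not the real fragment_t.
struct sketch_fragment_t {
  uint32_t offset;   // byte offset inside the shared backing file
  uint32_t length;   // number of bytes that belong to this object
  sketch_fid_t fid;  // backing file, possibly shared with other objects
};

// Packs small objects back to back into backing files of bounded size.
class SharedFilePacker {
 public:
  explicit SharedFilePacker(uint32_t max_file_bytes)
      : max_file_bytes_(max_file_bytes), current_fid_(0), used_(0) {}

  // Records where an object of `length` bytes would be placed.
  // An object larger than the cap still gets a file of its own.
  sketch_fragment_t append(const std::string& oid, uint32_t length) {
    const uint64_t end = static_cast<uint64_t>(used_) + length;
    if (used_ > 0 && end > max_file_bytes_) {
      ++current_fid_;  // current file is full: start a new one
      used_ = 0;
    }
    sketch_fragment_t f = {used_, length, current_fid_};
    used_ += length;
    index_[oid].push_back(f);  // one object may own several fragments
    return f;
  }

  // Fragments recorded for an object; throws std::out_of_range if unknown.
  const std::vector<sketch_fragment_t>& fragments(const std::string& oid) const {
    return index_.at(oid);
  }

  // Number of backing files touched so far.
  uint64_t file_count() const { return current_fid_ + 1; }

 private:
  uint32_t max_file_bytes_;
  sketch_fid_t current_fid_;
  uint32_t used_;  // bytes already assigned in the current file
  std::map<std::string, std::vector<sketch_fragment_t> > index_;
};
```

With a 4 MB cap, appending a 30000-byte object and then a 50000-byte object places both in the same backing file, at offsets 0 and 30000. Space reclamation after deletes and overwrite handling are left out of this sketch.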


*Owners*
Xiaoxi CHEN (Intel)

*Interested Parties*

Xiaoxi CHEN (Intel)
Jian Zhang (Intel)

*Current Status*

Newstore already has some of the facilities this needs: fragment_t carries an offset and a length into its backing file.
struct fragment_t {
  uint32_t offset;   ///< offset in file to first byte of this fragment
  uint32_t length;   ///< length of fragment/extent
  fid_t fid;         ///< file backing this fragment
  // remaining members omitted
};
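
To illustrate why the offset and length fields are enough to address one object inside a shared file, here is a minimal sketch of a bounds-checked read. It assumes an in-memory map standing in for the backing files; @FileStoreSketch@, @fragment_sketch_t@, @fid_sketch_t@, and @read_fragment@ are hypothetical names, not Newstore APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for fid_t (assumption for this sketch).
typedef uint64_t fid_sketch_t;

// Same three fields as the struct quoted above; not the real fragment_t.
struct fragment_sketch_t {
  uint32_t offset;
  uint32_t length;
  fid_sketch_t fid;
};

// In-memory stand-in for backing files: fid -> file contents.
typedef std::map<fid_sketch_t, std::string> FileStoreSketch;

// Copies the bytes described by `f` into `*out`.
// Returns false if the file is unknown or the extent runs past its end.
bool read_fragment(const FileStoreSketch& files,
                   const fragment_sketch_t& f,
                   std::string* out) {
  FileStoreSketch::const_iterator it = files.find(f.fid);
  if (it == files.end()) {
    return false;
  }
  const std::string& data = it->second;
  // Widen before adding so offset + length cannot wrap around in 32 bits.
  const uint64_t end = static_cast<uint64_t>(f.offset) + f.length;
  if (end > data.size()) {
    return false;
  }
  *out = data.substr(f.offset, f.length);
  return true;
}
```

For a 12-byte file "AAAABBBBBBCC", a fragment with offset 4 and length 6 yields "BBBBBB", the middle object, without touching its neighbours. Because offset is a uint32_t, a fragment cannot start at or beyond 4 GiB into its backing file, which bounds how large a shared file can usefully grow under this layout.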


*Detailed Description*
This is the big one! Please provide a detailed description for the proposed change. Where appropriate, include your architectural approach, a list of systems involved, important consequences, and issues that are still unresolved.

*Work items*
This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

*Coding tasks*
Task 1
Task 2
Task 3

*Build / release tasks*
Task 1
Task 2
Task 3

*Documentation tasks*
Task 1
Task 2
Task 3

*Deprecation tasks*
Task 1
Task 2
Task 3