Xiaoxi Chen, 06/12/2015 06:30 AM
1 | 1 | Xiaoxi Chen | h3. *Optimize Newstore for massive small object storage* |
---|---|---|---|
2 | |||
3 | *Summary* |
||
4 | More and more companies are adopting Ceph as their storage solution. Ceph does extremely well for RBD and large-object storage, but results from both Intel and other users clearly show that Ceph has a problem in the “Lots Of Small Files” (LOSF) case. |
||
5 | In the LOSF case, the average object size is as small as tens to hundreds of KB, which is typically the size of a compressed image, HTML, text, or PDF file. In the current approach, objects live on the FS as individual files, which usually means millions of files in the FS. This overruns the FS and introduces large read/write amplification, since every IO needs to go through the whole tree. |
||
6 | Newstore introduced fragment_list, which decouples the logical object from its physical location, and it can use open_by_handle to reduce the cost of tree traversal. The initial design allows one object to have multiple fragments. We would now like to extend the object->fragment mapping from 1:N to N:M; that is, we want multiple objects to share one fragment. |
||
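To make the proposed mapping concrete, here is a minimal sketch, not Newstore code. The names `file_id`, `Fragment`, `ObjectMap`, `objects_per_file` and `example_layout` are hypothetical and chosen only for illustration. It assumes each object keeps an ordered list of fragments and that several objects may point into the same backing file; the helper counts how many objects reference each file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a backing-file identifier (not Newstore's fid_t).
using file_id = uint64_t;

// Hypothetical fragment: a byte range inside one backing file.
struct Fragment {
  uint32_t offset;  // byte offset of the range inside the backing file
  uint32_t length;  // number of bytes in the range
  file_id file;     // backing file that holds the bytes
};

// Logical object name -> ordered list of fragments (the existing 1:N side).
using ObjectMap = std::map<std::string, std::vector<Fragment>>;

// Count how many distinct objects reference each backing file.
// A count above 1 means the file is shared (the proposed N:M side).
inline std::map<file_id, std::size_t> objects_per_file(const ObjectMap& objects) {
  std::map<file_id, std::size_t> counts;
  for (const auto& kv : objects) {
    std::set<file_id> seen;  // count each file at most once per object
    for (const Fragment& f : kv.second) {
      if (seen.insert(f.file).second) {
        ++counts[f.file];
      }
    }
  }
  return counts;
}

// Illustrative layout: three small objects packed into file 1,
// one of which also spills into file 2.
inline ObjectMap example_layout() {
  ObjectMap m;
  m["thumb_a.jpg"].push_back(Fragment{0, 4096, 1});
  m["thumb_b.jpg"].push_back(Fragment{4096, 8192, 1});
  m["page.html"].push_back(Fragment{12288, 2048, 1});
  m["page.html"].push_back(Fragment{0, 1024, 2});
  return m;
}
```

In this layout, file 1 is referenced by three objects and file 2 by one, so the filesystem holds two files instead of one per object.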
7 | |||
8 | |||
9 | *Owners* |
||
10 | Xiaoxi CHEN (Intel) |
||
11 | |||
12 | *Interested Parties* |
||
13 | |||
14 | Xiaoxi CHEN (Intel) |
||
15 | Jian Zhang (Intel) |
||
16 | |||
17 | *Current Status* |
||
18 | |||
19 | Newstore already has facilities for this: fragment_t already carries an offset and a length into the backing file. |
||
20 | struct fragment_t { |
||
21 | uint32_t offset; ///< offset in file to first byte of this fragment |
||
22 | uint32_t length; ///< length of fragment/extent |
||
23 | fid_t fid; ///< file backing this fragment |
||
| }; |
||
24 | |||
25 | |||
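The three fields quoted above are enough to locate a small object inside a shared file. The following is a rough sketch under stated assumptions, not the Newstore implementation: `fid_t` is reduced to a single numeric id, and `PackedFile` is a hypothetical in-memory stand-in for a container file. It appends payloads back to back and returns a `fragment_t` for each one.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Simplified stand-in for Newstore's fid_t (assumption: a bare numeric id).
struct fid_t {
  uint64_t id;
};

// Mirrors the three fields quoted from fragment_t above.
struct fragment_t {
  uint32_t offset;  ///< offset in file to first byte of this fragment
  uint32_t length;  ///< length of fragment/extent
  fid_t fid;        ///< file backing this fragment
};

// Hypothetical in-memory container file that packs many small objects.
class PackedFile {
 public:
  explicit PackedFile(fid_t fid) : fid_(fid) {}

  // Append one object's bytes and return the fragment that locates them.
  fragment_t append(const std::string& payload) {
    const uint64_t max32 = std::numeric_limits<uint32_t>::max();
    const uint64_t new_size =
        static_cast<uint64_t>(data_.size()) + payload.size();
    // offset and length are 32-bit, so keep the whole file addressable.
    if (new_size > max32) {
      throw std::length_error("container file would outgrow 32-bit offsets");
    }
    fragment_t f;
    f.offset = static_cast<uint32_t>(data_.size());
    f.length = static_cast<uint32_t>(payload.size());
    f.fid = fid_;
    data_ += payload;
    return f;
  }

  // Read back exactly the bytes described by a fragment.
  std::string read(const fragment_t& f) const {
    if (f.fid.id != fid_.id) {
      throw std::invalid_argument("fragment belongs to another file");
    }
    if (static_cast<uint64_t>(f.offset) + f.length > data_.size()) {
      throw std::out_of_range("fragment extends past end of file");
    }
    return data_.substr(f.offset, f.length);
  }

 private:
  fid_t fid_;
  std::string data_;
};
```

For example, appending "hello" and then "world!" to one `PackedFile` yields fragments at offsets 0 and 5, and `read` on either fragment returns only that object's bytes. One consequence worth noting: because `offset` is a `uint32_t`, a single shared file in this scheme cannot address more than 4 GiB.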
26 | *Detailed Description* |
||
27 | This is the big one! Please provide a detailed description for the proposed change. Where appropriate, include your architectural approach, a list of systems involved, important consequences, and issues that are still unresolved. |
||
28 | |||
29 | *Work items* |
||
30 | This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well. |
||
31 | |||
32 | *Coding tasks* |
||
33 | Task 1 |
||
34 | Task 2 |
||
35 | Task 3 |
||
36 | |||
37 | *Build / release tasks* |
||
38 | Task 1 |
||
39 | Task 2 |
||
40 | Task 3 |
||
41 | |||
42 | *Documentation tasks* |
||
43 | Task 1 |
||
44 | Task 2 |
||
45 | Task 3 |
||
46 | |||
47 | *Deprecation tasks* |
||
48 | Task 1 |
||
49 | Task 2 |
||
50 | Task 3 |