Optimize Newstore for massive small object storage » History » Version 2

Xiaoxi Chen, 06/12/2015 06:30 AM

h3. *Optimize Newstore for massive small object storage*
*Summary*
More and more companies are adopting Ceph as their storage solution. Ceph does extremely well for RBD and large-object storage, but many results, from both Intel and other users, clearly show that Ceph struggles with the “Lots Of Small Files” (LOSF) workload.
In the LOSF case, the average object size is only tens to hundreds of KB, which is typically the size of a compressed image, HTML page, text file or PDF. In the current approach each object lives on the underlying file system as an individual file, which usually means millions of files in one file system. This overwhelms the file system and introduces large read/write amplification, since every IO has to walk the directory tree to reach its file.
Newstore introduced the fragment list, which decouples the logical object from its physical location, and it can use open_by_handle_at to reduce the cost of traversing the directory tree. The original design already allows one object to have multiple fragments (a 1:N object->fragment mapping); we would like to extend this mapping to N:M, so that multiple objects can share one fragment.
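
A minimal sketch of how the N:M mapping could be tracked, assuming simplified stand-ins for newstore's fid_t and fragment_t. FragmentIndex, add, remove and refs are hypothetical names used only for illustration, not newstore APIs. Each object keeps its own fragment list (the existing 1:N side), and a per-file reference count records how many fragments point into each file, so a shared file is only reported as reclaimable once no object uses it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Simplified, hypothetical stand-ins for newstore's fid_t and fragment_t;
// the real types carry more state.
struct fid_t {
  uint32_t fset;  // hypothetical: file set
  uint32_t fno;   // hypothetical: file number within the set
  bool operator<(const fid_t& o) const {
    return std::tie(fset, fno) < std::tie(o.fset, o.fno);
  }
};

struct fragment_t {
  uint32_t offset;  // offset in file to first byte of this fragment
  uint32_t length;  // length of fragment/extent
  fid_t fid;        // file backing this fragment
};

// Hypothetical N:M index (not a newstore API): each object keeps its own
// fragment list (1:N), and each backing file counts how many fragments point
// into it, so one file can back many objects (N:M).
class FragmentIndex {
 public:
  void add(const std::string& oid, const fragment_t& frag) {
    objects_[oid].push_back(frag);
    ++file_refs_[frag.fid];
  }

  // Drops an object's fragments and returns the files that no object
  // references any more, i.e. files that could be reclaimed.
  std::vector<fid_t> remove(const std::string& oid) {
    std::vector<fid_t> unreferenced;
    auto it = objects_.find(oid);
    if (it == objects_.end()) return unreferenced;
    for (const fragment_t& frag : it->second) {
      auto ref = file_refs_.find(frag.fid);
      assert(ref != file_refs_.end() && ref->second > 0);
      if (--ref->second == 0) {
        unreferenced.push_back(frag.fid);
        file_refs_.erase(ref);
      }
    }
    objects_.erase(it);
    return unreferenced;
  }

  // Number of fragments (across all objects) stored in the given file.
  uint32_t refs(const fid_t& fid) const {
    auto it = file_refs_.find(fid);
    return it == file_refs_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, std::vector<fragment_t>> objects_;  // object -> fragments
  std::map<fid_t, uint32_t> file_refs_;                     // file -> fragment count
};
```

With two small objects stored in the same file, deleting the first only lowers that file's count; the file is reported as reclaimable once the second object is deleted too. The sketch does not reclaim the bytes a deleted object leaves behind inside a file that is still shared; that would need some form of compaction, which is not attempted here.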
*Owners*
Xiaoxi CHEN (Intel)
*Interested Parties*
Xiaoxi CHEN (Intel)
Jian Zhang (Intel)
*Current Status*
Newstore already has some of the facilities needed: in fragment_t, each fragment carries an offset and a length within its backing file.

```cpp
#include <cstdint>

// Excerpt from newstore's fragment_t; fid_t is defined elsewhere in newstore.
struct fragment_t {
  uint32_t offset;   ///< offset in file to first byte of this fragment
  uint32_t length;   ///< length of fragment/extent
  fid_t fid;         ///< file backing this fragment
};
```
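
Because offset and length can already locate a fragment anywhere inside a file, several small objects could in principle be packed into one file just by giving each a different offset. The sketch below illustrates this under stated assumptions: SharedFilePacker, append and read are hypothetical names, the shared file is modeled as an in-memory buffer rather than a real file, and fid_t is a simplified stand-in for newstore's type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Simplified, hypothetical stand-ins for newstore's fid_t and fragment_t.
struct fid_t {
  uint32_t fset;
  uint32_t fno;
};

struct fragment_t {
  uint32_t offset;  // offset in file to first byte of this fragment
  uint32_t length;  // length of fragment/extent
  fid_t fid;        // file backing this fragment
};

// Hypothetical packer (not a newstore API): appends small objects back to
// back into one shared file, modeled here as an in-memory buffer, and returns
// a fragment_t that locates each object inside that file.
class SharedFilePacker {
 public:
  explicit SharedFilePacker(fid_t fid) : fid_(fid) {}

  fragment_t append(const std::string& data) {
    // offset and length are uint32_t, so the shared file must stay under 4 GiB.
    assert(file_.size() + data.size() <= std::numeric_limits<uint32_t>::max());
    fragment_t frag{static_cast<uint32_t>(file_.size()),
                    static_cast<uint32_t>(data.size()), fid_};
    file_ += data;
    return frag;
  }

  // Reads an object back using nothing but its fragment's offset and length.
  std::string read(const fragment_t& frag) const {
    assert(frag.fid.fset == fid_.fset && frag.fid.fno == fid_.fno);
    assert(static_cast<std::size_t>(frag.offset) + frag.length <= file_.size());
    return file_.substr(frag.offset, frag.length);
  }

  std::size_t file_size() const { return file_.size(); }

 private:
  fid_t fid_;
  std::string file_;  // stand-in for the shared file's contents
};
```

For example, appending a 14-byte object and then a 15-byte object yields fragments at offsets 0 and 14 of the same file, and each object can be read back independently from its fragment alone.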
*Detailed Description*
This is the big one!  Please provide a detailed description for the proposed change.  Where appropriate, include your architectural approach, a list of systems involved, important consequences, and issues that are still unresolved.
*Work items*
This section should contain a list of work tasks created by this blueprint.  Please include engineering tasks as well as related build/release and documentation work.  If this blueprint requires cleanup of deprecated features, please list those tasks as well.
*Coding tasks*
Task 1
Task 2
Task 3
*Build / release tasks*
Task 1
Task 2
Task 3
*Documentation tasks*
Task 1
Task 2
Task 3
*Deprecation tasks*
Task 1
Task 2
Task 3