Optimize Newstore for massive small object storage » History » Version 3

Guang Yang, 06/19/2015 04:53 AM

h3. *Optimize Newstore for massive small object storage*

*Summary*
More and more companies are adopting Ceph as their storage solution. Ceph does extremely well for RBD and large object storage, but results from both Intel and other users clearly show that it has problems with the "Lots Of Small Files" (LOSF) case.

In the LOSF case, the average object size is small, from tens to hundreds of KB, which is typically the size of a compressed image, HTML page, text file, or PDF. In the current approach, each object lives on the file system (FS) as an individual file, which usually means millions of files in the FS. This overloads the FS and introduces large read/write amplification, since every I/O needs to go through the whole FS tree.

Newstore introduced fragment_list, which decouples the logical object from its physical location, and it can use open_by_handle to reduce the cost of tree traversal. Since the first design, one object has been allowed to have multiple fragments. We would now like to extend the object->fragment mapping from 1:N to N:M, that is, to let multiple objects share one fragment.
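The N:M mapping described above can be pictured with a minimal C++ sketch. This is not Newstore code: @FileId@, @Fragment@, @ObjectMap@, @objects_per_file@, @make_example@ and @check_example@ are hypothetical names, and the sketch assumes that "sharing" means fragments of different objects pointing into the same backing file. Each object still owns a list of fragments (1:N), while a backing file may hold data for several objects (N:M overall).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the Newstore types.
typedef uint64_t FileId;

struct Fragment {
  uint32_t offset;  // first byte of the fragment inside the backing file
  uint32_t length;  // number of bytes in the fragment
  FileId fid;       // backing file that holds the bytes
};

// Logical object name -> fragments that make up the object (1:N per object).
typedef std::map<std::string, std::vector<Fragment> > ObjectMap;

// Reverse view: for each backing file, the set of objects with data in it.
// A file listed with more than one object is shared (the N:M case).
std::map<FileId, std::set<std::string> > objects_per_file(const ObjectMap& objects) {
  std::map<FileId, std::set<std::string> > result;
  for (const auto& entry : objects) {
    for (const Fragment& frag : entry.second) {
      result[frag.fid].insert(entry.first);
    }
  }
  return result;
}

// Builds a small example: two small objects packed back to back into file 1,
// and one larger object spread over files 1 and 2.
ObjectMap make_example() {
  ObjectMap objects;
  Fragment a = {0, 4096, 1};
  Fragment b = {4096, 8192, 1};
  Fragment c1 = {12288, 4096, 1};
  Fragment c2 = {0, 16384, 2};
  objects["img_a"].push_back(a);
  objects["img_b"].push_back(b);
  objects["doc_c"].push_back(c1);
  objects["doc_c"].push_back(c2);
  return objects;
}

// Usage: file 1 is shared by all three objects, file 2 holds only doc_c.
void check_example() {
  std::map<FileId, std::set<std::string> > shared = objects_per_file(make_example());
  assert(shared[1].size() == 3);
  assert(shared[2].size() == 1);
}
```

The reverse index is only there to make the sharing visible. A real implementation would also need some way to know when a shared file no longer holds live data, which this sketch does not address.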

*Owners*
Xiaoxi CHEN (Intel)

*Interested Parties*

Xiaoxi CHEN (Intel)
Jian Zhang (Intel)
Guang Yang (Yahoo!)

*Current Status*

Newstore already has facilities we can build on: fragment_t already carries an offset and a length into the backing file.
23
struct fragment_t {
24
  uint32_t offset;   ///< offset in file to first byte of this fragment
25
  uint32_t length;   ///< length of fragment/extent
26
  fid_t fid;         ///< file backing this fragment
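To illustrate why an offset and a length per fragment are already enough to pack several small objects into one file, here is a minimal sketch of an append-only placement policy. It is an assumption-laden illustration, not the Newstore design: @fid_t@ is modelled as a plain integer, and @PackingAllocator@, @max_file_size@ and the roll-over rule are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: fid_t is modelled as a plain integer here; the real type differs.
typedef uint64_t fid_t;

// Same three fields as the fragment_t excerpt above.
struct fragment_t {
  uint32_t offset;  // offset in file to first byte of this fragment
  uint32_t length;  // length of fragment/extent
  fid_t fid;        // file backing this fragment
};

// Hypothetical append-only allocator: hands out consecutive byte ranges in the
// current file and moves on to a fresh file when the next range would not fit.
class PackingAllocator {
 public:
  explicit PackingAllocator(uint32_t max_file_size)
      : max_file_size_(max_file_size), current_fid_(1), next_offset_(0) {}

  // Reserve `length` bytes and return the fragment describing where they live.
  fragment_t allocate(uint32_t length) {
    assert(length > 0 && length <= max_file_size_);
    if (length > max_file_size_ - next_offset_) {  // not enough room left
      ++current_fid_;
      next_offset_ = 0;
    }
    fragment_t frag = {next_offset_, length, current_fid_};
    next_offset_ += length;
    return frag;
  }

 private:
  uint32_t max_file_size_;  // upper bound on bytes stored in one backing file
  fid_t current_fid_;       // file currently being filled
  uint32_t next_offset_;    // first free byte in the current file
};
```

With a 64 KB limit, requests of 20 KB, 30 KB and 20 KB land at offsets 0 and 20 KB of file 1 and at offset 0 of file 2, so the first two objects share one file instead of occupying two.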

*Detailed Description*
This is the big one!  Please provide a detailed description for the proposed change.  Where appropriate, include your architectural approach, a list of systems involved, important consequences, and issues that are still unresolved.

*Work items*
This section should contain a list of work tasks created by this blueprint.  Please include engineering tasks as well as related build/release and documentation work.  If this blueprint requires cleanup of deprecated features, please list those tasks as well.

*Coding tasks*
Task 1
Task 2
Task 3

*Build / release tasks*
Task 1
Task 2
Task 3

*Documentation tasks*
Task 1
Task 2
Task 3

*Deprecation tasks*
Task 1
Task 2
Task 3