Inline data support for Ceph » History » Version 1

Jessica Mack, 06/09/2015 06:47 AM

h1. Inline data support for Ceph

h3. Summary

Inline data is a useful feature for accelerating small-file access, and it is already present in mainstream local file systems such as ext4 and btrfs. Ceph should benefit from the same optimization, since it spares the client both the calculation of the object location and the communication with the OSDs. It is expected to speed up I/O for small-file traffic.

h3. Owners

* Li Wang (UbuntuKylin)

h3. Interested Parties

* Greg Farnum
* Sage Weil
* Loic Dachary

h3. Current Status

Under design

h3. Detailed Description

In a typical Ceph file access, the client first asks the MDS for metadata, then communicates with the OSDs for file data. If a file is very small, its data can be stored together with the metadata, as an extended attribute. When such a small file is opened, the client receives the file data along with the metadata from the MDS, so both the calculation of the object location and the communication with the OSDs are avoided. INLINEDATA will be a mount option that can be turned on to enable this behavior.

h3. Algorithm

The key idea is to maintain a state machine with three states:

p((. INLINED indicates that the first page of the file is stored on the MDS;
NOTINLINING indicates that the file is intended to stop being inlined; the first page remains UPTODATE on the MDS;
NOTINLINED indicates that the first page is stored on the OSDs.

To keep frequent writes from adding extra I/O overhead on the MDS, the MDS records the write frequency of inlined files. If the frequency exceeds a threshold, the MDS moves the file to NOTINLINING, which forces the client to write to the OSDs.

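The three states and the triggers mentioned in this page can be read as a small transition function. The following is a minimal sketch of that reading, not the actual Ceph implementation: the event names and both function names are hypothetical, and treating NOTINLINED as a terminal state is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* The three states described above. */
enum inline_state { INLINED, NOTINLINING, NOTINLINED };

/* Hypothetical event names for the triggers described in this page. */
enum inline_event {
    EV_WRITE_FREQ_EXCEEDED,   /* MDS saw too many writes to page zero */
    EV_WRITE_BEYOND_PAGE0,    /* client wrote past the first page */
    EV_PAGE0_WRITTEN_TO_OSD   /* first page now lives on the OSDs */
};

/* Return the state that follows s when event e occurs. */
enum inline_state inline_next_state(enum inline_state s, enum inline_event e)
{
    assert(s <= NOTINLINED);
    switch (s) {
    case INLINED:
        if (e == EV_WRITE_FREQ_EXCEEDED || e == EV_WRITE_BEYOND_PAGE0)
            return NOTINLINING;
        if (e == EV_PAGE0_WRITTEN_TO_OSD)
            return NOTINLINED;
        break;
    case NOTINLINING:
        if (e == EV_PAGE0_WRITTEN_TO_OSD)
            return NOTINLINED;
        break;
    case NOTINLINED:
        break;  /* assumed terminal: this sketch never re-inlines a file */
    }
    return s;   /* any other event leaves the state unchanged */
}

/* True while the copy of the first page held by the MDS is still valid. */
bool page0_readable_from_mds(enum inline_state s)
{
    return s == INLINED || s == NOTINLINING;
}
```

In this reading a file only ever moves away from INLINED, which keeps the client and the MDS from disagreeing about where the authoritative first page lives for longer than one round trip.
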
h3. 1 Client side

h4. 1.1 write_page()

<pre>
if (page->index == 0 && inode->status == INLINED) {
    err = write_page_to_mds();
    if (err == ESTATUS) // status has changed to NOTINLINING or NOTINLINED
        write_page_to_osd();
    return;
}
write_page_to_osd();
</pre>

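The fallback in write_page() can be exercised in isolation with simulated back ends. This is a rough sketch under stated assumptions: struct inode_sketch, the two counters, and the numeric value of ESTATUS are all hypothetical, and the simulated MDS simply refuses the write whenever it no longer considers the file INLINED.

```c
#include <assert.h>
#include <stddef.h>

enum inline_state { INLINED, NOTINLINING, NOTINLINED };

#define ESTATUS 1  /* placeholder value; the real error code is not specified */

/* Hypothetical stand-in for the inode plus the two storage back ends. */
struct inode_sketch {
    enum inline_state status;      /* state as cached by the client */
    enum inline_state mds_status;  /* state currently held by the simulated MDS */
    int mds_writes;                /* pages accepted by the simulated MDS */
    int osd_writes;                /* pages sent to the simulated OSDs */
};

/* Simulated MDS: accept the page only while the file is still INLINED there. */
static int write_page_to_mds(struct inode_sketch *in)
{
    if (in->mds_status != INLINED)
        return ESTATUS;
    in->mds_writes++;
    return 0;
}

/* Simulated OSD write: always succeeds in this sketch. */
static void write_page_to_osd(struct inode_sketch *in)
{
    in->osd_writes++;
}

/* Same control flow as the pseudocode above, with explicit arguments. */
void write_page(struct inode_sketch *in, unsigned long page_index)
{
    assert(in != NULL);
    if (page_index == 0 && in->status == INLINED) {
        if (write_page_to_mds(in) == ESTATUS)
            write_page_to_osd(in);  /* MDS refused: fall back to the OSDs */
        return;
    }
    write_page_to_osd(in);
}
```

A page other than page zero always goes to the OSDs, and page zero reaches the OSDs only when the client's cached state is stale and the MDS rejects the write.
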
h4. 1.2 ceph_write_end()

<pre>
if (inode->status == INLINED) {
    if (write_pos > PAGE_SIZE) {
        inode->status = NOTINLINING;
        mark_inode_dirty(); // asynchronously tell the MDS to change status to NOTINLINING
    }
    if (the interval [write_pos, write_pos + write_len] overlaps the interval [0, PAGE_SIZE]) {
        inode->status = NOTINLINED;
        mark_inode_dirty();
    }
}
if (inode->status == NOTINLINING) {
    if (the interval [write_pos, write_pos + write_len] overlaps the interval [0, PAGE_SIZE]) {
        inode->status = NOTINLINED;
        mark_inode_dirty();
    }
}
</pre>

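The interval test used in ceph_write_end() is left as prose in the pseudocode. One possible way to spell it out is shown below, as a sketch only: the function names and SKETCH_PAGE_SIZE are hypothetical, a 4096-byte page is assumed, and the ranges are treated as half-open, which differs slightly from the closed intervals written above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* True if half-open ranges [a_start, a_end) and [b_start, b_end) share a byte. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/* True if a write of write_len bytes at write_pos touches the first page. */
bool write_touches_page0(uint64_t write_pos, uint64_t write_len)
{
    assert(write_pos <= UINT64_MAX - write_len);  /* offset must not overflow */
    return write_len > 0 &&
           ranges_overlap(write_pos, write_pos + write_len, 0, SKETCH_PAGE_SIZE);
}
```

With half-open ranges, a write that starts exactly at offset 4096 does not count as touching the first page, and a zero-length write never does.
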
h4. 1.3 read_page()

<pre>
if (page->index == 0 && (inode->status == INLINED || inode->status == NOTINLINING)) {
    err = read_page_from_mds();
    if (err == ESTATUS) // status has changed to NOTINLINED
        read_page_from_osd();
    return;
}
read_page_from_osd();
</pre>

h3. 2 MDS side

<pre>
if (inode->status == INLINED && write_frequency_of_page_zero > THRESHOLD)
    inode->status = NOTINLINING;
if (received_write_page_zero_request_from_client() && inode->status == NOTINLINING) {
    err = ESTATUS;
    send_response_to_client(err);
    inode->status = NOTINLINED;
}
</pre>

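The MDS-side rule can be sketched as a single request handler. Several details here are assumptions rather than part of the design: a plain counter stands in for the write frequency (how the frequency is measured is not specified), WRITE_THRESHOLD and the value of ESTATUS are arbitrary, the struct and function names are hypothetical, and a request for a file that is already NOTINLINED is assumed to be answered with ESTATUS as well.

```c
#include <assert.h>
#include <stddef.h>

enum inline_state { INLINED, NOTINLINING, NOTINLINED };

#define ESTATUS 1            /* placeholder value; the real error code is not specified */
#define WRITE_THRESHOLD 3u   /* assumed value, chosen only for illustration */

/* Hypothetical per-inode bookkeeping on the MDS. */
struct mds_inode_sketch {
    enum inline_state status;
    unsigned page0_writes;   /* page-zero writes accepted so far */
};

/* Handle one page-zero write request; returns 0 if stored inline, else ESTATUS. */
int mds_handle_page0_write(struct mds_inode_sketch *in)
{
    assert(in != NULL);

    /* Too many writes to the inlined page: stop accepting them. */
    if (in->status == INLINED && in->page0_writes > WRITE_THRESHOLD)
        in->status = NOTINLINING;

    if (in->status == NOTINLINING) {
        in->status = NOTINLINED;  /* the client will now write page zero to the OSDs */
        return ESTATUS;
    }
    if (in->status == NOTINLINED)
        return ESTATUS;           /* assumption: later requests are refused too */

    in->page0_writes++;           /* accepted: the page is stored inline */
    return 0;
}
```

With the assumed threshold of 3, the first four writes are accepted and the fifth is refused, at which point the file ends up NOTINLINED.
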
h3. Work items