
h1. Mds - Inline data support (Step 2)
h3. Summary
We have worked out a preliminary implementation of inline data support and observed an obvious speed-up for small-file access. To our knowledge, the algorithm has no correctness issues, even under shared-write or mixed read/write access.
Step 2 will focus on simplifying the design so that the half-inlined file state is eliminated, and on solving protocol backward compatibility.
h3. Owners
* Li Wang (UbuntuKylin)
h3. Interested Parties
* Sage Weil (Inktank)
* Name (Affiliation)
* Name
h3. Current Status
A preliminary implementation exists; it needs improvement.
h3. Detailed Description
Our initial implementation is described below. To keep things simple, the idea is to use the MDS as a read cache for the inline data, instead of storing the inline data in the inode.

The MDS would read the inline data from the OSDs while it is fetching the metadata:
<pre>
CDir::_fetched {
  foreach(CInode inode in InodeList) {
    if (inode.size < INLINE_MAX_SIZE)
      objecter->read(inode)
  }
}
</pre>
The inline data will then be sent to the client in CAP_OP_GRANT messages when the client issues a getattr() call:
<pre>
CInode::encode_cap_message {
  if (!have_send) {
    encode(inline data, message);
    have_send = true;
  }
}
</pre>
The client read/write routines would remain almost unchanged. Writers write inline data to the OSDs as usual; the only difference is that the first writer notifies the MDS to abandon its inline data cache by sending CAP_OP_UPDATE when the written range falls within [0, INLINE_MAX_SIZE].

The MDS would fetch the inline data again when the last writer exits.
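The writer-side check could look roughly like the following sketch (the INLINE_MAX_SIZE value, the FileHandle struct, and must_notify_mds() are assumptions for illustration, not actual Ceph client code):

<pre>
// Illustrative sketch only: the first write that touches the inline range
// triggers a CAP_OP_UPDATE so the MDS drops its cached inline data.
#include <cstdint>

static const uint64_t INLINE_MAX_SIZE = 4096;   // assumed threshold for this sketch

struct FileHandle {
  bool mds_notified = false;   // has CAP_OP_UPDATE already been sent for this file?
};

// Returns true when this write must tell the MDS to abandon its inline cache,
// i.e. it is the first write whose range falls within [0, INLINE_MAX_SIZE).
bool must_notify_mds(FileHandle &fh, uint64_t offset)
{
  if (fh.mds_notified)
    return false;                  // only the first writer needs to notify
  if (offset >= INLINE_MAX_SIZE)
    return false;                  // the write does not touch the inline range
  fh.mds_notified = true;          // remember that the notification was sent
  return true;
}
</pre>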
In our current implementation, we maintain a state machine with three states:
  (1) INLINE: the file is small enough for its data to be logically stored at the MDS;
  (2) MIGRATION: the file size has grown beyond the inline threshold INLINE_SIZE, but the data within [0, INLINE_SIZE] is still stored at the MDS;
  (3) DISABLED: the file is no longer inline, which implies the inline data has been transferred to the OSDs.

The state transitions are irreversible: a file never moves back to an earlier state.
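A minimal sketch of this state machine, assuming a strictly forward ordering of the three states (the enum and struct below are illustrative, not actual Ceph code):

<pre>
// Illustrative only: the three inline states, ordered so that a legal
// transition always moves to a strictly larger value (never backwards).
#include <cassert>

enum InlineState { INLINE = 0, MIGRATION = 1, DISABLED = 2 };

struct InlineStateMachine {
  InlineState state = INLINE;

  // Transitions are one-way; the assertion models "irreversible".
  void transition(InlineState next) {
    assert(next > state);
    state = next;
  }
};
</pre>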
A client confirms that it holds up-to-date and complete inline data by checking whether it holds the Fc cap.
Each time the inline data is revised by a client, the client dirties the Fb cap to notify the MDS.
For each file, a version number ‘inline_version’ is stored in the metadata to indicate the inline state; it is shared between the MDS and the clients. In addition, for each client and each file, the MDS maintains a pair of version numbers: ‘client_inline_version’ and ‘server_inline_version’. The former records the latest inline data version sent to the client through CAP_OP_GRANT/CAP_OP_REVOKE, etc.; the ‘local_version’ maintained by the client should equal the corresponding ‘client_inline_version’. ‘server_inline_version’ records the new version after the client pushes new inline data to the MDS via CAP_OP_UPDATE; if the client has never sent CAP_OP_UPDATE, it should equal ‘client_inline_version’. The MDS needs two variables because, after receiving updated inline data from a client, it has no immediate opportunity to tell the client the new inline data version.
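A minimal sketch of how these version numbers could relate to each other, assuming the fields described above (the struct and function names are illustrative, not actual MDS code):

<pre>
// Illustrative only: the relationship between the three version numbers.
#include <cstdint>

// Per file, stored in the metadata and shared between MDS and clients.
struct InlineMeta {
  uint64_t inline_version = 1;
};

// Per (client, file) pair, kept on the MDS.
struct ClientInlineTrack {
  uint64_t client_inline_version = 0;  // last version sent via CAP_OP_GRANT/CAP_OP_REVOKE
  uint64_t server_inline_version = 0;  // version after the client's last CAP_OP_UPDATE
};

// When the MDS sends inline data to a client, it records what that client now holds.
void on_send_to_client(const InlineMeta &meta, ClientInlineTrack &t)
{
  t.client_inline_version = meta.inline_version;
  if (t.server_inline_version < t.client_inline_version)
    t.server_inline_version = t.client_inline_version;
}

// When a client pushes new inline data with CAP_OP_UPDATE, the version advances,
// but the MDS has no immediate chance to tell the client the new number, so
// client_inline_version lags behind until the next grant/revoke.
void on_cap_update(InlineMeta &meta, ClientInlineTrack &t)
{
  meta.inline_version++;
  t.server_inline_version = meta.inline_version;
}
</pre>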
Client:

(1) Read operation:
* if the client has the Fc cap, copy the inline data into the page cache / user buffer;
* otherwise, call getattr() to retrieve the inline data from the MDS.

(2) Write operation:
* if the client has the Fc cap, merge the written data locally by virtue of the page cache and, according to the file state, commit the data to the MDS/OSD;
* otherwise, submit the written data directly to the MDS, which will do the merging.
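A rough, self-contained sketch of this client-side decision (all types and helper functions below are assumptions for illustration, not the actual Ceph client interfaces):

<pre>
// Illustrative only: the read/write paths branch on whether the client
// currently holds the Fc cap. Helper functions are stubs standing in for
// the real client/MDS interaction.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct InodeRef {
  bool has_Fc = false;                 // does this client hold the Fc cap?
  std::vector<char> cached_inline;     // locally cached inline data (valid with Fc)
};

static void fetch_inline_from_mds(InodeRef &) {}                                // getattr() path (stub)
static void submit_write_to_mds(InodeRef &, uint64_t, const char *, size_t) {}  // MDS merges (stub)
static void commit_per_file_state(InodeRef &) {}                                // to MDS or OSD, by state (stub)

size_t inline_read(InodeRef &in, uint64_t off, char *buf, size_t len)
{
  if (!in.has_Fc)
    fetch_inline_from_mds(in);         // no Fc: retrieve the inline data from the MDS
  size_t avail = in.cached_inline.size() > off ? in.cached_inline.size() - off : 0;
  size_t n = len < avail ? len : avail;
  std::memcpy(buf, in.cached_inline.data() + off, n);
  return n;
}

void inline_write(InodeRef &in, uint64_t off, const char *buf, size_t len)
{
  if (!in.has_Fc) {
    submit_write_to_mds(in, off, buf, len);   // the MDS will do the merging
    return;
  }
  // Fc held: merge locally (modelling the page cache), then commit by file state.
  if (in.cached_inline.size() < off + len)
    in.cached_inline.resize(off + len);
  std::memcpy(in.cached_inline.data() + off, buf, len);
  commit_per_file_state(in);
}
</pre>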
h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3