Jessica Mack, 06/09/2015 07:08 PM
h1. CephFS - file creation and object-level backtraces

h3. Summary

CephFS benchmarks well in many scenarios, but file creation is a persistent slow point, and the addition of backtraces to RADOS objects has made it worse. We have several ideas for improving create performance.

h3. Owners

* Gregory Farnum (Inktank/Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

We create files by sending a synchronous request to the MDS. The MDS is responsible for writing a "backtrace" to the first RADOS object of the file, and it does so when expiring the journal segment that contains the create.
This causes two problems:
1) Creating files this way is slow.
2) When many files are created at once (e.g., during an rsync), the MDS's disk accesses bunch up at journal expiration and can overwhelm client IO.

We need to discuss a few different ideas in this space:
1) Allow clients to write backtraces on file creates.
2) [Perhaps incompatible with the prior] Give clients a preallocated pool of inodes that they can use to create files independently in directories where they hold caps.
3) Allow the MDS to store backtraces in a specific pool instead of the file's data pool.

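To make idea (2) easier to discuss, here is a minimal Python sketch of what a client-side inode pool might look like. Everything in it is hypothetical: @ClientInodePool@, @local_create@, and the representation of caps as a set of directory paths are illustrative names invented for this sketch and are not part of the Ceph client or MDS.

```python
class ClientInodePool:
    """Hypothetical pool of inode numbers handed to a client by the MDS."""

    def __init__(self, first_ino, count):
        self._next = first_ino
        self._end = first_ino + count

    def remaining(self):
        return self._end - self._next

    def take(self):
        # Once the pool is empty the client would have to go back to the MDS.
        if self._next >= self._end:
            raise RuntimeError("pool exhausted; request more inodes from the MDS")
        ino = self._next
        self._next += 1
        return ino


def local_create(pool, dir_caps, dirname, filename):
    """Create a file locally if the client holds caps on the directory.

    Returns None when the client lacks caps, meaning it should fall back
    to the existing synchronous create request.
    """
    if dirname not in dir_caps:
        return None
    return {"ino": pool.take(), "dir": dirname, "name": filename}


pool = ClientInodePool(first_ino=0x1000, count=2)
caps = {"/home/alice"}  # directories where this client holds caps (assumed)
print(local_create(pool, caps, "/home/alice", "a.txt"))
print(local_create(pool, caps, "/tmp", "b.txt"))
```

The sketch deliberately leaves open who writes the backtrace for a locally created file, which is where the possible conflict with idea (1) would need to be worked out.
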
h3. Detailed Description

Creating a file today involves the following steps:
* The client sends an MClientRequest to the MDS to create the inode and link it into the tree.
* The MDS takes an inode off the preallocated list and links it into the tree for the client.
** Under some circumstances it might need to journal the inode allocation before linking it in.
* The MDS sends back a reply and asynchronously journals the create.
* The client makes use of the file...
** ...and eventually closes and drops it.
* When the journal segment is expired, the MDS writes a backtrace out to the file's first RADOS object.

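The steps above can be sketched as a toy Python model, which also shows why backtrace writes arrive in a burst. This is a simulation under stated assumptions, not MDS code: @ToyMDS@ and its methods are made-up names, the backtrace is reduced to a (parent inode, name) pair, and the object name format (inode number in hex followed by @.00000000@) is an assumption about how the first object is named.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    ino: int
    name: str
    parent_ino: int


@dataclass
class ToyMDS:
    """Toy model of the create path; not the real MDS."""
    prealloc: list                                    # preallocated inode numbers
    segment: list = field(default_factory=list)       # creates in the open journal segment
    backtraces: dict = field(default_factory=dict)    # object name -> simplified backtrace

    def create(self, parent_ino, name):
        # Take an inode off the preallocated list and link it in.
        inode = Inode(self.prealloc.pop(0), name, parent_ino)
        # Record the create in the journal segment (asynchronous in the real MDS).
        self.segment.append(inode)
        return inode

    def expire_segment(self):
        # On expiry, write one backtrace per create in the segment.
        writes = 0
        for inode in self.segment:
            obj = f"{inode.ino:x}.00000000"  # assumed name of the first object
            self.backtraces[obj] = (inode.parent_ino, inode.name)
            writes += 1
        self.segment.clear()
        return writes


mds = ToyMDS(prealloc=list(range(0x1000, 0x1000 + 1000)))
for i in range(1000):
    mds.create(parent_ino=1, name=f"file{i}")
print(len(mds.backtraces))   # 0: nothing written while the segment is open
print(mds.expire_segment())  # 1000: all backtrace writes land at expiry
```

In this model no backtrace exists until the segment expires, and then every create in the segment turns into an object write at the same moment, which is the bunching described under Current Status.
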
There are tradeoffs between ideas (1) and (2). Idea (3) does not provide all the benefits of on-data backtraces. We should discuss these tradeoffs and the relative priorities.

Note the related tickets:
http://tracker.ceph.com/issues/8230
http://tracker.ceph.com/issues/8358

h3. Work items

h3. Coding tasks

# Task 1
# Task 2
# Task 3

h3. Build / release tasks

# Task 1
# Task 2
# Task 3

h3. Documentation tasks

# Task 1
# Task 2
# Task 3

h3. Deprecation tasks

# Task 1
# Task 2
# Task 3