CephFS - file creation and object-level backtraces » History » Version 1

Jessica Mack, 06/09/2015 07:08 PM

h1. CephFS - file creation and object-level backtraces

h3. Summary

CephFS benchmarks well in many scenarios, but file creation is a persistent slow point. The addition of backtraces to RADOS objects has made this worse. We have some ideas for improving file create performance.

h3. Owners

* Gregory Farnum (Inktank/Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

We create files by sending a synchronous request to the MDS. The MDS is responsible for writing a "backtrace" to the first RADOS object of the file, and it does so when it expires the journal segment containing the create.
This causes a few problems:
1) Creating files this way is slow.
2) When creating many files at once (e.g., during an rsync), the MDS's backtrace writes bunch up at journal expiration and can overwhelm client IO.
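As a rough illustration of problem (2), the toy model below defers every backtrace write until the journal segment holding the create is expired. It is not Ceph code: the class name, the @events_per_segment@ parameter, and the rule that a segment expires as soon as it fills are simplifying assumptions made for this sketch.

```python
from collections import Counter


class ToyJournal:
    """Toy model: creates are journaled, backtraces are written at segment expiry."""

    def __init__(self, events_per_segment):
        # Hypothetical segment size; a real MDS trims segments under other rules.
        self.events_per_segment = events_per_segment
        self.current_segment = []   # inodes created in the open segment
        self.tick = 0               # logical clock, one tick per create
        self.backtrace_writes = []  # (tick, ino) for every backtrace write issued

    def create(self, ino):
        self.tick += 1
        self.current_segment.append(ino)
        if len(self.current_segment) == self.events_per_segment:
            self._expire_segment()

    def _expire_segment(self):
        # Every backtrace for the segment is issued at the same moment.
        for ino in self.current_segment:
            self.backtrace_writes.append((self.tick, ino))
        self.current_segment = []


journal = ToyJournal(events_per_segment=4)
for ino in range(1, 13):
    journal.create(ino)

# Number of backtrace writes issued at each tick: bursts instead of a steady stream.
burst = Counter(tick for tick, _ in journal.backtrace_writes)
print(dict(burst))
```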
 
We need to discuss a few ideas in this space:
1) Allow clients to write backtraces on file create.
2) (Perhaps incompatible with idea 1.) Give clients a preallocated pool of inodes that they can use to create files independently in directories where they hold caps.
3) Allow the MDS to store backtraces in a specific pool instead of the file's data pool.

h3. Detailed Description

When creating a file today, there are a number of steps:
* The client sends an MClientRequest to the MDS to create the inode and link it into the tree.
* The MDS takes an inode off the preallocated list and links it into the tree for the client.
** Under some circumstances it might need to journal the inode allocation before linking it in.
* The MDS sends back a reply and asynchronously journals the create.
* The client makes use of the file
** ...and eventually closes and drops it.
* When the journal segment is being expired, the MDS writes a backtrace out to the first RADOS object.
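The steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the MDS implementation: @ToyMDS@ and its methods are hypothetical, the backtrace is reduced to a list of path components, and the object naming in @first_object_name@ (hex inode number plus an 8-digit object index) is assumed for the example.

```python
from dataclasses import dataclass, field


def first_object_name(ino):
    # Assumed naming for the first data object of a file (hex ino, index 0).
    return f"{ino:x}.00000000"


@dataclass
class ToyMDS:
    prealloc_inos: list                                   # inodes allocated ahead of time
    tree: dict = field(default_factory=dict)              # path -> ino
    journal_segment: list = field(default_factory=list)   # creates not yet expired
    objects: dict = field(default_factory=dict)           # object name -> backtrace

    def handle_create(self, path):
        ino = self.prealloc_inos.pop(0)            # take an inode off the preallocated list
        self.tree[path] = ino                      # link it into the tree
        self.journal_segment.append((ino, path))   # stands in for the async journal write
        return ino                                 # reply to the client

    def expire_segment(self):
        # Only at expiry is the backtrace written to the file's first object.
        for ino, path in self.journal_segment:
            self.objects[first_object_name(ino)] = path.strip("/").split("/")
        self.journal_segment.clear()


mds = ToyMDS(prealloc_inos=[0x1000, 0x1001])
ino = mds.handle_create("/dir/a")      # client request and reply
objects_before = dict(mds.objects)     # no backtrace exists yet
mds.expire_segment()                   # backtrace is written here
objects_after = dict(mds.objects)
```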
 
There are tradeoffs between ideas (1) and (2). Idea (3) does not provide all the benefits of on-data backtraces. We should discuss these tradeoffs and the relative priorities.
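To make idea (2) more concrete for discussion, here is a minimal sketch of a client that creates files locally from a preallocated inode pool when it holds caps on the directory, and otherwise falls back to a request to the MDS. Everything in it is hypothetical (the @ToyClient@ class, its fields, and the fallback rule); it does not describe an existing or agreed-upon design.

```python
class ToyClient:
    """Hypothetical client for idea (2): local creates from a preallocated inode pool."""

    def __init__(self, prealloc_inos, dirs_with_caps):
        self.prealloc_inos = list(prealloc_inos)   # inodes handed out by the MDS in advance
        self.dirs_with_caps = set(dirs_with_caps)  # directories where this client holds caps
        self.local_creates = []                    # (ino, directory, name) to report to the MDS
        self.mds_requests = 0                      # creates that still needed a round trip

    def create(self, directory, name):
        if directory in self.dirs_with_caps and self.prealloc_inos:
            ino = self.prealloc_inos.pop(0)
            self.local_creates.append((ino, directory, name))
            return ino
        # No caps or an empty pool: fall back to the synchronous MDS request.
        self.mds_requests += 1
        return None


client = ToyClient(prealloc_inos=[100, 101], dirs_with_caps={"/a"})
results = [
    client.create("/a", "x"),  # local create
    client.create("/b", "y"),  # no caps on /b, goes to the MDS
    client.create("/a", "z"),  # local create
    client.create("/a", "w"),  # pool exhausted, goes to the MDS
]
```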
 
Note the related tickets:
http://tracker.ceph.com/issues/8230
http://tracker.ceph.com/issues/8358

h3. Work items

h3. Coding tasks

# Task 1
# Task 2
# Task 3

h3. Build / release tasks

# Task 1
# Task 2
# Task 3

h3. Documentation tasks

# Task 1
# Task 2
# Task 3

h3. Deprecation tasks

# Task 1
# Task 2
# Task 3