CephFS - file creation and object-level backtraces
Summary
CephFS benchmarks well in many scenarios, but file creates are a persistent slow point. The addition of backtraces to RADOS objects has made this worse. We have several ideas for improving file-create performance.
Owners
- Gregory Farnum (Inktank/Red Hat)
- Name (Affiliation)
- Name
Interested Parties
- Name (Affiliation)
- Name (Affiliation)
- Name
Current Status
We create files by sending a synchronous request to the MDS. The MDS is responsible for writing out a "backtrace" to the first RADOS object in the file, and does so when expiring the journal segment containing the create.
This causes a few problems:
1) It's slow to do file creates like this.
2) When doing a lot of file creates (e.g., during an rsync), the MDS's disk accesses can bunch up at journal expiration and overwhelm client IO.
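For readers unfamiliar with the term, a backtrace can be pictured as a list of (parent directory, name) pairs that leads from the file back to the root. The following is a minimal sketch of that idea, not the on-disk format: the class and field names are hypothetical, and the object-naming scheme (inode number in hex, a dot, then an 8-digit hex object index) is stated here as an assumption about how CephFS names data objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backpointer:
    """One hop up the tree: the parent directory inode and the name under it."""
    dir_ino: int
    dname: str


@dataclass
class Backtrace:
    """Hypothetical, simplified stand-in for the backtrace of one inode."""
    ino: int
    ancestors: List[Backpointer]  # nearest parent first, root-most entry last

    def path(self) -> str:
        # Walk from the root-most ancestor down to the file to rebuild the path.
        return "/" + "/".join(bp.dname for bp in reversed(self.ancestors))


def first_object_name(ino: int) -> str:
    # Assumption: data objects are named "<inode in hex>.<8-digit hex index>",
    # so the first object of a file carries index 0.
    return f"{ino:x}.{0:08x}"
```

With such a record attached to the first object, a file's location in the tree could be recovered from the data pool alone, which is why the backtrace has to be written somewhere for every created file.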
We need to discuss a few different ideas around this space:
1) Allowing clients to write backtraces on file creates
2) [Perhaps incompatible with the prior] Giving clients a preallocated pool of inodes, which they can use to create files independently in directories where they hold caps.
3) Allowing the MDS to store backtraces in a specific pool instead of the file's data pool.
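To make idea (2) more concrete, here is a rough sketch of what client-side creates from a preallocated inode range might look like. Everything in it is hypothetical: `InodeRange`, `Client`, the `dir_caps` set, and the `pending` list are illustrative names, and the sketch ignores how the creates would later be reported to and journaled by the MDS.

```python
class InodeRange:
    """Hypothetical block of inode numbers handed to a client by the MDS."""

    def __init__(self, start: int, count: int):
        self.next_ino = start
        self.end = start + count  # exclusive upper bound

    def take(self) -> int:
        if self.next_ino >= self.end:
            raise RuntimeError("inode range exhausted; ask the MDS for more")
        ino = self.next_ino
        self.next_ino += 1
        return ino


class Client:
    """Toy client that creates files locally where it holds directory caps."""

    def __init__(self, inodes: InodeRange):
        self.inodes = inodes
        self.dir_caps = set()  # directory inodes where we hold the needed caps
        self.pending = []      # creates not yet reported to the MDS

    def create(self, dir_ino: int, name: str) -> int:
        if dir_ino not in self.dir_caps:
            # Without caps the client would fall back to a synchronous request.
            raise PermissionError("no caps on this directory")
        ino = self.inodes.take()
        self.pending.append((dir_ino, name, ino))
        return ino
```

The point of the sketch is that `create` never waits on the MDS: the cost moves to replenishing the inode range and to flushing `pending`, which is where the interaction with idea (1) would need to be worked out.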
Detailed Description
When creating a file today, there are a number of steps:
- The client sends an MClientRequest to the MDS to create the inode and link it in to the tree.
- The MDS takes an inode off of the preallocated list and links it in to the tree for the client
- Under some circumstances it might need to journal the inode allocation before linking it in
- The MDS sends back a reply and asynchronously journals the create
- The client makes use of the file
- ...and eventually closes and drops it
- When the journal segment is being expired, the MDS writes a backtrace out to the first RADOS object.
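The steps above can be sketched as a toy model to show where the burst of backtrace writes comes from. This is an illustration under stated assumptions, not the real MDS logic: `ToyMDS`, `segment_size`, and the list-based journal are all hypothetical, journaling is shown inline although the real MDS does it asynchronously after replying, and the object names assume the "inode in hex, dot, 8-digit hex index" scheme.

```python
from collections import deque


class ToyMDS:
    """Toy model of the create path; not the real MDS implementation."""

    def __init__(self, prealloc_inos, segment_size=3):
        self.prealloc = deque(prealloc_inos)  # preallocated inode numbers
        self.tree = {}                        # (dir_ino, name) -> ino
        self.segments = [[]]                  # journal segments of created inos
        self.segment_size = segment_size      # creates per segment (assumed)
        self.backtrace_writes = []            # first-object names written so far

    def handle_create(self, dir_ino, name):
        ino = self.prealloc.popleft()         # take an inode off the list
        self.tree[(dir_ino, name)] = ino      # link it in to the tree
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])          # start a new journal segment
        self.segments[-1].append(ino)         # journal the create (async in reality)
        return ino                            # reply to the client

    def expire_oldest_segment(self):
        # Every create in the segment needs its backtrace written at this
        # point, so the writes arrive together rather than spread over time.
        segment = self.segments.pop(0)
        if not self.segments:
            self.segments.append([])
        for ino in segment:
            self.backtrace_writes.append(f"{ino:x}.{0:08x}")
        return len(segment)
```

In this model no backtrace is written while creates are arriving; expiring one segment then issues one write per create it contains, all at once. A workload with many creates fills segments quickly, so these bursts grow and compete with client IO.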
There are tradeoffs between ideas (1) and (2). Idea (3) does not provide all the benefits of on-data backtraces. We should discuss these tradeoffs and the relative priorities.
Note the related tickets:
http://tracker.ceph.com/issues/8230
http://tracker.ceph.com/issues/8358
Work items
Coding tasks
- Task 1
- Task 2
- Task 3
Build / release tasks
- Task 1
- Task 2
- Task 3
Documentation tasks
- Task 1
- Task 2
- Task 3
Deprecation tasks
- Task 1
- Task 2
- Task 3