CephFS - file creation and object-level backtraces

Summary

CephFS benchmarks well in many scenarios, but file creates are a persistent slow point. This has been exacerbated by the addition of backtraces to RADOS objects. We have several ideas for improving create performance.

Owners

  • Gregory Farnum (Inktank/Red Hat)

Current Status

Clients create files by sending a synchronous request to the MDS. The MDS is responsible for writing a "backtrace" to the first RADOS object of the file, and it does so when expiring the journal segment that contains the create.
This causes two problems:
1) File creates are slow, because each one requires a synchronous request to the MDS.
2) When many files are created at once (e.g., during an rsync), the backtrace writes bunch up at journal expiration, and the resulting burst of disk accesses from the MDS can overwhelm client IO.

We need to discuss a few different ideas in this space:
1) Allow clients to write backtraces on file creates.
2) Give clients a preallocated pool of inodes that they can use to independently create files in directories where they hold caps. (This is perhaps incompatible with idea 1.)
3) Allow the MDS to store backtraces in a specific pool instead of the file's data pool.
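
To make idea (2) concrete for discussion, here is a minimal Python sketch of a client that creates files from a preallocated inode range. It is a toy model, not Ceph code: ToyClient, its fields, and the flush() step that hands pending creates back to the MDS are all hypothetical names and assumptions, and the real protocol would need to be designed.

```python
class ToyClient:
    """Hypothetical client holding an MDS-granted inode range and directory caps."""

    def __init__(self, granted_inos, dirs_with_caps):
        self.free_inos = list(granted_inos)        # inodes preallocated to this client
        self.dirs_with_caps = set(dirs_with_caps)  # directories where it holds caps
        self.pending = []                          # creates not yet reported to the MDS

    def create(self, directory, name):
        """Create locally if possible; return None to signal MDS fallback."""
        if directory not in self.dirs_with_caps or not self.free_inos:
            return None  # no caps or pool exhausted: use a synchronous MDS request
        ino = self.free_inos.pop(0)
        self.pending.append((ino, directory, name))
        return ino

    def flush(self):
        """Return the batch of creates to report to the MDS and clear it (assumed step)."""
        batch, self.pending = self.pending, []
        return batch


client = ToyClient(range(0x2000, 0x2002), {"/home/a"})
first = client.create("/home/a", "x")      # served from the local pool
second = client.create("/home/a", "y")     # served from the local pool
exhausted = client.create("/home/a", "z")  # pool is empty, falls back
no_caps = client.create("/tmp", "w")       # no caps on /tmp, falls back
batch = client.flush()
```

In this model the client only needs the MDS when its pool runs dry or it lacks caps on the directory, which is where the create latency savings would come from.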

Detailed Description

When creating a file today, there are a number of steps:
  • The client sends an MClientRequest to the MDS to create the inode and link it into the tree.
  • The MDS takes an inode off the preallocated list and links it into the tree for the client.
    • Under some circumstances it might need to journal the inode allocation before linking it in.
  • The MDS sends back a reply and asynchronously journals the create.
  • The client makes use of the file
    • ...and eventually closes and drops it.
  • When the journal segment is being expired, the MDS writes a backtrace out to the first RADOS object.
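
The steps above can be sketched as a toy simulation that shows why backtrace writes bunch up at journal expiration. This is an illustrative model only: ToyMDS and its fields are hypothetical, the journal is a plain list, and the backtrace is represented as a path string purely for illustration (this document does not describe the real on-disk format).

```python
from dataclasses import dataclass, field


@dataclass
class ToyMDS:
    """Toy model of the current create path; not Ceph code."""
    prealloc_inos: list                                # inode numbers allocated ahead of time
    segment: list = field(default_factory=list)        # creates journaled in the open segment
    rados_writes: list = field(default_factory=list)   # (ino, backtrace) written to first objects

    def create(self, parent_path, name):
        ino = self.prealloc_inos.pop(0)                 # take an inode off the preallocated list
        self.segment.append((ino, parent_path, name))   # journal the create (asynchronous in the real MDS)
        return ino                                      # reply to the client

    def expire_segment(self):
        """Expire the segment: one backtrace write per create it contains."""
        burst = len(self.segment)
        for ino, parent_path, name in self.segment:
            self.rados_writes.append((ino, f"{parent_path}/{name}"))
        self.segment.clear()
        return burst


mds = ToyMDS(prealloc_inos=list(range(0x1000, 0x1010)))
inos = [mds.create("/backup", f"file{i}") for i in range(5)]
writes_before_expiry = len(mds.rados_writes)   # nothing is written at create time
burst = mds.expire_segment()                   # all backtrace writes land here at once
print(writes_before_expiry, burst)             # prints "0 5"
```

No backtrace IO happens while the creates are being served; all of it is deferred to segment expiry, so a create-heavy workload turns into one burst of object writes per expired segment.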

There are tradeoffs between ideas (1) and (2). Idea (3) does not provide all the benefits of backtraces stored on the file's data objects. We should discuss these tradeoffs and the relative priorities.

Note the related tickets:
http://tracker.ceph.com/issues/8230
http://tracker.ceph.com/issues/8358
