Feature #44274

open

mds: disconnect file data from inode number

Added by Patrick Donnelly about 4 years ago. Updated about 1 year ago.

Status: New
Priority: Normal
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): Client, MDS, kceph
Labels (FS):
Pull request ID:

Description

Currently CephFS uses the inode number to construct the object names for the file data. This has generally worked well but there are some workloads where it slows down the file system. For example, open(..., O_TRUNC) is slower because the MDS needs to truncate the file before returning to the client. A more efficient strategy would be to allocate a new "file data number", immediately give that number to the client so it can begin writes, and out-of-band truncate/purge the old file data.
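As a rough illustration of the idea above, the following toy model contrasts naming objects by inode number with naming them by a separately allocated number. It assumes the usual CephFS data object naming of hex inode number, a dot, and an 8-digit hex object index. Everything else (`Inode.file_data_number`, `ToyMDS`, `open_trunc`, the purge queue list, and the allocator starting value) is hypothetical and is not taken from the Ceph code base.

```python
from dataclasses import dataclass, field
from itertools import count


def object_name(number: int, object_index: int) -> str:
    """Format a data object name as <hex number>.<8-digit hex object index>."""
    return f"{number:x}.{object_index:08x}"


@dataclass
class Inode:
    ino: int
    # Hypothetical field: today the inode number itself plays this role.
    file_data_number: int


@dataclass
class ToyMDS:
    # Hypothetical allocator for file data numbers (start value is arbitrary).
    next_fdn: count = field(default_factory=lambda: count(0x5000))
    # Old file data numbers waiting to be purged out-of-band.
    purge_queue: list = field(default_factory=list)

    def open_trunc(self, inode: Inode) -> int:
        """Handle open(..., O_TRUNC) without truncating before replying."""
        old = inode.file_data_number
        inode.file_data_number = next(self.next_fdn)
        self.purge_queue.append(old)  # old objects are removed later
        return inode.file_data_number  # client can start writing immediately


mds = ToyMDS()
f = Inode(ino=0x10000000000, file_data_number=0x10000000000)
before = object_name(f.file_data_number, 0)
after = object_name(mds.open_trunc(f), 0)
```

In this sketch the reply to the client only depends on allocating a number, while removal of the old objects is deferred to whatever drains `purge_queue`.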

Another benefit of doing this is that it would allow atomic layout transformations as described in #40285. The MDS could allocate a new file data number during the transformation so that the final switch to the new layout is atomic.

Yet another benefit of this approach is that it lays the groundwork for #1680. In order to allow cloning a file (which creates a new inode) with copy-on-write data blocks, we'll first need to disconnect the file data blocks from the inode number.

Note on disaster recovery: the default data pool has an object for each inode (which may also be the first file data object, depending on layout). If we use this new "file data number" for the object names, we'll have to have an xattr with the inode number to allow disaster recovery. There will also need to be a way to disambiguate two file data numbers referencing the same inode.
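The recovery concern above can be sketched as a scan over data objects that carry the owning inode number in an xattr. The ticket does not say how two file data numbers referencing the same inode should be disambiguated; the per-object `generation` counter used here is purely an assumption made for illustration, as are the function name and the tuple layout.

```python
def recover_mapping(objects):
    """Rebuild an inode -> file data number map from scanned objects.

    `objects` is an iterable of (file_data_number, ino, generation) tuples,
    standing in for what a scan of object xattrs might return.
    `generation` is a hypothetical tie-breaker: the highest value wins.
    """
    best = {}
    for fdn, ino, generation in objects:
        current = best.get(ino)
        if current is None or generation > current[1]:
            best[ino] = (fdn, generation)
    return {ino: fdn for ino, (fdn, _) in best.items()}


scanned = [
    (0x5000, 0x10000000000, 1),  # stale data, not yet purged
    (0x5001, 0x10000000000, 2),  # current data for the same inode
    (0x5002, 0x10000000001, 1),
]
mapping = recover_mapping(scanned)
```

Any real scheme would also have to decide what to do with the losing file data numbers, for example queueing them for purge.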

Note: this change will break older clients. There needs to be a CephFS feature flag gating the new feature.
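One way to picture the gating is a check at session setup: once the file system requires the new behaviour, clients that do not advertise support are refused. The names `ClientFeature.FILE_DATA_NUMBER` and `may_connect` are hypothetical and do not correspond to existing Ceph feature bits or functions.

```python
from enum import IntFlag


class ClientFeature(IntFlag):
    # Hypothetical feature bit; not an existing CephFS feature.
    FILE_DATA_NUMBER = 1


def may_connect(fs_requires_fdn: bool, client_features: ClientFeature) -> bool:
    """Return True if a client with these features may open a session."""
    if not fs_requires_fdn:
        return True  # feature disabled: older clients keep working
    return bool(client_features & ClientFeature.FILE_DATA_NUMBER)


old_client = ClientFeature(0)
new_client = ClientFeature.FILE_DATA_NUMBER
```

With the flag off, both clients connect; with it on, only `new_client` does.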


Related issues 2 (2 open, 0 closed)

Related to CephFS - Feature #40285: mds: support hierarchical layout transformations on files (New)
Related to CephFS - Feature #1680: support reflink (cheap file copy/clone) (New)

Actions #1

Updated by Patrick Donnelly about 4 years ago

  • Related to Feature #40285: mds: support hierarchical layout transformations on files added
Actions #2

Updated by Patrick Donnelly about 4 years ago

  • Related to Feature #1680: support reflink (cheap file copy/clone) added
Actions #3

Updated by Milind Changire over 2 years ago

  • Assignee set to Milind Changire
Actions #4

Updated by Greg Farnum about 1 year ago

  • Priority changed from High to Normal

@Patrick do you think this is something we still need to carry on its own, in light of https://tracker.ceph.com/issues/54205?

I'm not totally sure why allocating a new file data number here would be faster than our existing truncate_seq stuff, which is just in-memory twiddling.
