
Feature #39129

create mechanism to delegate ranges of inode numbers to client

Added by Jeff Layton 2 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Category: -
% Done: 0%
Component(FS): MDS

Description

Create a mechanism by which we can hand out ranges of inode numbers to MDS clients. The clients can then use those to fully instantiate inodes in memory and then flush them back to the server.
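To illustrate the client side, here is a minimal Python sketch of a pool of delegated inode numbers. A client could draw from it when it instantiates a new inode in memory. The class name DelegatedInos and its methods are hypothetical and are not part of any Ceph client. The sketch also assumes each range arrives as a (start, length) pair.

```python
from collections import deque


class DelegatedInos:
    """Hypothetical client-side pool of inode numbers delegated by the MDS.

    Each range is stored as a mutable half-open [start, end) pair.
    Exhausted ranges are dropped, so the deque never holds an empty range.
    """

    def __init__(self):
        self._ranges = deque()

    def add_range(self, start, length):
        # Called when the MDS hands the client a new range (e.g. in a reply).
        if length <= 0:
            raise ValueError("length must be positive")
        self._ranges.append([start, start + length])

    def remaining(self):
        # Total number of delegated inode numbers not yet used.
        return sum(end - start for start, end in self._ranges)

    def take(self):
        """Return the next unused inode number, or None if the pool is empty.

        None means the client has nothing delegated and would have to fall
        back to a normal synchronous create.
        """
        if not self._ranges:
            return None
        rng = self._ranges[0]
        ino = rng[0]
        rng[0] += 1
        if rng[0] == rng[1]:
            self._ranges.popleft()
        return ino
```

Returning None on exhaustion keeps the fallback path explicit. A client that runs out of delegated numbers simply does an ordinary create through the MDS.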

We already allocate a range of inode numbers for each client in the prealloc_inos interval set inside the MDS. What may be easiest is to just hand out smaller pieces of that range to the client via some new mechanism.
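The following is a minimal sketch of carving a smaller piece off the front of a session's preallocated range. Here prealloc_inos is modeled as a sorted Python list of (start, length) tuples, not as Ceph's C++ interval_set. The helper carve_delegation is hypothetical, and the real MDS code would look different.

```python
def carve_delegation(prealloc, count):
    """Remove up to `count` inode numbers from the front of `prealloc`.

    `prealloc` is a list of (start, length) tuples sorted by start. It stands
    in for the session's prealloc_inos interval set and is modified in place.
    Returns the delegated pieces as a list of (start, length) tuples.
    """
    delegated = []
    while count > 0 and prealloc:
        start, length = prealloc[0]
        take = min(count, length)
        delegated.append((start, take))
        if take == length:
            # The whole interval was handed out; drop it.
            prealloc.pop(0)
        else:
            # Shrink the interval from the front.
            prealloc[0] = (start + take, length - take)
        count -= take
    return delegated
```

The carved pieces come from the front of the set and are removed from it. Because of that, the MDS cannot later use one of those numbers for a create it performs on the client's behalf.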

I'm not sure if we need new messages for this, or whether we could extend some existing messages to contain the set. We probably would want these in MClientReply (so we could replenish the client when we are responding to a create). Maybe we could update the client via other mechanisms too? I'm not sure what would work best here yet.

For now, the client can just ignore these ranges.


Related issues

Related to fs - Feature #24461: cephfs: improve file create performance buffering file create operations New 06/08/2018
Related to fs - Feature #38951: implement buffered unlink in libcephfs New

History

#1 Updated by Jeff Layton 2 months ago

  • Related to Feature #24461: cephfs: improve file create performance buffering file create operations added

#2 Updated by Patrick Donnelly 2 months ago

  • Assignee set to Jeff Layton
  • Target version set to v15.0.0
  • Start date deleted (04/05/2019)

#3 Updated by Jeff Layton about 2 months ago

  • Related to Feature #38951: implement buffered unlink in libcephfs added

#4 Updated by Jeff Layton about 1 month ago

We may not need this after all. The kernel client at least doesn't care a lot about the inode number. We can do pretty much anything we want with the inode in memory, and leave inode->i_ino set to 0 initially. When we get the CREATE reply, we can then fill out the inode number.

This does mean that we'll have to wait on the CREATE reply in order to do a stat(), or a statx() with STATX_INO, but that's probably fine. We'll also need to wait on that reply before we can flush dirty inode data to the OSDs, as we need to know the inode number in order to write to the objects. That said, we should be fine to write to the pagecache until that point.
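Here is a sketch of the approach above. It uses a simplified in-memory inode modeled in Python, not the kernel's struct inode. PendingInode and its methods are hypothetical names. Buffered writes go through immediately. Anything that needs the inode number, such as stat() or writeback, blocks until the CREATE reply supplies it.

```python
import threading


class PendingInode:
    """Hypothetical in-memory inode created before the MDS assigns its number.

    ino stays 0 until the CREATE reply arrives. Writes only touch the local
    page cache; operations that need the inode number wait for the reply.
    """

    def __init__(self):
        self.ino = 0
        self._ino_known = threading.Event()
        self.pagecache = bytearray()

    def write(self, data):
        # Buffered write into the page cache: no inode number needed yet.
        self.pagecache += data

    def handle_create_reply(self, ino):
        # Called when the CREATE reply from the MDS arrives.
        self.ino = ino
        self._ino_known.set()

    def stat_ino(self, timeout=None):
        # stat() or statx() with STATX_INO must wait for the reply.
        if not self._ino_known.wait(timeout):
            raise TimeoutError("CREATE reply not received yet")
        return self.ino

    def flush(self, timeout=None):
        # Writing to the OSD objects requires the inode number, so
        # writeback also waits; here it just returns what would be written.
        ino = self.stat_ino(timeout)
        return ino, bytes(self.pagecache)
```

The only synchronization point is the Event set by handle_create_reply. That matches the comment's point that the page cache can absorb writes until the reply arrives.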

#5 Updated by Patrick Donnelly about 1 month ago

Jeff Layton wrote:

> We may not need this after all. The kernel client at least doesn't care a lot about the inode number. We can do pretty much anything we want with the inode in memory, and leave inode->i_ino set to 0 initially. When we get the CREATE reply, we can then fill out the inode number.
>
> This does mean that we'll have to wait on the CREATE reply in order to do a stat(), or a statx() with STATX_INO, but that's probably fine. We'll also need to wait on that reply before we can flush dirty inode data to the OSDs, as we need to know the inode number in order to write to the objects. That said, we should be fine to write to the pagecache until that point.

That may actually be a better approach, so that the MDS doesn't need to clean up after us if the client fails.
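Delegation has a cleanup cost. If a client session dies, the MDS has to work out which delegated numbers were never used and take them back. Here is a minimal sketch of that bookkeeping. It assumes the MDS tracks both the ranges it delegated and the set of inodes the client actually created. The function name is hypothetical.

```python
def reclaim_on_session_failure(delegated, created):
    """Hypothetical: inode numbers the MDS must take back from a dead client.

    delegated: iterable of (start, length) ranges handed to the client.
    created:   set of inode numbers the client actually created and
               flushed back to the MDS.
    Returns the sorted list of delegated-but-unused inode numbers.
    """
    unused = []
    for start, length in delegated:
        for ino in range(start, start + length):
            if ino not in created:
                unused.append(ino)
    return sorted(unused)
```

With the deferred approach from comment #4, the MDS only assigns a number when it handles the CREATE itself, so none of this bookkeeping is needed.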
