Project

General

Profile

Actions

Feature #16419

closed

add statx-like interface to libcephfs

Added by Jeff Layton almost 8 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
Pull request ID:

Description

samba, in particular, can make use of the birthtime for an inode. Have ceph track the btime in the inode and provide interfaces to both retrieve that value and set it.

Actions #1

Updated by Jeff Layton almost 8 years ago

What I'm thinking is that we should add something along the lines of what David Howells has proposed for the new statx syscall, and we'll just add the btime as the only extra field to it for now. Eventually, we'll also need a mechanism to set the btime, but that can be done as stage2. Probably worth talking to the samba folks and seeing what they'd want the setattr interface to look like...

Actions #2

Updated by Sage Weil almost 8 years ago

Jeff Layton wrote:

What I'm thinking is that we should add something along the lines of what David Howells has proposed for the new statx syscall, and we'll just add the btime as the only extra field to it for now. Eventually, we'll also need a mechanism to set the btime, but that can be done as stage2. Probably worth talking to the samba folks and seeing what they'd want the setattr interface to look like...

If you mean a new libcephfs call like ceph_statx() or whatever, then yes, that sounds right. Should be a pretty simple patch to libcephfs to wire that up.

And can you really set btime? :/

Actions #3

Updated by Jeff Layton almost 8 years ago

Yeah, that's what I mean. We have ceph_ll_getattr now (afaict), so we need something like a ceph_ll_getattrx (that name is negotiable of course).

...and yes, you can set the btime. It's important when restoring backups, for instance.

Actions #4

Updated by Jeff Layton almost 8 years ago

  • Subject changed from allow cephfs to provide btime for inodes to add statx-like interface to libcephfs
Actions #5

Updated by Jeff Layton almost 8 years ago

Changing the description since this has ballooned a bit in scope. We want to add btime support and a change_attribute. Also ensure that we can avoid issuing MDS calls when we have the necessary caps to serve them from the cache.

Actions #6

Updated by Jeff Layton almost 8 years ago

Aside from the stuff Greg noticed in his latest review pass, I noticed a number of flaws in the original patchset and am reworking it to fix those. I'm also planning to go ahead and add change_attr support as well.

Also, while you can set the btime, it should be a very rare event, so I expect we don't really need to plumb btime caching support into the caps subsystem. We should be able to just expect the client to do a synchronous setattr with CEPH_SETATTR_BTIME when that occurs, no?

Actions #7

Updated by Greg Farnum almost 8 years ago

We need to be able to serve an accurate btime. I suppose we could break our rules and assume it won't get changed in practice, or only serve it via sync MDS requests, but it should be pretty easy to plumb it in alongside mtime et al and that seems like the right way to go about it.

Actions #8

Updated by Jeff Layton almost 8 years ago

Possibly. The thing is that the btime should only ever change due to an deliberate setattr call. It's unlike the other times in that regard. So I don't see why we couldn't serve it out as we do any other parameter that we don't expect to change very often.

I'm still trying to sort out how the caps system works with this sort of thing, but as long as we ensure that a btime change is never applied while there are any BUFFER or EXCL privs outstanding, then I don't see why we'd necessarily need to plumb this through the caps calls.

That said, maybe doing that would still make it simpler to ensure correctness?

Actions #9

Updated by Jeff Layton over 7 years ago

I have a prototype set for this, but I now think that the handling of the change_attr is wrong for directories. I'm going back and reworking that bit.

Actions #10

Updated by Jeff Layton over 7 years ago

Mostly working now, but I'm seeing occasional problems with truncating files. I bisected the problem down to a one line change to encode_cap_message:

m->change_attr = i->change_attr;

...sometimes the i->change_attr is set to the size to which the file is being truncated! I assume this means that I messed up something in the encode/decode, but I haven't found it yet.

Actions #11

Updated by Jeff Layton over 7 years ago

Found it. I had transposed the size and change_attr args in one call to update_inode_file_bits. fsx now seems to be OK -- gitbuilder is building new branch now and I'll rerun the fs suite against it. After that I need to expand the scope of the change_attr tests to make sure we increment them properly everywhere. Also need to do some general cleanup of the patchset.

Actions #12

Updated by Jeff Layton over 7 years ago

Test run mostly passed last night, with only failures for unrelated problems -- the known problem with valgrind on centos, and the InterProcessLocking test failures. I'm going to spend today cleaning up the set and making sure it's bisectable, and enhancing the new change_attr self tests to ensure that they cover all of the ways that the change_attr should be bumped.

Actions #13

Updated by Jeff Layton over 7 years ago

I have a PR up with a smaller set of changes here:

https://github.com/ceph/ceph/pull/10691

This is just changing how ",," is handled. I need to get this part merged (or NAK'ed) before I can send up the PR on the larger set. Reviews are welcome!

Actions #14

Updated by Jeff Layton over 7 years ago

Ok, smaller set of changes is now merged. Now we have the larger set to contend with. I've gone ahead and rolled some patches to make Samba use the new interfaces. That was fairly straightforward, but doing this for ganesha is a little tougher.

ganesha uses a lot of the ceph_ll_* interfaces, many of which fill out a struct stat, which is then used to populate their filehandle cache. I think what we'll probably want to do is to convert most of those calls to fill out a ceph_statx instead. It's not too hard to do, but it does explode the API a bit.

The saving grace there may be the fact that Greg is already going to be changing the API significantly to eliminate the use of -1 as a "special" uid. We can combine his changes to using a credential structure with the change to make the API use a ceph_statx structure.

Actions #15

Updated by Jeff Layton over 7 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

This should now be done for the most part. New ceph_statx interfaces have been merged into kraken. This change is not backward compatible though, so libcephfs has been revved to version 2 for kraken.

Actions #16

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (24)
Actions

Also available in: Atom PDF