Project

General

Profile

Bug #17594

cephfs: permission checking not working (MDS should enforce POSIX permissions)

Added by Jeff Layton about 3 years ago. Updated 6 months ago.

Status:
Verified
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
Start date:
10/17/2016
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
Client, MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

Frank Filz noticed that cephfs doesn't seem to be enforcing permissions properly, particularly on mkdir.

This test program (written vs. the new cephfs API) runs to completion, but the second ceph_ll_mkdir call ought to fail.

ceph_mkdir_test.c View (1.43 KB) Jeff Layton, 10/17/2016 09:02 PM

ceph_mkdir_test.c View - "backported" reproducer (1.15 KB) Jeff Layton, 10/18/2016 03:23 PM


Related issues

Related to fs - Tasks #39998: client: audit ACL New

History

#1 Updated by John Spray about 3 years ago

Quick sanity check: is fuse_default_permissions set to false? Otherwise the assumption is that fuse (or nfs, samba) is doing the permissions check for us.

#2 Updated by Greg Farnum about 3 years ago

And at a guess, you might be using the uid,gid I presume you have there (0,0) to create folders with the ownership specified by the UserPerm. Based on my experience swapping things around it'd be easy to get that backwards (or maybe I did one wrong and missed it!).

...yeah, looking at master's ceph_ll_mkdir, it's taking the uid,gid you pass in and creating a UserPerm that is used for both the directory ownership and permission checking. Dunno what changes you made to (appropriately) split that functionality but I'm betting that's the root of the issue.

#3 Updated by Jeff Layton about 3 years ago

No, I don't think so. Here is ceph_userperm_new in my tree:

extern "C" UserPerm *ceph_userperm_new(uid_t uid, gid_t gid, int ngids,
                                       gid_t *gidlist)
{
  return new (std::nothrow) UserPerm(uid, gid, ngids, gidlist);
}

Pretty straightforward. The reproducer is definitely using unprivileged uids, and different ones for each mkdir. The directories are created with ownership that corresponds to the creds. It just looks like the permissions are not being enforced.

I messed around with the fuse_default_permissons setting too, but that didn't seem to make any difference. In any case, the MDS ought to be enforcing those permissions regardless of the client setting.

Also, Frank noticed this and he's not using a branch with my changes. Regardless, I'll see if I can "backport" this reproducer to work with the current master to verify that it wasn't anything I changed.

#4 Updated by Jeff Layton about 3 years ago

Ok, here's a "backported" test program. Both mkdirs still succeed, but when I set fuse_default_permissions to false on the client, permissions are enforced.

Still, doing that only on the client seems wrong. Shouldn't the MDS also be enforcing these permissions?

#5 Updated by Jeff Layton about 3 years ago

Ok, and the reason setting fuse_default_permissions didn't work for me yesterday was due to a typo. When I set that option properly on the client, then the client does enforce permissions.

Is that the intended design here? Should we be relying on the clients only to enforce permissions or should the MDS be doing that as well?

#6 Updated by Greg Farnum about 3 years ago

The MDS does enforce cephx-based permissions, but unless you've deliberately constructed a limited cephx key it's going to see the client is an admin and let it do whatever it wants.

#7 Updated by Jeff Layton about 3 years ago

I don't think that's sufficient.

My understanding is that once you drop the client_lock mutex (as the client does when waiting for the reply from the server), then any caps that client holds could be lost. At that point another client could race in and change the ownership or mode, rendering the client-side permission check invalid.

For "real" posix compliance you have to ensure that the permission check is atomic wrt the operation, and simply checking cephx permissions on the MDS isn't enough for that.

#8 Updated by Greg Farnum about 3 years ago

Luckily the messages are ordered so once a client sends off a request to the MDS, any caps it drops will be ordered after that operation. ;)

#9 Updated by Jeff Layton about 3 years ago

Will they be ordered after the reply comes in, or just ordered wrt how they go out onto the wire?

IOW, is there sometihng that will ensure that the caps can't be dropped until that reply is processed, or are we just counting on being lucky with the timing and that things will be processed on the MDS in the order received?

#10 Updated by Greg Farnum about 3 years ago

The client has to send out both a cap drop message and the user's request.
Those will be ordered, and if the cap drop message came first the client won't try and process the user request until it gets caps back.
On the MDS side, the messages will be "dispatched" in order and in the ordinary state of affairs will be processed in the order sent.

But, you're right, I'm not certain off-hand if we might block the user request message and re-order it behind the cap drop. This could be a hole, but it's not going to be a big one even if it's technically not allowed. (Amongst other things, it requires actual overlap between the two separate client requests — to change mode and to mkdir or whatever — and I believe POSIX lets us choose whatever ordering we want between simultaneous requests. They don't need to be processed in the same order they're acknowledged or anything.)

#11 Updated by Jeff Layton about 3 years ago

FWIW, "little holes" like this are worse in that they'll often strike when things are really redlined. If the race window is small, then that'll just make it that much harder to reproduce and fix.

I don't think POSIX says much about processing of simultaneous client requests on a server like this, but in general NFS and SMB servers and the like can process calls in whatever order they like. The ceph MDS is pretty similar in that regard, AFAICT.

If a client requires serialization like this, then I think we have to ensure that it takes steps to ensure it. Is there any way we can make the client take references on the necessary caps prior to issuing a MDS call, so we can ensure that it won't drop those caps until the reply comes in?

#12 Updated by Greg Farnum about 3 years ago

No, I mean: I don't think this counts as a bug. To get in this situation we have two outstanding requests:
1) chmod
2) something else

They are, by definition, overlapping operations. That means we get to choose the order. So what's the bug? Is there some order of replay operations under failure modes that breaks a contract?

#13 Updated by Jeff Layton about 3 years ago

No, the order has already been established, because each operation really starts with the permission check. But ok, my earlier example was a little ambiguous. Here's one that isn't:

Suppose we have a directory that is world-writable, but has the sticky bit set in the mode (mode 01777). The sticky bit on a directory basically says that you can't remove (or rename on top of) files that you don't own.

One client does an O_EXCL create (on behalf of uid 10001) of "testfile". He checks for the file's existence (it doesn't exist), does his permissions check and fires off a create call to the MDS. The MDS gets this call and then creates the file for that user.

Another client (at the same time) on behalf of uid 10002 does a rename of "foo" (which is owned by uid 10002) onto "testfile". The client checks for existence, doesn't see "testfile" (because it doesn't exist yet). It checks permissions and says OK because the directory is world-writable and there's nothing that needs to be removed. It fires off the rename call to the MDS.

The rename comes in just behind the create operation, removes the just-created "testfile" and replaces it with the one formerly known as "foo".

So now, the first client sees success on his O_EXCL create, but then when he goes to look it's owned by the wrong user (and has been clobbered).

The crux of the matter is that you can't separate the permissions check from the actual operation or you'll end up opening all sorts of ToC/ToU problems.

#14 Updated by Greg Farnum about 3 years ago

There we go, that case would concern me.

Contrary to our discussion in standup today, I think MDSAuthCaps::is_capable() is doing correct POSIX permission checking — see MDSAuthCaps::is_capable(). The problem in your case is just that we allow admin users (allow *) to do whatever they want (the "if (i->match.uid == MDSCapMatch::MDS_AUTH_UID_ANY) {" line) and that short-cuts the posix checks. But I think the changes we need can be done entirely in MDSAuthCaps code.

#15 Updated by Jeff Layton about 3 years ago

I don't quite understand. The reproducer that does the ceph_ll_mkdir does it as an unprivileged user, so why is it allowed by the MDS? IOW, why is uid 10002 considered to be an admin user?

Regardless, I think we ought to just have the MDS do the permission checking on all operations instead of trying to delegate that to the clients. While the situation I outlined is pretty clear, you can hit ToC/ToU races in other codepaths as well.

The exception there is if the client is doing the operation locally under the aegis of exclusive caps -- then it does need to check permissions locally since the MDS has delegated it to the client.

At the very least, we ought to change the default setting of fuse_default_permissions from true to false (and maybe rename it since it affects more than just FUSE).

#16 Updated by Greg Farnum about 3 years ago

The cephx user, not the *nix user. I'd be fine with the other stuff — we already run (all our?) nightlies with our own permission checking.

#17 Updated by John Spray about 3 years ago

  • Subject changed from cephfs permission checking not working to cephfs permission checking not working (MDS should enforce POSIX permissions)
  • Assignee deleted (Jeff Layton)

#18 Updated by John Spray about 3 years ago

  • Priority changed from Normal to High
  • Target version set to v12.0.0

#19 Updated by Greg Farnum almost 3 years ago

  • Assignee set to Greg Farnum

#20 Updated by Greg Farnum almost 3 years ago

  • Status changed from New to In Progress

#21 Updated by Greg Farnum almost 3 years ago

  • Status changed from In Progress to Verified
  • Assignee deleted (Greg Farnum)

Gah. I've run out of time to work on this right now. I've got a branch at :gregsfortytwo/ceph.git which makes a good start, but I've run into a few problems which will take more real work:
1) The MDS unconditionally returns EACCES when it denies permission, and in some cases (notably pjd chmod/07.t, test 10) the correct return is EPERM. (I didn't track down exactly why; it will matter.)
2) More importantly, we still aren't handling gid lists correctly and it is here biting us. I'm not sure if there are also FUSE interface issues causing trouble, but for certain we are sending a request in to change to group 65531, while acting as group 65532, on a request which sets it to both. We have no way to transmit a request for a list of them.

I'm not sure if there are other remaining issues but those are the ones I've found in the logs so far. If anybody wants to pick it up, running pjd on a local mount is a good way to test for the actual POSIX-compliance issues. If you're wondering why it's so bad after we put in the security stuff so long ago, well...it turns out we were never testing any of this because a cephx cap without any specific UID specified is allowed to do whatever it wants as far as MDSAuthCaps is concerned (as long as it's within an allowed path, obviously). Whoops!

#22 Updated by Jeff Layton almost 3 years ago

Greg Farnum wrote:

Gah. I've run out of time to work on this right now. I've got a branch at :gregsfortytwo/ceph.git which makes a good start, but I've run into a few problems which will take more real work:
1) The MDS unconditionally returns EACCES when it denies permission, and in some cases (notably pjd chmod/07.t, test 10) the correct return is EPERM. (I didn't track down exactly why; it will matter.)

I think that's just the way POSIX defines this:

http://pubs.opengroup.org/onlinepubs/009695399/functions/chmod.html

You may end up having to do EACCES -> EPERM translation at a fairly high level to get some of the error returns correct.

2) More importantly, we still aren't handling gid lists correctly and it is here biting us. I'm not sure if there are also FUSE interface issues causing trouble, but for certain we are sending a request in to change to group 65531, while acting as group 65532, on a request which sets it to both. We have no way to transmit a request for a list of them.

I do seem to recall that FUSE doesn't really handle grouplists either, but that may have changed more recently.

#23 Updated by Greg Farnum almost 3 years ago

Jeff Layton wrote:

Greg Farnum wrote:

Gah. I've run out of time to work on this right now. I've got a branch at :gregsfortytwo/ceph.git which makes a good start, but I've run into a few problems which will take more real work:
1) The MDS unconditionally returns EACCES when it denies permission, and in some cases (notably pjd chmod/07.t, test 10) the correct return is EPERM. (I didn't track down exactly why; it will matter.)

I think that's just the way POSIX defines this:

http://pubs.opengroup.org/onlinepubs/009695399/functions/chmod.html

You may end up having to do EACCES -> EPERM translation at a fairly high level to get some of the error returns correct.

Ah, yes, I meant I hadn't tracked down why the MDS was even involved in a way that made it return EACCES instead of having the client issue an EPERM earlier.

2) More importantly, we still aren't handling gid lists correctly and it is here biting us. I'm not sure if there are also FUSE interface issues causing trouble, but for certain we are sending a request in to change to group 65531, while acting as group 65532, on a request which sets it to both. We have no way to transmit a request for a list of them.

I do seem to recall that FUSE doesn't really handle grouplists either, but that may have changed more recently.

Yeah, just from glancing at our implementation that certainly was and may still be the case, but these tests pass in master (at least, with our default cephx settings).

#24 Updated by Patrick Donnelly over 1 year ago

  • Subject changed from cephfs permission checking not working (MDS should enforce POSIX permissions) to cephfs: permission checking not working (MDS should enforce POSIX permissions)
  • Category set to Correctness/Safety
  • Assignee set to Jeff Layton
  • Target version changed from v12.0.0 to v13.0.0
  • Source changed from other to Development
  • Backport set to luminous
  • ceph-qa-suite fs added
  • Component(FS) Client, MDS added

#25 Updated by Patrick Donnelly over 1 year ago

  • Target version changed from v13.0.0 to v14.0.0
  • Backport changed from luminous to mimic,luminous

#26 Updated by Jeff Layton 9 months ago

Dusting this bug off after quite some time. I'm trying to understand what it will take to get the MDS to enforce permissions here. I've got a vstart cluster set up and the client is set up with fuse_default_permissions = true.

It seems that no matter what cephx user I have the test program connect as, I find that the match.uid compared in is_capable is '-1' (which is MDSCapMatch::MDS_AUTH_UID_ANY), which bypasses all of the permissions checks. How do I craft a cephx user that has a match.uid that is different?

#28 Updated by Jeff Layton 9 months ago

Thanks, Zheng! That worked. For the record, based on my testcase, I did:

ceph auth caps client.test mon 'allow r' mds 'allow r, allow rw uid=10000 gids=10000, allow rw uid=10001 gids=10001' osd 'allow rw'

...and with that the MDS enforced the permissions as expected. Still, that doesn't seem very practical with a client like ganesha that might be issuing calls on behalf of many uid/gid combinations.

ISTM that we ought to enforce those permissions on the MDS regardless of the uid=/gids= settings, for any cephx user that doesn't have specific caps that allow overriding permission checks. Maybe we need to overhaul the mds caps to do this?

#29 Updated by Jeff Layton 9 months ago

To elaborate, today the rule looks something like this:

"If there is a uid= qualifier in the cap, and the call is made on behalf of that uid then allow it and do the needful directory access checks (DAC) for the call. If there is no uid= qualifier then assume that the client checked the permissions and don't do any DAC."

That's problematic though as you'd need to enumerate all potential uid/gid combos in order to get the MDS to do any DAC. Instead, I think we need a new flag (D?) in the RWPS set that grants the ability for that cephx user to bypass the DAC. So, you'd have 3 possible states:

  • No uid filtering + do DAC checks on calls coming in (when new D flag is clear and no uid= filters are set on the cap)
  • Filter uids + do DAC checks on those users (when new D flag is clear and uid= filters are set on the cap)
  • Superuser permissions (new D flag is set -- this is the equivalent of MDS_AUTH_UID_ANY today)

#30 Updated by Patrick Donnelly 9 months ago

Jeff Layton wrote:

No, the order has already been established, because each operation really starts with the permission check. But ok, my earlier example was a little ambiguous. Here's one that isn't:

Suppose we have a directory that is world-writable, but has the sticky bit set in the mode (mode 01777). The sticky bit on a directory basically says that you can't remove (or rename on top of) files that you don't own.

One client does an O_EXCL create (on behalf of uid 10001) of "testfile". He checks for the file's existence (it doesn't exist), does his permissions check and fires off a create call to the MDS. The MDS gets this call and then creates the file for that user.

Another client (at the same time) on behalf of uid 10002 does a rename of "foo" (which is owned by uid 10002) onto "testfile". The client checks for existence, doesn't see "testfile" (because it doesn't exist yet). It checks permissions and says OK because the directory is world-writable and there's nothing that needs to be removed. It fires off the rename call to the MDS.

The rename comes in just behind the create operation, removes the just-created "testfile" and replaces it with the one formerly known as "foo".

So now, the first client sees success on his O_EXCL create, but then when he goes to look it's owned by the wrong user (and has been clobbered).

The crux of the matter is that you can't separate the permissions check from the actual operation or you'll end up opening all sorts of ToC/ToU problems.

Is this a problem particular to rename? It seems to me that one reasonable approach to resolve this would be to require that the client obtain caps on the target directory (EXCL?) for permission check purposes before doing the rename RPC.

#31 Updated by Jeff Layton 9 months ago

No. This problem is not specific to rename -- I just used that as an example. The upshot here is that you cannot separate DAC from the actual operation that you're going to do.

Unless you can guarantee that the client will always hold the necessary caps to prevent changes to inodes that govern DAC and to the relevant dentry namespace, those checks will be racy. I don't think we can make such a guarantee, and I think we ought to be doing DAC checks on almost every operation. We probably could avoid those when we do know that the client holds the correct caps for that however.

Beyond that though, the existing semantics for the uid= qualifier on auth caps are somewhat odd. It's not clear to me what the use case for them actually is...

#32 Updated by Patrick Donnelly 9 months ago

Jeff Layton wrote:

To elaborate, today the rule looks something like this:

"If there is a uid= qualifier in the cap, and the call is made on behalf of that uid then allow it and do the needful directory access checks (DAC) for the call. If there is no uid= qualifier then assume that the client checked the permissions and don't do any DAC."

That's problematic though as you'd need to enumerate all potential uid/gid combos in order to get the MDS to do any DAC.

Why would there be more than one uid/gid pair for a client request?

Instead, I think we need a new flag (D?) in the RWPS set that grants the ability for that cephx user to bypass the DAC. So, you'd have 3 possible states:

  • No uid filtering + do DAC checks on calls coming in (when new D flag is clear and no uid= filters are set on the cap)
  • Filter uids + do DAC checks on those users (when new D flag is clear and uid= filters are set on the cap)
  • Superuser permissions (new D flag is set -- this is the equivalent of MDS_AUTH_UID_ANY today)

I'm iffy on whether we want to go down this road. CephFS is built on trustworthy clients so modifying the caps to skip permission checks is may superfluous. (We do have RADOS namespaces and path= MDSAuthCap restrictions for clients we really want to segregate.)

#33 Updated by Patrick Donnelly 9 months ago

Jeff Layton wrote:

No. This problem is not specific to rename -- I just used that as an example.

Sorry can you give another example then? This problem seems very specific to sticky directories and rename.

The upshot here is that you cannot separate DAC from the actual operation that you're going to do.

Unless you can guarantee that the client will always hold the necessary caps to prevent changes to inodes that govern DAC and to the relevant dentry namespace, those checks will be racy. I don't think we can make such a guarantee, and I think we ought to be doing DAC checks on almost every operation. We probably could avoid those when we do know that the client holds the correct caps for that however.

I think that's the key though. The client should be doing these checks and to do that, it needs to have coherent cache for the directory it's renaming the file to so it doesn't accidentally clobber someone else's file. So, I would suggest the right fix here is to fix the client to request those caps.

#34 Updated by Jeff Layton 9 months ago

Why would there be more than one uid/gid pair for a client request?

There wouldn't be, but a client like ganesha may be handling requests on behalf of many different POSIX users. We can't reasonably set a uid= list in the MDS auth cap for such clients.

Sorry can you give another example then? This problem seems very specific to sticky directories and rename.

I'll have to think about it. Many of these sorts of races do involve namespace operations (renames, links and unlinks). There are also potential races with things like chmod'ing a directory with restrictive permissions and then putting "secret" files in there. If that chmod races in between the DAC and the file open, then you've potentially violated POSIX semantics as well.

I think that's the key though. The client should be doing these checks and to do that, it needs to have coherent cache for the directory it's renaming the file to so it doesn't accidentally clobber someone else's file. So, I would suggest the right fix here is to fix the client to request those caps.

In practice, I suspect the client usually does have those caps when it goes to do this. We could request them when we don't but my understanding is that you are never guaranteed to get caps that you request. What will we do in the case where we don't?

Also, bouncing caps around for this is not necessarily the most efficient thing to do either. If I just want to rename a file in a directory, then I have make at least 2 round trips (request caps, do rename, release caps), vs. 1 round trip if we just let the MDS handle it.

#35 Updated by Patrick Donnelly 8 months ago

  • Target version changed from v14.0.0 to v15.0.0

#36 Updated by Patrick Donnelly 6 months ago

#37 Updated by Jeff Layton 6 months ago

Reconfirming that I think this is a problem. Here's Client::mkdir():

int Client::mkdir(const char *relpath, mode_t mode, const UserPerm& perm)
{ 
  std::lock_guard lock(client_lock);
  tout(cct) << __func__ << std::endl;
  tout(cct) << relpath << std::endl;
  tout(cct) << mode << std::endl;
  ldout(cct, 10) << __func__ << ": " << relpath << dendl;

  if (unmounting)
    return -ENOTCONN;

  if (std::string(relpath) == "/")
    return -EEXIST;

  filepath path(relpath);
  string name = path.last_dentry();
  path.pop_dentry();
  InodeRef dir;
  int r = path_walk(path, &dir, perm);
  if (r < 0)
    return r;
  if (cct->_conf->client_permissions) {
    r = may_create(dir.get(), perm);
    if (r < 0)
      return r;
  }
  return _mkdir(dir.get(), name.c_str(), mode, perm);
}

There is a potential race window between may_create and _mkdir call, during which permissions could change such that the mkdir should be denied. We don't ensure that (e.g.) As is held there (or Xs if there is an associated POSIX ACL).

The MDS doesn't seem to universally check permissions, but I think it should. It will check them in some cases (when there is a uid= qualifier in the cephx caps). I really don't grok the model of how we intend to map POSIX uid/gids onto cephx caps.

IMNSHO, we'd be better off just using MDS caps to grant whether a client has access to do certain operations on the MDS, and also have it check POSIX permissions and ACLs vs. the caller_uid/caller_gid for basically all calls.

#38 Updated by Frank Filz 6 months ago

Jeff Layton wrote:

Reconfirming that I think this is a problem. Here's Client::mkdir():

[...]

There is a potential race window between may_create and _mkdir call, during which permissions could change such that the mkdir should be denied. We don't ensure that (e.g.) As is held there (or Xs if there is an associated POSIX ACL).

The MDS doesn't seem to universally check permissions, but I think it should. It will check them in some cases (when there is a uid= qualifier in the cephx caps). I really don't grok the model of how we intend to map POSIX uid/gids onto cephx caps.

IMNSHO, we'd be better off just using MDS caps to grant whether a client has access to do certain operations on the MDS, and also have it check POSIX permissions and ACLs vs. the caller_uid/caller_gid for basically all calls.

Permission checking of manipulating directories (create, delete, rename) is always tricky. The permission check has to be done at a point where locks can be taken to assure the permission check and the directory manipulation are atomic.

Also available in: Atom PDF