Project

General

Profile

Bug #17594

cephfs: permission checking not working (MDS should enforce POSIX permissions)

Added by Jeff Layton about 2 years ago. Updated 7 months ago.

Status:
Verified
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
Start date:
10/17/2016
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
Client, MDS
Labels (FS):
Pull request ID:

Description

Frank Filz noticed that cephfs doesn't seem to be enforcing permissions properly, particularly on mkdir.

This test program (written vs. the new cephfs API) runs to completion, but the second ceph_ll_mkdir call ought to fail.

ceph_mkdir_test.c View (1.43 KB) Jeff Layton, 10/17/2016 09:02 PM

ceph_mkdir_test.c View - "backported" reproducer (1.15 KB) Jeff Layton, 10/18/2016 03:23 PM

History

#1 Updated by John Spray about 2 years ago

Quick sanity check: is fuse_default_permissions set to false? Otherwise the assumption is that fuse (or nfs, samba) is doing the permissions check for us.

#2 Updated by Greg Farnum about 2 years ago

And at a guess, you might be using the uid,gid I presume you have there (0,0) to create folders with the ownership specified by the UserPerm. Based on my experience swapping things around it'd be easy to get that backwards (or maybe I did one wrong and missed it!).

...yeah, looking at master's ceph_ll_mkdir, it's taking the uid,gid you pass in and creating a UserPerm that is used for both the directory ownership and permission checking. Dunno what changes you made to (appropriately) split that functionality but I'm betting that's the root of the issue.

#3 Updated by Jeff Layton about 2 years ago

No, I don't think so. Here is ceph_userperm_new in my tree:

extern "C" UserPerm *ceph_userperm_new(uid_t uid, gid_t gid, int ngids,
                                       gid_t *gidlist)
{
  return new (std::nothrow) UserPerm(uid, gid, ngids, gidlist);
}

Pretty straightforward. The reproducer is definitely using unprivileged uids, and different ones for each mkdir. The directories are created with ownership that corresponds to the creds. It just looks like the permissions are not being enforced.

I messed around with the fuse_default_permissons setting too, but that didn't seem to make any difference. In any case, the MDS ought to be enforcing those permissions regardless of the client setting.

Also, Frank noticed this and he's not using a branch with my changes. Regardless, I'll see if I can "backport" this reproducer to work with the current master to verify that it wasn't anything I changed.

#4 Updated by Jeff Layton about 2 years ago

Ok, here's a "backported" test program. Both mkdirs still succeed, but when I set fuse_default_permissions to false on the client, permissions are enforced.

Still, doing that only on the client seems wrong. Shouldn't the MDS also be enforcing these permissions?

#5 Updated by Jeff Layton about 2 years ago

Ok, and the reason setting fuse_default_permissions didn't work for me yesterday was due to a typo. When I set that option properly on the client, then the client does enforce permissions.

Is that the intended design here? Should we be relying on the clients only to enforce permissions or should the MDS be doing that as well?

#6 Updated by Greg Farnum about 2 years ago

The MDS does enforce cephx-based permissions, but unless you've deliberately constructed a limited cephx key it's going to see the client is an admin and let it do whatever it wants.

#7 Updated by Jeff Layton about 2 years ago

I don't think that's sufficient.

My understanding is that once you drop the client_lock mutex (as the client does when waiting for the reply from the server), then any caps that client holds could be lost. At that point another client could race in and change the ownership or mode, rendering the client-side permission check invalid.

For "real" posix compliance you have to ensure that the permission check is atomic wrt the operation, and simply checking cephx permissions on the MDS isn't enough for that.

#8 Updated by Greg Farnum about 2 years ago

Luckily the messages are ordered so once a client sends off a request to the MDS, any caps it drops will be ordered after that operation. ;)

#9 Updated by Jeff Layton about 2 years ago

Will they be ordered after the reply comes in, or just ordered wrt how they go out onto the wire?

IOW, is there sometihng that will ensure that the caps can't be dropped until that reply is processed, or are we just counting on being lucky with the timing and that things will be processed on the MDS in the order received?

#10 Updated by Greg Farnum about 2 years ago

The client has to send out both a cap drop message and the user's request.
Those will be ordered, and if the cap drop message came first the client won't try and process the user request until it gets caps back.
On the MDS side, the messages will be "dispatched" in order and in the ordinary state of affairs will be processed in the order sent.

But, you're right, I'm not certain off-hand if we might block the user request message and re-order it behind the cap drop. This could be a hole, but it's not going to be a big one even if it's technically not allowed. (Amongst other things, it requires actual overlap between the two separate client requests — to change mode and to mkdir or whatever — and I believe POSIX lets us choose whatever ordering we want between simultaneous requests. They don't need to be processed in the same order they're acknowledged or anything.)

#11 Updated by Jeff Layton about 2 years ago

FWIW, "little holes" like this are worse in that they'll often strike when things are really redlined. If the race window is small, then that'll just make it that much harder to reproduce and fix.

I don't think POSIX says much about processing of simultaneous client requests on a server like this, but in general NFS and SMB servers and the like can process calls in whatever order they like. The ceph MDS is pretty similar in that regard, AFAICT.

If a client requires serialization like this, then I think we have to ensure that it takes steps to ensure it. Is there any way we can make the client take references on the necessary caps prior to issuing a MDS call, so we can ensure that it won't drop those caps until the reply comes in?

#12 Updated by Greg Farnum about 2 years ago

No, I mean: I don't think this counts as a bug. To get in this situation we have two outstanding requests:
1) chmod
2) something else

They are, by definition, overlapping operations. That means we get to choose the order. So what's the bug? Is there some order of replay operations under failure modes that breaks a contract?

#13 Updated by Jeff Layton about 2 years ago

No, the order has already been established, because each operation really starts with the permission check. But ok, my earlier example was a little ambiguous. Here's one that isn't:

Suppose we have a directory that is world-writable, but has the sticky bit set in the mode (mode 01777). The sticky bit on a directory basically says that you can't remove (or rename on top of) files that you don't own.

One client does an O_EXCL create (on behalf of uid 10001) of "testfile". He checks for the file's existence (it doesn't exist), does his permissions check and fires off a create call to the MDS. The MDS gets this call and then creates the file for that user.

Another client (at the same time) on behalf of uid 10002 does a rename of "foo" (which is owned by uid 10002) onto "testfile". The client checks for existence, doesn't see "testfile" (because it doesn't exist yet). It checks permissions and says OK because the directory is world-writable and there's nothing that needs to be removed. It fires off the rename call to the MDS.

The rename comes in just behind the create operation, removes the just-created "testfile" and replaces it with the one formerly known as "foo".

So now, the first client sees success on his O_EXCL create, but then when he goes to look it's owned by the wrong user (and has been clobbered).

The crux of the matter is that you can't separate the permissions check from the actual operation or you'll end up opening all sorts of ToC/ToU problems.

#14 Updated by Greg Farnum about 2 years ago

There we go, that case would concern me.

Contrary to our discussion in standup today, I think MDSAuthCaps::is_capable() is doing correct POSIX permission checking — see MDSAuthCaps::is_capable(). The problem in your case is just that we allow admin users (allow *) to do whatever they want (the "if (i->match.uid == MDSCapMatch::MDS_AUTH_UID_ANY) {" line) and that short-cuts the posix checks. But I think the changes we need can be done entirely in MDSAuthCaps code.

#15 Updated by Jeff Layton about 2 years ago

I don't quite understand. The reproducer that does the ceph_ll_mkdir does it as an unprivileged user, so why is it allowed by the MDS? IOW, why is uid 10002 considered to be an admin user?

Regardless, I think we ought to just have the MDS do the permission checking on all operations instead of trying to delegate that to the clients. While the situation I outlined is pretty clear, you can hit ToC/ToU races in other codepaths as well.

The exception there is if the client is doing the operation locally under the aegis of exclusive caps -- then it does need to check permissions locally since the MDS has delegated it to the client.

At the very least, we ought to change the default setting of fuse_default_permissions from true to false (and maybe rename it since it affects more than just FUSE).

#16 Updated by Greg Farnum about 2 years ago

The cephx user, not the *nix user. I'd be fine with the other stuff — we already run (all our?) nightlies with our own permission checking.

#17 Updated by John Spray about 2 years ago

  • Subject changed from cephfs permission checking not working to cephfs permission checking not working (MDS should enforce POSIX permissions)
  • Assignee deleted (Jeff Layton)

#18 Updated by John Spray about 2 years ago

  • Priority changed from Normal to High
  • Target version set to v12.0.0

#19 Updated by Greg Farnum almost 2 years ago

  • Assignee set to Greg Farnum

#20 Updated by Greg Farnum almost 2 years ago

  • Status changed from New to In Progress

#21 Updated by Greg Farnum almost 2 years ago

  • Status changed from In Progress to Verified
  • Assignee deleted (Greg Farnum)

Gah. I've run out of time to work on this right now. I've got a branch at :gregsfortytwo/ceph.git which makes a good start, but I've run into a few problems which will take more real work:
1) The MDS unconditionally returns EACCES when it denies permission, and in some cases (notably pjd chmod/07.t, test 10) the correct return is EPERM. (I didn't track down exactly why; it will matter.)
2) More importantly, we still aren't handling gid lists correctly and it is here biting us. I'm not sure if there are also FUSE interface issues causing trouble, but for certain we are sending a request in to change to group 65531, while acting as group 65532, on a request which sets it to both. We have no way to transmit a request for a list of them.

I'm not sure if there are other remaining issues but those are the ones I've found in the logs so far. If anybody wants to pick it up, running pjd on a local mount is a good way to test for the actual POSIX-compliance issues. If you're wondering why it's so bad after we put in the security stuff so long ago, well...it turns out we were never testing any of this because a cephx cap without any specific UID specified is allowed to do whatever it wants as far as MDSAuthCaps is concerned (as long as it's within an allowed path, obviously). Whoops!

#22 Updated by Jeff Layton almost 2 years ago

Greg Farnum wrote:

Gah. I've run out of time to work on this right now. I've got a branch at :gregsfortytwo/ceph.git which makes a good start, but I've run into a few problems which will take more real work:
1) The MDS unconditionally returns EACCES when it denies permission, and in some cases (notably pjd chmod/07.t, test 10) the correct return is EPERM. (I didn't track down exactly why; it will matter.)

I think that's just the way POSIX defines this:

http://pubs.opengroup.org/onlinepubs/009695399/functions/chmod.html

You may end up having to do EACCES -> EPERM translation at a fairly high level to get some of the error returns correct.

2) More importantly, we still aren't handling gid lists correctly and it is here biting us. I'm not sure if there are also FUSE interface issues causing trouble, but for certain we are sending a request in to change to group 65531, while acting as group 65532, on a request which sets it to both. We have no way to transmit a request for a list of them.

I do seem to recall that FUSE doesn't really handle grouplists either, but that may have changed more recently.

#23 Updated by Greg Farnum almost 2 years ago

Jeff Layton wrote:

Greg Farnum wrote:

Gah. I've run out of time to work on this right now. I've got a branch at :gregsfortytwo/ceph.git which makes a good start, but I've run into a few problems which will take more real work:
1) The MDS unconditionally returns EACCES when it denies permission, and in some cases (notably pjd chmod/07.t, test 10) the correct return is EPERM. (I didn't track down exactly why; it will matter.)

I think that's just the way POSIX defines this:

http://pubs.opengroup.org/onlinepubs/009695399/functions/chmod.html

You may end up having to do EACCES -> EPERM translation at a fairly high level to get some of the error returns correct.

Ah, yes, I meant I hadn't tracked down why the MDS was even involved in a way that made it return EACCES instead of having the client issue an EPERM earlier.

2) More importantly, we still aren't handling gid lists correctly and it is here biting us. I'm not sure if there are also FUSE interface issues causing trouble, but for certain we are sending a request in to change to group 65531, while acting as group 65532, on a request which sets it to both. We have no way to transmit a request for a list of them.

I do seem to recall that FUSE doesn't really handle grouplists either, but that may have changed more recently.

Yeah, just from glancing at our implementation that certainly was and may still be the case, but these tests pass in master (at least, with our default cephx settings).

#24 Updated by Patrick Donnelly 8 months ago

  • Subject changed from cephfs permission checking not working (MDS should enforce POSIX permissions) to cephfs: permission checking not working (MDS should enforce POSIX permissions)
  • Category set to Correctness/Safety
  • Assignee set to Jeff Layton
  • Target version changed from v12.0.0 to v13.0.0
  • Source changed from other to Development
  • Backport set to luminous
  • ceph-qa-suite fs added
  • Component(FS) Client, MDS added

#25 Updated by Patrick Donnelly 7 months ago

  • Target version changed from v13.0.0 to v14.0.0
  • Backport changed from luminous to mimic,luminous

Also available in: Atom PDF