Project

General

Profile

Bug #18131

ceph-fuse not clearing setuid/setgid bits on chown

Added by Jeff Layton 11 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
Start date:
12/02/2016
Due date:
% Done:

0%

Source:
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Release:
Component(FS):
Needs Doc:
No

Description

I had some test failures that showed up in my most recent fs suite run here:

http://pulpito.ceph.com/jlayton-2016-11-30_18:27:59-fs-wip-jlayton-umount---basic-smithi/

The problem seems to be that we're not clearing setuid/setgid bits on ownership changes.

The branch being tested was basically mainline with a few small patches that shouldn't affect this.


Related issues

Copied to fs - Backport #18308: ceph-fuse not clearing setuid/setgid bits on chown Resolved

History

#1 Updated by Jeff Layton 11 months ago

  • Assignee deleted (Jeff Layton)

#2 Updated by Jeff Layton 11 months ago

The tests I'm mainly concered with are the pjd.sh tests in those. The rest seem to be transient failures of one sort or another.

#3 Updated by John Spray 11 months ago

  • Priority changed from Normal to Immediate

#4 Updated by Jeff Layton 11 months ago

  • Assignee set to Jeff Layton

First evidence I see of this test failing is here:

http://pulpito.ceph.com/teuthology-2016-11-30_10:10:02-fs-jewel---basic-smithi/

But this is a jewel branch, so I have to wonder if something changed in the testsuite or maybe in the kernel?

Regardless, this is easily reproducible by hand with FUSE on my main development box:

[jlayton@tleilax build]$ sudo ./bin/ceph-fuse /mnt/cephfs
2016-12-05 10:27:01.720307 7f40dbb36f40 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-12-05 10:27:01.720498 7f40dbb36f40 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-12-05 10:27:01.723444 7f40dbb36f40 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-12-05 10:27:01.724777 7f40dbb36f40 -1 init, newargv = 0x5640db141260 newargc=11
ceph-fuse[9192]: starting ceph client
ceph-fuse[9192]: starting fuse
Aborted
[jlayton@tleilax build]$ cd /mnt/cephfs
[jlayton@tleilax cephfs]$ touch ./testfile
[jlayton@tleilax cephfs]$ chmod 06555 testfile
[jlayton@tleilax cephfs]$ stat testfile
  File: 'testfile'
  Size: 0             Blocks: 0          IO Block: 4194304 regular empty file
Device: 2fh/47d    Inode: 1099511627781  Links: 1
Access: (6555/-r-sr-sr-x)  Uid: ( 4447/ jlayton)   Gid: ( 4447/ jlayton)
Context: system_u:object_r:fusefs_t:s0
Access: 2016-12-05 10:28:04.949383461 -0500
Modify: 2016-12-05 10:28:04.949383461 -0500
Change: 2016-12-05 10:28:14.268422691 -0500
 Birth: -
[jlayton@tleilax cephfs]$ sudo chown root:root testfile
[jlayton@tleilax cephfs]$ stat testfile
  File: 'testfile'
  Size: 0             Blocks: 0          IO Block: 4194304 regular empty file
Device: 2fh/47d    Inode: 1099511627781  Links: 1
Access: (6555/-r-sr-sr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2016-12-05 10:28:04.949383461 -0500
Modify: 2016-12-05 10:28:04.949383461 -0500
Change: 2016-12-05 10:28:33.884275793 -0500
 Birth: -
[jlayton@tleilax cephfs]$ 

...no mode change.

The kernel client seems to be fine.

Ah-ha...I think this might be relevant (kernel patch):

commit 5e940c1dd3c1f7561924954eecee956ec277a79b
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Sat Oct 1 07:32:32 2016 +0200

    fuse: handle killpriv in userspace fs

    Only userspace filesystem can do the killing of suid/sgid without races.
    So introduce an INIT flag and negotiate support for this.

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Prior to this patch, the kernel would do this for us.

#5 Updated by Jeff Layton 11 months ago

Sorry, I mean this patch in my v4.8.10 kernel:

commit 0e2993ef5bfbca56949474942c2f6b31417e3d90
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Sat Oct 1 07:32:32 2016 +0200

    fuse: fix killing s[ug]id in setattr

    commit a09f99eddef44035ec764075a37bace8181bec38 upstream.

    Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on
    chown and truncate, and (since writeback_cache) write.  The problem with
    this is that it'll potentially restore a stale mode.

    The poper fix would be to let the filesystems do the suid/sgid clearing on
    the relevant operations.  Possibly some are already doing it but there's no
    way we can detect this.

    So fix this by refreshing and recalculating the mode.  Do this only if
    ATTR_KILL_S[UG]ID is set to not destroy performance for writes.  This is
    still racy but the size of the window is reduced.

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

#6 Updated by Jeff Layton 11 months ago

Definitely a kernel problem. Works correctly on v4.8.0, broken on v4.8.6 kernel for sure. I haven't bisected to be sure, but mainline commit a09f99eddef44035ec764075a37bace8181bec38 was taken into stable between those kernels.

Bisecting confirms that that is the patch where it breaks for me. John says he can reproduce on an ubuntu 4.4 kernel, and I'm trying to confirm now whether that might have this patch.

#7 Updated by Jeff Layton 11 months ago

Ahh, I think I see the problem:

+               attr->ia_mode = inode->i_mode;
+               kill = should_remove_suid(entry);
+               if (kill & ATTR_KILL_SUID) {
+                       attr->ia_valid |= ATTR_MODE;
+                       attr->ia_mode &= ~S_ISUID;
+               }
+               if (kill & ATTR_KILL_SGID) {
+                       attr->ia_valid |= ATTR_MODE;
+                       attr->ia_mode &= ~S_ISGID;
+               }

...and should_remove_suid() is returning 0 here. The reason for that is because the test is running as root.

So basically, pjdtests is saying that when root chown/chgrps a file, that the setuid/setgid bits must be cleared. POSIX however is somewhat less definitive on the subject. It just says:

"The POSIX.1-1990 standard requires that the chown() function invoked by a non-appropriate privileged process clear the S_ISGID and the S_ISUID bits for regular files"

...which is strange verbiage, but I think just means "a non-privileged process". In Linux terms, I think that just means a process without CAP_FSETID.

IOW, if you chgrp as a non-privileged user then you have to clear the bits. If you do so as root, then you're allowed to clear the bits, but aren't required to do so. The kernel always tries to clear them on chown, even for root.

The fuse client used to clear them, but now it doesn't. The problem I think is the new use of should_remove_suid in this codepath. That's typically used in write/truncate codepath, but for chown, I think we want to do this, even for root.

#8 Updated by Jeff Layton 11 months ago

So the upshot here is that I don't think this is anything that has broken in ceph, per-se. FUSE changed its behavior, and I'm pursuing getting FUSE fixed on upstream. As best I can tell, ceph has just relied on the kernel to do this.

On the ceph userland front, I think we ought to implement this in the setattr codepath as well. When we have a a uid/gid change without a corresponding mode change in the setattr request, and the setuid/setgid bits are set, then we should clear those bits as well. Might not hurt to do this on write and truncate as well.

Note that we'll need to fix this both in the client and MDS.

#9 Updated by Jeff Layton 11 months ago

Testing a PR now that fixes the setattr codepaths. That's pretty simple to do from those codepaths since we're sending a setattr anyway.

The write path is a little harder. We may not have any auth caps at all and doing a read/modify/write cycle is racy if we can't get them.

One idea might be to declare a new CEPH_SETATTR_KILLSUGID flag that we can send in a SETATTR request that would just do the bit clearing without requiring a read/modify/write cycle. That would work and should close the race windows (I think).

Another idea (more radical) might be to have the MDS clear the bits when it hands out file write caps, and recall any write caps when someone tries to set those bits. That would be more strict than what posix requires. It's also possibly more strict than what it allows as we'd end up clearing those bits when the file was opened for write. Still, that might be worthwhile -- generally you don't want files with these bits set being modified anyway.

Another thing we need to sort out is that the write (and truncate) paths are different in that you're not required to clear those bits if you have "appropriate privileges". For Linux, that's CAP_FSETID. I'm not sure how to interpret that in ceph-land. For now, we'll just clear it for root as well. I think that's a sane thing to do.

#10 Updated by Jeff Layton 11 months ago

Merged the chown part of this, and I think I have sorted out the problems I was having with the truncate and write codepaths. I'll be testing those later today. Basically, what I'm doing to do for write and truncate is adding a new CEPH_SETATTR_KILL_SGUID flag. When that's passed to the mds, it will grab Ax caps and clear the bits. That should ensure that we can't race during a read/modify/write cycle of the mode.

For write, we grab As when we get Fw caps and check the mode. If either bit is set, then we issue a __setattr to ask it to clear them. If we have Ax caps as well, then we'll cache the mode change. Otherwise we issue a setattr to the server for CEPH_SETATTR_KILL_SGUID, to have it clear the mode.

For CEPH_SETATTR_SIZE, we do the same -- cache the mode change if we have Ax caps, or ask the server to clear it for us if not.

#11 Updated by Jeff Layton 10 months ago

  • Priority changed from Immediate to Normal

I'll lower the priority to Normal now.

Ok, this should be fixed in mainline kernels and coming to stable series kernels. We've also got kraken set up to handle broken kernels properly.

We should probably consider backporting that fix to earlier releases as well (jewel, at least?).

#12 Updated by Jeff Layton 10 months ago

  • Subject changed from cephfs not clearing setuid/setgid bits on chown to ceph-fuse not clearing setuid/setgid bits on chown

#13 Updated by Jeff Layton 10 months ago

  • Status changed from New to Pending Backport
  • Release jewel added

#14 Updated by Jeff Layton 10 months ago

  • Release deleted (jewel)

#15 Updated by Jeff Layton 10 months ago

  • Copied to Backport #18308: ceph-fuse not clearing setuid/setgid bits on chown added

#16 Updated by Nathan Cutler 10 months ago

  • Backport set to jewel

#17 Updated by Nathan Cutler 10 months ago

@Jeff: Which commit fixes the issue/should be backported to jewel?

#19 Updated by Jeff Layton 10 months ago

Sorry for the late response. At the very least, we need these commits:


commit 18d2499d6c85a10b4b54f3b8c335cddf86c4588f
Author: Jeff Layton <jlayton@redhat.com>
Date:   Mon Dec 5 14:11:44 2016 -0500

    client: drop setuid/setgid bits on ownership change

commit 6da72500882d9749cb2be6eaa2568e6fe6e5ff4d
Author: Jeff Layton <jlayton@redhat.com>
Date:   Mon Dec 5 14:19:23 2016 -0500

    mds: clear setuid/setgid bits on ownership changes

That should ensure that we clear the setuid/setgid bits on ownership changeseven when a buggy kernel is in use. The other cases are still handled correctly by the kernel so I don't see a pressing need for them in jewel.

#20 Updated by Nathan Cutler 9 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF