Bug #62217 (closed)
ceph_fs.h: add separate owner_{u,g}id fields
% Done: 100%
Description
This task is about adding separate fields to pass the inode owner's UID/GID for operations which create new inodes:
CEPH_MDS_OP_CREATE, CEPH_MDS_OP_MKNOD, CEPH_MDS_OP_MKDIR, CEPH_MDS_OP_SYMLINK.
Previously, we used the caller_{u,g}id fields for two purposes:
- MDS permission checks
- setting the UID/GID of new inodes
but during a long discussion around adding idmapped mounts support to the cephfs kernel client we decided to extend the wire protocol [1].
Originally, Christian Brauner pointed out this problem [2], but a few initial versions of the "ceph: support idmapped mounts" patchset were based on a compromise between bringing in idmapped mounts support and changing on-wire data structures. The problem with this approach is that UID/GID-based permission checks on the MDS side would not work as intended. Xiubo Li pointed out this issue and said that it is critical enough not to be ignored.
This patch does not change behavior for the client or the MDS, because the current client implementation does:

    req->set_inode_owner_uid_gid(perms.uid(), perms.gid());

so these new fields contain the same values as caller_{u,g}id.
This new extension is used in a non-trivial way in the Linux kernel client:
https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph
In addition to the new fields we also add a CEPHFS_FEATURE_HAS_OWNER_UIDGID feature bit. It can be used on the client side
to check whether the MDS supports these new fields and can handle them properly.
Example (how idmapped mounts work) [4]:

1. Mount cephfs:

    mount.ceph admin@XYZ.cephfs=/ /mnt/ceph -o mon_addr=127.0.0.1:6789,secret=very_secret_key

2. Make 1000:1000 the root dentry owner (convenient because we want to use the mapping 1000:0:1 for simplicity):

    chown 1000:1000 /mnt/ceph

3. Create an idmapped mount based on the regular /mnt/ceph mount using the mount-idmapped tool that Christian wrote
[ taken from https://raw.githubusercontent.com/brauner/mount-idmapped/master/mount-idmapped.c ]:

    ./mount-idmapped --map-mount b:1000:0:1 /mnt/ceph /mnt/ceph_idmapped

"b" stands for "both", so we are creating a mapping of length 1 for both UID and GID. 1000 is the "on-disk" UID/GID, 0 is the mapped UID/GID.

4. Just to be precise, let's look at which UID/GID we have now:

    root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph
    total 4
    drwxrwxrwx 2 1000 1000    0 Jun  1 17:51 .
    drwxr-xr-x 4    0    0 4096 Jun  1 16:55 ..
    root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph_idmapped
    total 4
    drwxrwxrwx 2    0    0    0 Jun  1 17:51 .
    drwxr-xr-x 4    0    0 4096 Jun  1 16:55 ..

5. Now let's create a bunch of files with different owners and through different mounts (idmapped/non-idmapped).

5.1. Create a file from 0:0 through the idmapped mount (it should appear as 1000:1000 on disk):

    root@ubuntu:/home/ubuntu# sudo -u#0 -g#0 touch /mnt/ceph_idmapped/created_through_idmapped_mnt_with_uid0

5.2. Create a file from 1000:1000 through the idmapped mount (should fail, because 1000:1000 is not a valid UID/GID as it can't be mapped back to the "on-disk" UID/GID set):

    root@ubuntu:/home/ubuntu# sudo -u#1000 -g#1000 touch /mnt/ceph_idmapped/created_through_idmapped_mnt_with_uid1000
    touch: cannot touch '/mnt/ceph_idmapped/created_through_idmapped_mnt_with_uid1000': Value too large for defined data type

... and we've got EOVERFLOW. That's correct!

5.3. Create a file from 0:0 but through the regular mount (it should appear as overflowuid (=65534) in the idmapped mount, because 0:0 on disk is not mapped into the UID/GID set):

    root@ubuntu:/home/ubuntu# sudo -u#0 -g#0 touch /mnt/ceph/created_directly_with_uid0

5.4. Create a file from 1000:1000 but through the regular mount (it should appear as 0:0 in the idmapped mount, because 1000 (on disk) is mapped to 0):

    root@ubuntu:/home/ubuntu# sudo -u#1000 -g#1000 touch /mnt/ceph/created_directly_with_uid1000

6. Now let's look at the result:

    root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph
    total 4
    drwxrwxrwx 2 1000 1000    3 Jun  1 17:54 .
    drwxr-xr-x 4    0    0 4096 Jun  1 16:55 ..
    -rw-r--r-- 1    0    0    0 Jun  1 17:54 created_directly_with_uid0
    -rw-rw-r-- 1 1000 1000    0 Jun  1 17:54 created_directly_with_uid1000
    -rw-r--r-- 1 1000 1000    0 Jun  1 17:53 created_through_idmapped_mnt_with_uid0
    root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph_idmapped
    total 4
    drwxrwxrwx 2     0     0    3 Jun  1 17:54 .
    drwxr-xr-x 4     0     0 4096 Jun  1 16:55 ..
    -rw-r--r-- 1 65534 65534    0 Jun  1 17:54 created_directly_with_uid0
    -rw-rw-r-- 1     0     0    0 Jun  1 17:54 created_directly_with_uid1000
    -rw-r--r-- 1     0     0    0 Jun  1 17:53 created_through_idmapped_mnt_with_uid0
[1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
[3] https://lore.kernel.org/lkml/CAEivzxcBBJV6DOGzy5S7=TUjrXZfVaGaJX5z7WFzYq1w4MdtiA@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CAEivzxefBRPozUPQxYgVh0gOpjsovtBuJ3w9BoqSizpST_YxTA@mail.gmail.com/#t
Updated by Aleksandr Mikhalitsyn 10 months ago
Stéphane Graber pointed out [2] that there are users who want to use cephfs idmapped mounts
with MDS versions which don't support CEPHFS_FEATURE_HAS_OWNER_UIDGID.
For these workloads we can go the way that was proposed originally [1]: idmap the caller_{u,g}id
fields for CEPH_MDS_OP_CREATE, CEPH_MDS_OP_MKNOD, CEPH_MDS_OP_MKDIR, CEPH_MDS_OP_SYMLINK.
This will break UID/GID-based restrictions on the MDS side, but for container workloads
this security feature is not widely used.
So, to summarize:
- for an MDS that supports CEPHFS_FEATURE_HAS_OWNER_UIDGID,
we never touch the caller_{u,g}id fields with the idmapping, but only idmap owner_{u,g}id
- for an MDS that does not support CEPHFS_FEATURE_HAS_OWNER_UIDGID, IF a specific cephfs module parameter (like "idmap_with_old_mds") is passed,
we idmap the caller_{u,g}id fields
- for an MDS that does not support CEPHFS_FEATURE_HAS_OWNER_UIDGID, IF the "idmap_with_old_mds" module parameter IS NOT passed,
fail CREATE/MKNOD/MKDIR/SYMLINK requests with -EIO for requests that come through an idmapped mount
[1] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
[2] https://lore.kernel.org/all/CA+enf=sFC-hiziuXoeDWnb7MoErc+b1PAncOjbM2rNyB4fzfwA@mail.gmail.com/#t
Updated by Venky Shankar 9 months ago
- Assignee set to Aleksandr Mikhalitsyn
- Target version set to v19.0.0
- Backport set to reef,quincy,pacific
Updated by Aleksandr Mikhalitsyn 9 months ago
Xiubo proposed [1] adding one extra field to struct ceph_mds_request_head.
This field will be filled in by the client with sizeof(struct ceph_mds_request_head) and
allows the MDS to properly skip fields in struct ceph_mds_request_head
which it does not support (when the client is newer than the MDS).
This approach makes extending struct ceph_mds_request_head less painful for us in the future.
[1] https://lore.kernel.org/all/41ba61bf-3a45-cb20-1e4c-38dbd65bafa6@redhat.com/
Updated by Venky Shankar 9 months ago
- Category set to Security Model
- Status changed from New to Pending Backport
Updated by Backport Bot 9 months ago
- Copied to Backport #62569: pacific: ceph_fs.h: add separate owner_{u,g}id fields added
Updated by Backport Bot 9 months ago
- Copied to Backport #62570: reef: ceph_fs.h: add separate owner_{u,g}id fields added
Updated by Backport Bot 9 months ago
- Copied to Backport #62571: quincy: ceph_fs.h: add separate owner_{u,g}id fields added
Updated by Konstantin Shalygin 7 months ago
- Tracker changed from Feature to Bug
- Status changed from Pending Backport to Resolved
- % Done changed from 90 to 100
- Source set to Community (user)
- Regression set to No
- Severity set to 3 - minor
Updated by Patrick Donnelly 7 months ago
- Related to Bug #63288: MClientRequest: properly handle ceph_mds_request_head_legacy for ext_num_retry, ext_num_fwd, owner_uid, owner_gid added