Bug #62435
Pod unable to mount fscrypt-encrypted cephfs PVC when it moves to another node
Description
Here is our setup:
Kubernetes: 1.27.3
rook: 1.11.9
ceph: 17.2.6
OS: Ubuntu 20.04 with a kernel modified to support fscrypt
(Linux wkhd 6.3.0-rc4+ #6 SMP PREEMPT_DYNAMIC Mon May 22 22:48:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
1. Made changes to operator.yaml & common.yaml
2. Enabled fscrypt on CephFileSystem
3. Enabled fscrypt on storageclass
as suggested in https://rook.io/docs/rook/v1.11/Storage-Configuration/Ceph-CSI/ceph-csi-drivers/#enable-rbd-encryption-support
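For reference, steps 1-3 roughly correspond to fragments like the following (a sketch based on the Rook docs linked above; the flag and parameter names shown, such as `CSI_ENABLE_ENCRYPTION`, the `fsName`, and the KMS id, are assumptions to adapt to your own deployment):

```yaml
# operator.yaml (rook-ceph-operator-config ConfigMap):
# turn on CSI encryption support in the operator.
kind: ConfigMap
apiVersion: v1
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  CSI_ENABLE_ENCRYPTION: "true"
---
# StorageClass for the CephFS CSI driver: mark volumes as encrypted and
# reference a KMS entry from the csi-kms-connection-details ConfigMap.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs-fscrypt
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs              # illustrative filesystem name
  encrypted: "true"
  encryptionKMSID: "my-kms" # illustrative KMS id
```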
After that, we were able to create the pod and mount the volume. But once the pod was deleted and recreated on a different node, we get:
Warning FailedMount 27s (x7 over 2m39s) kubelet MountVolume.MountDevice failed for volume "pvc-71b1ef3f-4b06-4809-a2ac-2c42c4477db4" : rpc error: code = Internal desc = fscrypt: unsupported state metadata=true kernel_policy=false
Updated by Xiubo Li 9 months ago
- Status changed from Triaged to Need More Info
Hi Sudhin,
This is not cephfs fscrypt. You are encrypting at the disk layer, not the filesystem layer. My understanding is that you are just mounting an encrypted RBD block device; we don't support cephfs encryption yet.
BTW, do you have any dmesg logs from the new node?
Updated by Sudhin Bengeri 8 months ago
- File dmesg-230830.txt dmesg-230830.txt added
Hi Xiubo,
Thanks for your response.
Are you saying that cephfs does not support fscrypt? I am not exactly sure where we read it, but I remember coming across some docs that mentioned fscrypt being supported for cephfs as well, for instance:
https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/cephfs-fscrypt.md
https://blog.rook.io/rook-v1-11-storage-enhancements-8001aa67e10e
Here are the dmesg logs we see when the error is reported (from the node on which the pod gets scheduled and fails to mount the PVC):
[Wed Aug 30 13:38:23 2023] libceph: mon0 (1)[xxxx:xxxx:xxxx:xxx1::94]:6789 session established
[Wed Aug 30 13:38:23 2023] libceph: client82004833 fsid 226694be-9975-4db9-8618-329b2a0633e1
[Wed Aug 30 13:38:25 2023] ceph: handle_cap_grant: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len 0 new len 48)
I have attached the logs around that time in case needed.
Please let me know if I can be of any further help.
Regards,
Sudhin
Updated by Xiubo Li 8 months ago
Sudhin Bengeri wrote:
Hi Xiubo,
Thanks for your response.
Are you saying that cephfs does not support fscrypt? I am not exactly sure where we read it but remember coming across some docs which mentioned fscrypt being supported for cephfs as well, for instance:
https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/cephfs-fscrypt.md
https://blog.rook.io/rook-v1-11-storage-enhancements-8001aa67e10e
Hi Sudhin
Please see https://tracker.ceph.com/issues/46690; it's still under review, which means it has not been merged into Linux mainline yet.
We are planning to merge it in 6.5, and it is already in the `master` branch of the ceph/ceph-client repo.
Here are the dmesg logs we see when the error is reported (from the node on which the pod gets scheduled and fails to mount PVC) :
[Wed Aug 30 13:38:23 2023] libceph: mon0 (1)[xxxx:xxxx:xxxx:xxx1::94]:6789 session established
[Wed Aug 30 13:38:23 2023] libceph: client82004833 fsid 226694be-9975-4db9-8618-329b2a0633e1
[Wed Aug 30 13:38:25 2023] ceph: handle_cap_grant: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len 0 new len 48)
Which kernel were you using? It's strange that you can see the above logs, since that message was introduced by:
commit 2f22d41a33cf8d595a69272acbe56b9fbb454b72
Author: Jeff Layton <jlayton@kernel.org>
Date:   Thu Aug 25 09:31:09 2022 -0400

    ceph: handle fscrypt fields in cap messages from MDS

    Handle the new fscrypt_file and fscrypt_auth fields in cap messages. Use
    them to populate new fields in cap_extra_info and update the inode with
    those values.

    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Are you using the ceph/ceph-client testing branch? How else could your kernel include this patch?
Also, your ceph version is old; upstream supports fscrypt starting from v18.0.0.
Thanks
Updated by Sudhin Bengeri 8 months ago
Hi Xiubo,
Here is the uname -a output from the nodes:
Linux wkhd 6.3.0-rc4+ #6 SMP PREEMPT_DYNAMIC Mon May 22 22:48:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kernel built with this patch
https://patchwork.kernel.org/project/ceph-devel/cover/20230417032654.32352-1-xiubli@redhat.com
(which includes Jeff Layton's work).
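One way to double-check that a build tree really contains the cap-handling commit Xiubo quoted (a sketch; the source path is hypothetical):

```shell
# In the kernel source tree the node kernel was built from
# (path is illustrative):
cd /path/to/linux

# Check whether the fscrypt cap-handling commit is an ancestor of the
# built HEAD (exit status 0 means it is included in this history).
git merge-base --is-ancestor 2f22d41a33cf8d595a69272acbe56b9fbb454b72 HEAD \
  && echo "fscrypt cap-handling patch is included" \
  || echo "patch NOT in this history"

# And confirm which kernel the node actually booted:
uname -r
```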
We recently upgraded to ceph:v18.2.0 and rook-ceph:v1.12.1. Did anything in the dmesg output shared yesterday make you think that we were using an old ceph version?
Thanks.
Sudhin
Updated by Xiubo Li 8 months ago
Sudhin Bengeri wrote:
Hi Xuibo,
Here is the uname -a output from the nodes:
Linux wkhd 6.3.0-rc4+ #6 SMP PREEMPT_DYNAMIC Mon May 22 22:48:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kernel built with this patch
https://patchwork.kernel.org/project/ceph-devel/cover/20230417032654.32352-1-xiubli@redhat.com
(which includes Jeff Layton's work). We recently upgraded to ceph:v18.2.0 and rook-ceph:v1.12.1. Did anything in the dmesg output shared yesterday make you think that we were using an old ceph version?
Hi Sudhin
Okay, so you applied the patches from the mailing list. v19 is out of date (the latest version is v20), but not much changed, so that shouldn't matter here IMO.
Could you try to reproduce it again with the following enabled:
1. the kernel ceph.ko module's dynamic debug logs
2. debug_mds = 25 and debug_ms = 1 for the MDS daemons
and then provide those logs?
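Concretely, those two knobs can be enabled roughly like this (a sketch; it assumes debugfs is mounted on the Kubernetes node and that the `ceph config` commands are run from an admin host or the Rook toolbox pod):

```shell
# 1. On the Kubernetes node: enable dynamic debug output for the
#    in-kernel ceph client modules (requires root and debugfs).
echo 'module ceph +p'    > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

# 2. On the Ceph cluster: raise MDS verbosity as requested above.
ceph config set mds debug_mds 25
ceph config set mds debug_ms 1

# After reproducing, capture the kernel-side log from the node:
dmesg -T > ceph-ko-debug.txt
```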
Thanks
Updated by Sudhin Bengeri 7 months ago
Hi Xiubo,
We built version v20 and installed it on the ceph HD nodes; here is the updated uname -a from a node:
Linux wkhd 6.3.0-rc4-custom2 #7 SMP PREEMPT_DYNAMIC Fri Oct 13 00:55:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
We observed that this issue is resolved in this version: when a pod moved to another node, it was able to mount the fscrypt encrypted cephfs PVC. We have collected the ceph.ko module's dynamic debug logs and the MDS debug logs, but since it worked, do you still need them?
Please let us know when this kernel is going to be GA.
Thanks.
Updated by Xiubo Li 7 months ago
Sudhin Bengeri wrote:
Hi Xiubo,
We built version v20 and installed on the ceph HD nodes, here is the updated uname -a from a node:
Linux wkhd 6.3.0-rc4-custom2 #7 SMP PREEMPT_DYNAMIC Fri Oct 13 00:55:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
We observed that this issue is resolved in this version: when a pod moved to another node, it was able to mount the fscrypt encrypted cephfs PVC. We have collected the ceph.ko module's dynamic debug logs and the MDS debug logs, but since it worked, do you still need them?
Okay, if it worked those logs won't help me. If you can reproduce it again, please provide the logs.
Thanks
Please let us know when this kernel is going to be GA.
Thanks.
Updated by Sudhin Bengeri 3 months ago
Hi Xiubo,
We tried with the following updated setup:
Kubernetes: 1.28.4
rook: 1.13.2
ceph: 18.2.1
OS: Ubuntu 20.04 with the latest 6.8-rc1 kernel
Linux wkhd2 6.8.0-060800rc1-generic #202401212233 SMP PREEMPT_DYNAMIC Sun Jan 21 22:43:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
With this version we see that pods are able to mount fscrypt encrypted cephfs PVCs.
However, we see the following errors when an fscrypt encrypted cephfs PVC is mounted in multiple pods and those pods get scheduled to different nodes: the PVC does not get mounted in either pod, and the pods do not come up.
Warning FailedMount 20m kubelet MountVolume.MountDevice failed for volume "pvc-e1c3d29f-6ec0-4a34-aefc-74420b7c09d1" : rpc error: code = Internal desc = "/etc/fscrypt.conf" is invalid: proto: syntax error (line 1:1): unexpected token
Warning FailedMount 116s (x11 over 19m) kubelet MountVolume.MountDevice failed for volume "pvc-e1c3d29f-6ec0-4a34-aefc-74420b7c09d1" : rpc error: code = Internal desc = fscrypt: unsupported state metadata=true kernel_policy=false
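The first error suggests the `/etc/fscrypt.conf` that ceph-csi reads inside the node plugin is empty or malformed. A quick sanity check (a sketch; the `rook-ceph` namespace and the daemonset/container names assume a default Rook install, and since fscrypt's config file is JSON, `python3 -m json.tool` is a reasonable validity probe):

```shell
# Dump the fscrypt config as seen inside the CephFS CSI node plugin
# container and check that it parses as JSON.
kubectl -n rook-ceph exec ds/csi-cephfsplugin -c csi-cephfsplugin \
  -- cat /etc/fscrypt.conf | python3 -m json.tool \
  && echo "fscrypt.conf parses as JSON" \
  || echo "fscrypt.conf is empty or malformed"
```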
Further, we have observed that the ceph storage nodes crash frequently.
Please let us know if you need any debug info from the nodes.
Thanks.
Sudhin