Bug #54013
centos stream 8 kernel 358: async dirops causes Cannot write: Operation not permitted
Description
After upgrading to the Stream 8 kernel 4.18.0-358.el8.x86_64 (which has "ceph: enable async dirops by default (Jeffrey Layton) [2017796]" in the changelog), a user is reporting errors like:
$ vi /mnt/opencast_share/opencast_data/oc_opencast/new.txt
"oc_opencast/new.txt"
"oc_opencast/new.txt" E514: write error (file system full?)
I can reproduce this by untarring a kernel source tarball. Downloading the tarball works, but extracting it fails:
[root@ocweb-test opencast_share]# pwd
/mnt/opencast_share
[root@ocweb-test opencast_share]# cat /proc/mounts | grep opencast_share
a.b.c.d:6790,a.b.c.d:6790,a.b.c.d:6790:/volumes/_nogroup/xxxx-xxx-xxx-xxx-xxxx /mnt/opencast_share ceph rw,noatime,name=opencast_test_01_rw,secret=<hidden>,acl 0 0
[root@ocweb-test opencast_share]# curl -OL https://git.kernel.org/torvalds/t/linux-5.17-rc1.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0   1384      0 --:--:-- --:--:-- --:--:--  1384
100  192M    0  192M    0     0   137M      0 --:--:--  0:00:01 --:--:--  154M
[root@ocweb-test opencast_share]#
[root@ocweb-test opencast_share]#
[root@ocweb-test opencast_share]# ls -l
total 197052
drwx------ 3 root root 9 Jan 25 16:19 linux-5.17-rc1
-rw-r--r-- 1 root root 201780465 Jan 25 16:18 linux-5.17-rc1.tar.gz
drwxr-xr-x 6 opencast opencast 9 Jan 25 15:53 xx
drwxrws--- 2 opencast opencast 38 Jan 25 15:41 yy
drwxr-xr-x 3 opencast opencast 1 Jan 25 15:41 zz
[root@ocweb-test opencast_share]# tar xf linux-5.17-rc1.tar.gz
tar: linux-5.17-rc1/.get_maintainer.ignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitattributes: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.mailmap: Cannot write: Operation not permitted
tar: linux-5.17-rc1/COPYING: Cannot write: Operation not permitted
tar: linux-5.17-rc1/CREDITS: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-iio: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-usb: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-class-typec: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-cpuidle: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-driver-hid-roccat-arvo: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-driver-hid-roccat-isku: Cannot write: Operation not permitted
[root@ocweb-test opencast_share]# ls -lR linux-5.17-rc1 | head
linux-5.17-rc1:
total 0
-rw-rw-r-- 1 root root 0 Jan 23 09:12 COPYING
-rw-rw-r-- 1 root root 0 Jan 23 09:12 CREDITS
drwx------ 18 root root 28 Jan 25 16:19 Documentation
linux-5.17-rc1/Documentation:
total 1
drwxrwxr-x 6 root root 5 Jan 23 09:12 ABI
-rw-rw-r-- 1 root root 0 Jan 23 09:12 COPYING-logo
[root@ocweb-test opencast_share]# cat linux-5.17-rc1/COPYING
[root@ocweb-test opencast_share]#
There are no errors in dmesg, and SELinux is disabled.
Here is the entry in /etc/fstab:
redacted.cern.ch:6790:/volumes/_nogroup/redacted /mnt/opencast_share ceph name=opencast_test_01_rw,secretfile=/etc/ceph/dwight.opencast_test_01_rw.secret,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,noatime,_netdev,rw 0 2
The cluster is running octopus v15.2.15.
The cephx client has these caps:
[client.xxx]
    key = redacted==
    caps mds = "allow rw path=/volumes/_nogroup/<uuid>"
    caps mon = "allow r"
    caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_<uuid>"
Remounting with `-owsync` fixes the issue.
Any advice on debugging this?
History
#1 Updated by Dan van der Ster about 1 year ago
I believe the issue is related to the mds path restriction or osd namespace restriction. (Both created by the fs volumes driver, used by manila).
If I mount the same cluster with kernel 358 using the admin keyring and root path /, I don't have any problems.
#2 Updated by Jeff Layton about 1 year ago
- Status changed from New to In Progress
- Assignee set to Jeff Layton
Thanks for the bug report. Do you know whether this also happens with more recent mainline kernels, or is the problem only seen with centos8 stream kernels? In the meantime, I'll see if I can set up a reproducer using the path restricted caps.
#3 Updated by Dan van der Ster about 1 year ago
Jeff Layton wrote:
Thanks for the bug report. Do you know whether this also happens with more recent mainline kernels, or is the problem only seen with centos8 stream kernels? In the meantime, I'll see if I can set up a reproducer using the path restricted caps.
Thanks for the quick reply!
5.16.2 has the same issue:
# uname -a
Linux xx.cern.ch 5.16.2-1.el8.elrepo.x86_64 #1 SMP PREEMPT Tue Jan 18 15:26:58 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
# mount redacted.cern.ch:6789:/volumes/_nogroup/xxx -t ceph -oname=cephprojectspace,secret=xxxx /mnt/
# cd /mnt/test/
# ls -l
total 197056
drwx------. 3 root root 9 Jan 25 18:52 linux-5.17-rc1
-rw-r--r--. 1 root root 201780465 Jan 25 16:41 linux-5.17-rc1.tar.gz
-rw-r--r--. 1 root root 3795 Jul 3 2019 out.dat.bak
-rwxr-xr-x. 1 root root 105 Jul 3 2019 test.py
# rm -rf linux-5.17-rc1
# tar xf linux-5.17-rc1.tar.gz 2>&1 | head
tar: linux-5.17-rc1/.get_maintainer.ignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitattributes: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.mailmap: Cannot write: Operation not permitted
tar: linux-5.17-rc1/COPYING: Cannot write: Operation not permitted
tar: linux-5.17-rc1/CREDITS: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-iio: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-usb: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-class-typec: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-cpuidle: Cannot write: Operation not permitted
Remounting with wsync:
# umount /mnt/
# mount redacted.cern.ch:6789:/volumes/_nogroup/xxx -t ceph -oname=cephprojectspace,secret=xxx=,wsync /mnt/
[root@cephfs-testcs8ml-bcdc59edd4 ~]# cd /mnt/test/
[root@cephfs-testcs8ml-bcdc59edd4 test]# ls
linux-5.17-rc1 linux-5.17-rc1.tar.gz out.dat.bak test.py
[root@cephfs-testcs8ml-bcdc59edd4 test]# rm -rf linux-5.17-rc1
[root@cephfs-testcs8ml-bcdc59edd4 test]# tar xvf linux-5.17-rc1.tar.gz 2>&1 | head
linux-5.17-rc1/
linux-5.17-rc1/.clang-format
linux-5.17-rc1/.cocciconfig
linux-5.17-rc1/.get_maintainer.ignore
linux-5.17-rc1/.gitattributes
linux-5.17-rc1/.gitignore
linux-5.17-rc1/.mailmap
linux-5.17-rc1/COPYING
linux-5.17-rc1/CREDITS
linux-5.17-rc1/Documentation/
...
The caps for the client used above are:
[client.cephprojectspace]
    key = xx==
    caps mds = "allow rw path=/volumes/_nogroup/xxx"
    caps mon = "allow r"
    caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_xxx"
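When comparing the two mounts above, it can help to check which mode each one explicitly requested. The following is a minimal Python sketch that reads one /proc/mounts entry and reports an explicit wsync or nowsync option. The function name ceph_sync_option is made up for illustration. A mount that relies on the kernel default may list neither option, in which case the function returns None rather than guessing.

```python
def ceph_sync_option(mounts_line):
    """Return "wsync" or "nowsync" if the ceph mount entry lists one, else None.

    mounts_line is a single line in /proc/mounts format:
    <device> <mountpoint> <fstype> <options> <dump> <pass>
    """
    fields = mounts_line.split()
    if len(fields) < 4 or fields[2] != "ceph":
        raise ValueError("not a ceph mount entry")
    options = fields[3].split(",")
    for opt in ("wsync", "nowsync"):
        if opt in options:
            return opt
    # Neither option is listed; the effective mode depends on the kernel default.
    return None
```

Fed the /proc/mounts line from the description (options rw,noatime,name=...,secret=<hidden>,acl), this returns None, which is consistent with the mount using whatever that kernel's default is.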
#4 Updated by Jeff Layton about 1 year ago
Thanks. I was able to reproduce this too with the restricted caps. Here's what we see at the syscall level:
1834 openat(AT_FDCWD, "linux-5.17-rc1/.gitattributes", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0664) = 4
1834 write(4, "*.c diff=cpp\n*.h diff=cpp\n*."..., 62) = -1 EPERM (Operation not permitted)
The problem is the call from ceph_write_iter to ceph_pool_perm_check, which returns -EPERM in some cases.
I haven't tracked it down fully yet, but I suspect the problem is that we're not getting the inherited layout correct on an async create. Still trying to confirm that however.
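The syscall pattern in the trace (an exclusive create followed immediately by a write to the new file) can be exercised without tar. Below is a small Python sketch of such a probe. The function name create_and_write and the default file name are invented for this example, and it only approximates what tar does: it omits the O_NOCTTY and O_NONBLOCK flags seen in the trace.

```python
import errno
import os


def create_and_write(directory, name="async-dirops-probe.txt", payload=b"probe\n"):
    """Create a new file with O_CREAT|O_EXCL and write to it right away.

    Returns None if the write succeeds, or the errno name (for example
    "EPERM") if it fails. The probe file is removed in either case.
    """
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o664)
    try:
        os.write(fd, payload)
        return None
    except OSError as exc:
        # Map the numeric errno to its symbolic name for readable output.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling create_and_write on a directory inside the affected mount would be expected to return "EPERM" when the bug is triggered, and None on a mount using wsync or on a fixed kernel.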
#5 Updated by Jeff Layton about 1 year ago
Ok, the problem is that we weren't filling out the pool_ns in the inode info for new inodes. Patch posted to the ceph-devel ml:
https://lore.kernel.org/ceph-devel/20220125211022.114286-1-jlayton@kernel.org/T/#u
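To illustrate why a missing pool_ns leads to EPERM only with namespace-restricted caps, here is a toy Python model. It is not the kernel code: OsdCap, Layout and write_allowed are hypothetical names, and the check is reduced to a plain pool and namespace comparison. The namespace-unrestricted cap stands in for the admin keyring from comment #1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OsdCap:
    """A simplified 'allow rw pool=... namespace=...' grant."""
    pool: str
    namespace: Optional[str] = None  # None means any namespace is allowed


@dataclass(frozen=True)
class Layout:
    """The part of a file layout that matters for the permission check."""
    pool: str
    pool_ns: str = ""  # empty string stands for the default namespace


def write_allowed(cap: OsdCap, layout: Layout) -> bool:
    """Return True if the cap permits writes to objects in this layout."""
    if cap.pool != layout.pool:
        return False
    return cap.namespace is None or cap.namespace == layout.pool_ns


restricted = OsdCap(pool="cephfs_data", namespace="fsvolumens_example")
unrestricted = OsdCap(pool="cephfs_data")

parent = Layout(pool="cephfs_data", pool_ns="fsvolumens_example")
# Async create with the bug: the new inode gets the pool but not pool_ns.
child_missing_ns = Layout(pool=parent.pool)
# Async create with the fix: pool_ns is carried over from the parent.
child_with_ns = Layout(pool=parent.pool, pool_ns=parent.pool_ns)
```

In this model, write_allowed(restricted, child_missing_ns) is False while write_allowed(restricted, child_with_ns) is True, and the unrestricted cap passes in both cases, which matches the observation that the admin keyring was unaffected.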
#6 Updated by Dan van der Ster about 1 year ago
Filed a Stream 8 bug: https://bugzilla.redhat.com/show_bug.cgi?id=2046021
#7 Updated by Jeff Layton about 1 year ago
Thanks. We did a bit of investigation into why our QA didn't catch this. Ceph has a bajillion different options and config knobs, and we simply can't run every possible test with every possible permutation. We currently rely on random selections for certain options (like wsync/nowsync).
As far as we can tell, the specific test that tests --namespace-isolated subvolumes just never got run with async dirops, due to pure dumb luck. We're planning to discuss how we can ID these sorts of coverage gaps and improve this, but it may be tough to improve that given the limits to testing infrastructure that we have.
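One simple way to spot this kind of gap is to compare the option combinations that actually ran against the full cross-product. The Python sketch below is purely illustrative and does not reflect how the Ceph QA infrastructure works. The function name uncovered_combinations and the dimension names are assumptions made for this example.

```python
from itertools import product


def uncovered_combinations(dimensions, executed_runs):
    """Return option combinations that never appeared in any executed run.

    dimensions maps an option name to its possible values.
    executed_runs is a list of dicts, one per run, mapping every option
    name to the value that run used.
    Combinations are returned as tuples ordered by sorted option name.
    """
    names = sorted(dimensions)
    all_combos = set(product(*(dimensions[n] for n in names)))
    seen = {tuple(run[n] for n in names) for run in executed_runs}
    return sorted(all_combos - seen)


dimensions = {
    "dirops": ["wsync", "nowsync"],
    "subvolume": ["default", "namespace-isolated"],
}
runs = [
    {"dirops": "wsync", "subvolume": "default"},
    {"dirops": "nowsync", "subvolume": "default"},
    {"dirops": "wsync", "subvolume": "namespace-isolated"},
]
gaps = uncovered_combinations(dimensions, runs)
# gaps == [("nowsync", "namespace-isolated")]
```

With randomly selected options, a report like this over recent runs would show which pairings have not been exercised yet, though the full cross-product grows quickly as more options are added.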
#8 Updated by Dan van der Ster 12 months ago
ftr, i re-tested with kernel 5.16.7 and it looks fixed. (Fix was in 5.16.5). Thanks Jeff!
#9 Updated by Jeff Layton 8 months ago
- Status changed from In Progress to Resolved