Bug #54013

centos stream 8 kernel 358: async dirops causes Cannot write: Operation not permitted

Added by Dan van der Ster about 1 year ago. Updated 8 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

After upgrading to the Stream 8 kernel 4.18.0-358.el8.x86_64 (which has "ceph: enable async dirops by default (Jeffrey Layton) [2017796]" in the changelog), a user is reporting errors like:

$ vi /mnt/opencast_share/opencast_data/oc_opencast/new.txt
"oc_opencast/new.txt" 
"oc_opencast/new.txt" E514: write error (file system full?)

I can reproduce this by untarring the kernel source. Downloading the tarball works, but extracting it fails:

[root@ocweb-test opencast_share]# pwd
/mnt/opencast_share

[root@ocweb-test opencast_share]# cat /proc/mounts  | grep opencast_share
a.b.c.d:6790,a.b.c.d:6790,a.b.c.d:6790:/volumes/_nogroup/xxxx-xxx-xxx-xxx-xxxx  /mnt/opencast_share ceph rw,noatime,name=opencast_test_01_rw,secret=<hidden>,acl 0 0

[root@ocweb-test opencast_share]# curl -OL https://git.kernel.org/torvalds/t/linux-5.17-rc1.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0   1384      0 --:--:-- --:--:-- --:--:--  1384
100  192M    0  192M    0     0   137M      0 --:--:--  0:00:01 --:--:--  154M
[root@ocweb-test opencast_share]#
[root@ocweb-test opencast_share]#
[root@ocweb-test opencast_share]# ls -l
total 197052
drwx------  3 root     root             9 Jan 25 16:19 linux-5.17-rc1
-rw-r--r--  1 root     root     201780465 Jan 25 16:18 linux-5.17-rc1.tar.gz
drwxr-xr-x  6 opencast opencast         9 Jan 25 15:53 xx
drwxrws---  2 opencast opencast        38 Jan 25 15:41 yy
drwxr-xr-x  3 opencast opencast         1 Jan 25 15:41 zz

[root@ocweb-test opencast_share]# tar xf linux-5.17-rc1.tar.gz
tar: linux-5.17-rc1/.get_maintainer.ignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitattributes: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.mailmap: Cannot write: Operation not permitted
tar: linux-5.17-rc1/COPYING: Cannot write: Operation not permitted
tar: linux-5.17-rc1/CREDITS: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-iio: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-usb: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-class-typec: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-cpuidle: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-driver-hid-roccat-arvo: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-driver-hid-roccat-isku: Cannot write: Operation not permitted

[root@ocweb-test opencast_share]# ls -lR linux-5.17-rc1 | head
linux-5.17-rc1:
total 0
-rw-rw-r--  1 root root  0 Jan 23 09:12 COPYING
-rw-rw-r--  1 root root  0 Jan 23 09:12 CREDITS
drwx------ 18 root root 28 Jan 25 16:19 Documentation

linux-5.17-rc1/Documentation:
total 1
drwxrwxr-x  6 root root  5 Jan 23 09:12 ABI
-rw-rw-r--  1 root root  0 Jan 23 09:12 COPYING-logo

[root@ocweb-test opencast_share]# cat linux-5.17-rc1/COPYING
[root@ocweb-test opencast_share]#

There are no errors in dmesg. SELinux is disabled.

Here is the entry in /etc/fstab:

redacted.cern.ch:6790:/volumes/_nogroup/redacted    /mnt/opencast_share        ceph    name=opencast_test_01_rw,secretfile=/etc/ceph/dwight.opencast_test_01_rw.secret,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,noatime,_netdev,rw       0       2

The cluster is running octopus v15.2.15.

The cephx client has these caps:

[client.xxx]
        key = redacted==
        caps mds = "allow rw path=/volumes/_nogroup/<uuid>" 
        caps mon = "allow r" 
        caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_<uuid>" 

Remounting with `-owsync` fixes the issue.
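
The workaround amounts to adding wsync to the client's mount options, whether on the command line or in /etc/fstab. A minimal sketch of that edit follows; add_wsync is a hypothetical helper name (not part of any Ceph tool), and it does not try to handle an existing nowsync option.

```shell
#!/bin/sh
# add_wsync OPTS: print the comma-separated mount option list OPTS with
# "wsync" appended, unless it is already present.
# Hypothetical helper for illustration; it does not remove "nowsync".
add_wsync() {
    case ",$1," in
        ,,)          printf 'wsync\n' ;;        # empty option list
        *,wsync,*)   printf '%s\n' "$1" ;;      # already has wsync
        *)           printf '%s,wsync\n' "$1" ;;
    esac
}

# Example (not executed here; needs root and a reachable cluster):
#   mount -t ceph "$SRC" /mnt/opencast_share -o "$(add_wsync "$OPTS")"
```

With wsync the client performs creates and unlinks synchronously again, so this trades some metadata performance for the old behaviour.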

Any advice on debugging this?

History

#1 Updated by Dan van der Ster about 1 year ago

I believe the issue is related to the mds path restriction or the osd namespace restriction (both are created by the fs volumes driver, which is used by Manila).

If I mount the same cluster with kernel 358 using the admin keyring and root path /, I don't have any problems.

#2 Updated by Jeff Layton about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Jeff Layton

Thanks for the bug report. Do you know whether this also happens with more recent mainline kernels, or is the problem only seen with centos8 stream kernels? In the meantime, I'll see if I can set up a reproducer using the path restricted caps.

#3 Updated by Dan van der Ster about 1 year ago

Jeff Layton wrote:

Thanks for the bug report. Do you know whether this also happens with more recent mainline kernels, or is the problem only seen with centos8 stream kernels? In the meantime, I'll see if I can set up a reproducer using the path restricted caps.

Thanks for the quick reply!

5.16.2 has the same issue:

# uname -a
Linux xx.cern.ch 5.16.2-1.el8.elrepo.x86_64 #1 SMP PREEMPT Tue Jan 18 15:26:58 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
# mount redacted.cern.ch:6789:/volumes/_nogroup/xxx -t ceph -oname=cephprojectspace,secret=xxxx /mnt/
# cd /mnt/test/
# ls -l
total 197056
drwx------. 3 root root         9 Jan 25 18:52 linux-5.17-rc1
-rw-r--r--. 1 root root 201780465 Jan 25 16:41 linux-5.17-rc1.tar.gz
-rw-r--r--. 1 root root      3795 Jul  3  2019 out.dat.bak
-rwxr-xr-x. 1 root root       105 Jul  3  2019 test.py
# rm -rf linux-5.17-rc1
# tar xf linux-5.17-rc1.tar.gz 2>&1 | head
tar: linux-5.17-rc1/.get_maintainer.ignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitattributes: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.gitignore: Cannot write: Operation not permitted
tar: linux-5.17-rc1/.mailmap: Cannot write: Operation not permitted
tar: linux-5.17-rc1/COPYING: Cannot write: Operation not permitted
tar: linux-5.17-rc1/CREDITS: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-iio: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-bus-usb: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-class-typec: Cannot write: Operation not permitted
tar: linux-5.17-rc1/Documentation/ABI/obsolete/sysfs-cpuidle: Cannot write: Operation not permitted

Remounting with wsync:

# umount /mnt/
# mount redacted.cern.ch:6789:/volumes/_nogroup/xxx -t ceph -oname=cephprojectspace,secret=xxx=,wsync /mnt/
[root@cephfs-testcs8ml-bcdc59edd4 ~]# cd /mnt/test/
[root@cephfs-testcs8ml-bcdc59edd4 test]# ls
linux-5.17-rc1  linux-5.17-rc1.tar.gz  out.dat.bak  test.py
[root@cephfs-testcs8ml-bcdc59edd4 test]# rm -rf linux-5.17-rc1
[root@cephfs-testcs8ml-bcdc59edd4 test]# tar xvf linux-5.17-rc1.tar.gz 2>&1 | head
linux-5.17-rc1/
linux-5.17-rc1/.clang-format
linux-5.17-rc1/.cocciconfig
linux-5.17-rc1/.get_maintainer.ignore
linux-5.17-rc1/.gitattributes
linux-5.17-rc1/.gitignore
linux-5.17-rc1/.mailmap
linux-5.17-rc1/COPYING
linux-5.17-rc1/CREDITS
linux-5.17-rc1/Documentation/
...

The caps for the client above are:

[client.cephprojectspace]
        key = xx==
        caps mds = "allow rw path=/volumes/_nogroup/xxx" 
        caps mon = "allow r" 
        caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_xxx" 

#4 Updated by Jeff Layton about 1 year ago

Thanks. I was able to reproduce this too with the restricted caps. Here's what we see at the syscall level:

1834  openat(AT_FDCWD, "linux-5.17-rc1/.gitattributes", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0664) = 4
1834  write(4, "*.c   diff=cpp\n*.h   diff=cpp\n*."..., 62) = -1 EPERM (Operation not permitted)

The problem is the call from ceph_write_iter to ceph_pool_perm_check, which returns -EPERM in some cases.

I haven't tracked it down fully yet, but I suspect the problem is that we're not getting the inherited layout correct on an async create. I'm still trying to confirm that, however.
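
The create-then-write pattern in the strace output can be checked without unpacking a whole kernel tree. The following is a rough sketch, not the reproducer used in this ticket: probe_create_write and the probe file name are made up for illustration, and it assumes the shell implements noclobber as an exclusive create, as common shells typically do.

```shell
#!/bin/sh
# probe_create_write DIR: create a brand-new file in DIR and write to it
# immediately, mimicking tar's openat(O_CREAT|O_EXCL) followed by write().
# Prints "ok" if the data landed, "failed" otherwise.
# Hypothetical helper; the probe file name is arbitrary.
probe_create_write() {
    dir=$1
    f="$dir/.create-write-probe.$$"
    rm -f "$f" 2>/dev/null
    # set -C (noclobber) makes the redirection refuse to reuse an existing
    # file, so the file is freshly created by this open.
    if ( set -C; printf 'probe\n' > "$f" ) 2>/dev/null && [ -s "$f" ]; then
        echo "ok"
        rc=0
    else
        # Either the open/write reported an error or the file stayed empty,
        # which matches the zero-length files seen after the failed untar.
        echo "failed"
        rc=1
    fi
    rm -f "$f" 2>/dev/null
    return "$rc"
}
```

Running it against a directory on the affected mount, once with the default options and once with wsync, should show whether that client is hitting this problem.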

#5 Updated by Jeff Layton about 1 year ago

OK, the problem is that we weren't filling out the pool_ns in the inode info for new inodes. A patch has been posted to the ceph-devel mailing list:

https://lore.kernel.org/ceph-devel/20220125211022.114286-1-jlayton@kernel.org/T/#u
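
To see why a missing pool_ns ends in EPERM with these caps, here is a deliberately simplified model of matching an OSD cap against a (pool, namespace) pair. It is a toy under stated assumptions, not the kernel's or the OSD's code: cap_allows is a made-up name, and it handles only a single grant with exact pool= and namespace= matches (no wildcards, no multiple grants).

```shell
#!/bin/sh
# cap_allows CAP POOL NS: succeed if the single OSD cap grant CAP covers
# namespace NS of pool POOL. Toy model for illustration only.
# The body runs in a subshell so set -f and the variables stay local.
cap_allows() (
    set -f                      # no globbing while word-splitting the cap
    cap=$1 pool=$2 ns=$3
    have_pool=no have_ns=no cap_pool= cap_ns=
    for word in $cap; do
        case $word in
            pool=*)      have_pool=yes; cap_pool=${word#pool=} ;;
            namespace=*) have_ns=yes;   cap_ns=${word#namespace=} ;;
        esac
    done
    # A restriction that is not present in the grant matches anything.
    { [ "$have_pool" = no ] || [ "$cap_pool" = "$pool" ]; } &&
    { [ "$have_ns" = no ]   || [ "$cap_ns" = "$ns" ]; }
)

cap='allow rw pool=cephfs_data namespace=fsvolumens_xxx'
# Namespace filled in correctly: allowed.
cap_allows "$cap" cephfs_data fsvolumens_xxx && echo "with pool_ns: allowed"
# Namespace left empty, as for the async-created inodes: denied.
cap_allows "$cap" cephfs_data "" || echo "empty pool_ns: denied"
```

This also fits the earlier observation that the admin keyring, which has no namespace restriction, did not show the problem.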

#7 Updated by Jeff Layton about 1 year ago

Thanks. We did a bit of investigation into why our QA didn't catch this. Ceph has a bajillion different options and config knobs, and we simply can't run every possible test with every possible permutation. We currently rely on random selections for certain options (like wsync/nowsync).

As far as we can tell, the specific test that covers --namespace-isolated subvolumes just never got run with async dirops, due to pure dumb luck. We're planning to discuss how we can identify these sorts of coverage gaps, but improving on this may be tough given the limits of our testing infrastructure.

#8 Updated by Dan van der Ster 12 months ago

For the record, I re-tested with kernel 5.16.7 and it looks fixed (the fix was in 5.16.5). Thanks Jeff!
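
For clients on mainline-style 5.16.y kernels, a quick comparison against 5.16.5 can be scripted. This is only a sketch: version_ge is a hypothetical helper, it relies on GNU sort -V, and it says nothing about distribution kernels such as 4.18.0-358.el8, which carry backports and need their changelog checked instead.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B in version order.
# Hypothetical helper; relies on GNU coreutils "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the packaging suffix, e.g. 5.16.2-1.el8.elrepo.x86_64 -> 5.16.2
running=$(uname -r)
running=${running%%-*}

if version_ge "$running" 5.16.5; then
    echo "kernel $running is at or past 5.16.5"
else
    echo "kernel $running is older than 5.16.5"
fi
```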

#9 Updated by Jeff Layton 8 months ago

  • Status changed from In Progress to Resolved
