Bug #62664
ceph-fuse: failed to remount for kernel dentry trimming; quitting!
Description
Hi,
While #62604 is being addressed I wanted to try the ceph-fuse client. I'm using the same setup, with kernel 6.4.11 and ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable).
The ceph-fuse client fails to remount the FS to trim dentries:
hut# ceph-fuse -f -n client.user -m 10.0.40.40 /ceph2
2023-08-31T14:08:31.479+0200 7fcc01e9e3c0 -1 init, newargv = 0x55777ee84dc0 newargc=16
ceph-fuse[1342578]: starting ceph client
ceph-fuse[1342578]: starting fuse
mount: /ceph2: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
2023-08-31T14:08:31.501+0200 7fcbd7fff6c0 -1 client.494146 failed to remount (to trim kernel dentries): return code = 32
2023-08-31T14:08:31.501+0200 7fcbd7fff6c0 -1 client.494146 failed to remount for kernel dentry trimming; quitting!
mount: /ceph2: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
2023-08-31T14:08:32.512+0200 7fcbd7fff6c0 -1 client.494146 failed to remount (to trim kernel dentries): return code = 32
2023-08-31T14:08:32.512+0200 7fcbd7fff6c0 -1 client.494146 failed to remount for kernel dentry trimming; quitting!
mount: /ceph2: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
2023-08-31T14:08:33.522+0200 7fcbd7fff6c0 -1 client.494146 failed to remount (to trim kernel dentries): return code = 32
2023-08-31T14:08:33.522+0200 7fcbd7fff6c0 -1 client.494146 failed to remount for kernel dentry trimming; quitting!
mount: /ceph2: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
2023-08-31T14:08:34.533+0200 7fcbd7fff6c0 -1 client.494146 failed to remount (to trim kernel dentries): return code = 32
2023-08-31T14:08:34.533+0200 7fcbd7fff6c0 -1 client.494146 failed to remount for kernel dentry trimming; quitting!
mount: /ceph2: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
2023-08-31T14:08:35.543+0200 7fcbd7fff6c0 -1 client.494146 failed to remount (to trim kernel dentries): return code = 32
2023-08-31T14:08:35.543+0200 7fcbd7fff6c0 -1 client.494146 failed to remount for kernel dentry trimming; quitting!
ceph-fuse[1342578]: fuse failed dentry invalidate/remount test with error (32) Broken pipe, stopping
/build/ceph-18.2.0/src/ceph_fuse.cc: In function 'virtual void* main(int, const char**, const char**)::RemountTest::entry()' thread 7fcbd7fff6c0 time 2023-08-31T14:08:36.554371+0200
/build/ceph-18.2.0/src/ceph_fuse.cc: 243: ceph_abort_msg("abort() called")
...
I attach the full strace log. The failure seems to be in the mount command, in the fsconfig() call with the "allow_other" option:
1343504 access("/run/mount/utab", R_OK|W_OK) = 0
1343504 fspick(3, "", FSPICK_NO_AUTOMOUNT|FSPICK_EMPTY_PATH) = 4
1343504 fsconfig(4, FSCONFIG_SET_FLAG, "allow_other", NULL, 0) = -1 EINVAL (Invalid argument)
1343504 close(4) = 0
1343504 close(3) = 0
Here is my /etc/fuse.conf:
#user_allow_other
mount_max = 1000
The same thing happens if I enable user_allow_other.
Setting this option in /etc/ceph.conf:
[client] client_die_on_failed_dentry_invalidate=false
This allows me to mount the FS anyway and run some I/O benchmarks with no apparent problems, but I don't know the implications of that option.
Updated by Jakob Haufe 6 days ago
This is related to https://github.com/util-linux/util-linux/issues/2576 and will happen on any system with util-linux/libmount 2.38 or later and a kernel supporting the new fsconfig(2) API, which is available since Linux 5.1.

The current workaround is setting the LIBMOUNT_FORCE_MOUNT2 environment variable to "always".

This could be added to remount_cb in src/client/fuse_ll.cc.
Note that this (currently) affects all versions of ceph-fuse.
Updated by Jakob Haufe 5 days ago
I created a minimal PR implementing this: https://github.com/ceph/ceph/pull/57170
Is there any jenkins test to run this on a new enough base system to be affected by this issue?
Updated by Venky Shankar 4 days ago
- Category set to Correctness/Safety
- Status changed from New to Fix Under Review
- Target version set to v20.0.0
- Backport set to reef,squid
- Pull request ID set to 57170
- Component(FS) ceph-fuse added
Jakob Haufe wrote in #note-4:
I created a minimal PR implementing this: https://github.com/ceph/ceph/pull/57170
Is there any jenkins test to run this on a new enough base system to be affected by this issue?
Thanks for the PR. I'll have to check the systems we run our fs suites on with respect to the versions of the libraries used.