Bug #63461
Long delays when two threads modify the same directory
Description
I've identified an issue with a CephFS kernel mount while accessing it from Samba.
The workload simply creates and deletes a file in the root directory of the mount from two or more clients (each client uses a different file name).
What I've observed is that some of the operations (create or delete) take nearly 5 seconds, and all clients tend to complete their pending operation at the same time, so almost all operations complete in batches every 5 seconds, regardless of when they were started.
After analyzing what Samba does, I've been able to create a reproducer that doesn't depend on Samba. This is the subset of operations that triggers the issue:
dirfd = openat(AT_FDCWD, mount_path, O_RDONLY | O_PATH);          /* open the mount's root directory */
fstatat(dirfd, "", &st, AT_EMPTY_PATH);                           /* stat the directory itself */
fd = openat(dirfd, file_name, O_CREAT | O_TRUNC | O_RDWR, 0644);  /* create the per-client file */
fstatat(dirfd, "", &st, AT_EMPTY_PATH);                           /* stat the directory again */
unlinkat(dirfd, file_name, 0);                                    /* delete the file */
close(fd);
close(dirfd);
If this code is run in a loop from two threads accessing the same CephFS mount point, several of the fstatat calls take nearly 5 seconds to complete.
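For completeness, here is a minimal standalone sketch of the reproducer, assuming two threads in a single process are enough to trigger the behaviour. The mount path, file names, iteration count, and the 1-second reporting threshold are all placeholders:

/*
 * Sketch: two threads run the openat/fstatat/unlinkat sequence above in
 * a loop against the same directory, each using its own file name, and
 * report any iteration that stalls.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

#define MOUNT_PATH "/mnt/cephfs"   /* assumption: adjust to your mount */
#define ITERATIONS 1000            /* assumption: arbitrary loop count */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static void *worker(void *arg)
{
    const char *file_name = arg;   /* unique per thread */
    struct stat st;

    for (int i = 0; i < ITERATIONS; i++) {
        double start = now();

        int dirfd = openat(AT_FDCWD, MOUNT_PATH, O_RDONLY | O_PATH);
        if (dirfd < 0) { perror("openat dir"); break; }

        fstatat(dirfd, "", &st, AT_EMPTY_PATH);
        int fd = openat(dirfd, file_name, O_CREAT | O_TRUNC | O_RDWR, 0644);
        if (fd < 0) { perror("openat file"); close(dirfd); break; }
        fstatat(dirfd, "", &st, AT_EMPTY_PATH);
        unlinkat(dirfd, file_name, 0);
        close(fd);
        close(dirfd);

        double elapsed = now() - start;
        if (elapsed > 1.0)   /* flag the multi-second stalls */
            printf("%s: iteration %d took %.2f s\n", file_name, i, elapsed);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, "file-a");
    pthread_create(&t2, NULL, worker, "file-b");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Build with gcc -pthread (the file names "file-a"/"file-b" are arbitrary). When run against the kernel mount, the slow iterations show up in batches, matching what I described above.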
Updated by Xavi Hernandez 6 months ago
I've just seen that the delay corresponds roughly to the value of the mds_tick_interval option. Changing this value also changes the delays seen during the test to a very similar value.
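To confirm the correlation, the interval can be changed at runtime; a quick sketch, assuming a disposable test cluster (the value 2 is only illustrative, not a production tuning):

ceph config set mds mds_tick_interval 2   # lower the tick interval, then re-run the test
ceph config get mds mds_tick_interval     # verify the new value

With a lower interval, the observed stall duration tracks the new value.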
Updated by Venky Shankar 6 months ago
Xavi Hernandez wrote:
I've just seen that the delay corresponds roughly to the value of the mds_tick_interval option. Changing this value also changes the delays seen during the test to a very similar value.
Generally, any operation that gets kicked at ~5 s (the tick interval) is related to flushing of the mdlog. The MDS can, however, flush the mdlog earlier if it finds that necessary to satisfy a client request. There have been a couple of fixes related to this in the past.
Which ceph version are you using and what's max_mds set to?
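For reference, both can be checked from the command line; a sketch assuming the file system is named "cephfs":

ceph versions                        # daemon versions running in the cluster
ceph fs get cephfs | grep max_mds    # current max_mds for the file system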
Updated by Xavi Hernandez 6 months ago
Venky Shankar wrote:
Which ceph version are you using and what's max_mds set to?
I'm using a recent build from the main branch (commit 8858839c) on CentOS 9 Stream.
max_mds is 1.
The test I described is the only thing accessing the CephFS volume.
Updated by Venky Shankar 6 months ago
Xavi Hernandez wrote:
Venky Shankar wrote:
Which ceph version are you using and what's max_mds set to?
I'm using a recent build from the main branch (commit 8858839c) on CentOS 9 Stream.
max_mds is 1.
The test I described is the only thing accessing the CephFS volume.
Thanks, Xavi. I'll recreate this in my test cluster and see what's going on.
Updated by Venky Shankar 4 months ago
- Status changed from New to Triaged
- Assignee set to Venky Shankar
- Target version set to v19.0.0