Bug #63461

open

Long delays when two threads modify the same directory

Added by Xavi Hernandez 6 months ago. Updated 4 months ago.

Status:
Triaged
Priority:
Normal
Assignee:
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've identified an issue in a CephFS kernel mount while accessing it from Samba.

The workload is just to create and delete a file in the root directory of the mount from two or more clients (each client uses a different file name).

What I've observed is that some of the operations (create or delete) take nearly 5 seconds, and all clients tend to complete their pending operation at the same time, so almost all operations complete in batches every 5 seconds, independently of when they were started.

After analyzing what Samba does, I've been able to create a reproducer that doesn't depend on Samba. This is the subset of operations that triggers the issue:

    struct stat st;
    int dirfd, fd;

    dirfd = openat(AT_FDCWD, mount_path, O_RDONLY | O_PATH);
    fstatat(dirfd, "", &st, AT_EMPTY_PATH);    /* stat the directory itself */
    fd = openat(dirfd, file_name, O_CREAT | O_TRUNC | O_RDWR, 0644);
    fstatat(dirfd, "", &st, AT_EMPTY_PATH);    /* stat it again after the create */
    unlinkat(dirfd, file_name, 0);
    close(fd);
    close(dirfd);

If this code is run in a loop from two threads accessing the same CephFS mount point, several of the "fstatat" calls take nearly 5 seconds to complete.
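
For completeness, below is a minimal self-contained sketch of such a two-thread loop. The mount path, file names and iteration count are placeholders for illustration, not the exact values from my test; compile with -pthread.

    /* Minimal sketch of the two-thread reproducer (paths and names are placeholders). */
    #define _GNU_SOURCE              /* for O_PATH and AT_EMPTY_PATH */
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define MOUNT_PATH "/mnt/cephfs"     /* placeholder: the CephFS kernel mount point */
    #define ITERATIONS 1000

    static void *worker(void *arg)
    {
        const char *file_name = arg;     /* each thread uses a different file name */
        struct stat st;

        for (int i = 0; i < ITERATIONS; i++) {
            int dirfd = openat(AT_FDCWD, MOUNT_PATH, O_RDONLY | O_PATH);
            fstatat(dirfd, "", &st, AT_EMPTY_PATH);
            int fd = openat(dirfd, file_name, O_CREAT | O_TRUNC | O_RDWR, 0644);
            fstatat(dirfd, "", &st, AT_EMPTY_PATH);
            unlinkat(dirfd, file_name, 0);
            close(fd);
            close(dirfd);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, "file-a");
        pthread_create(&t2, NULL, worker, "file-b");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }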

Actions #1

Updated by Xavi Hernandez 6 months ago

I've just seen that the delay corresponds roughly to the value of the mds_tick_interval option. Changing this option also shifts the delays observed during the test to roughly the new value.
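
For reference, this is the kind of command that can be used to change the option while testing (the value below is just an example):

    ceph config set mds mds_tick_interval 2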

Actions #2

Updated by Venky Shankar 6 months ago

Xavi Hernandez wrote:

I've just seen that the delay corresponds roughly to the value of the mds_tick_interval option. Changing this option also shifts the delays observed during the test to roughly the new value.

Mostly, any operation that gets kicked in at ~5s (the tick interval) is related to flushing of the mdlog. The MDS can, however, flush the mdlog earlier if it finds that necessary to satisfy a client request. There have been a couple of fixes related to this in the past.

Which ceph version are you using and what's max_mds set to?

Actions #3

Updated by Xavi Hernandez 6 months ago

Venky Shankar wrote:

Which ceph version are you using and what's max_mds set to?

I'm using a recent build from the main branch (commit 8858839c) on CentOS 9 Stream.

max_mds is 1.

The test I described is the only thing accessing the CephFS volume.

Actions #4

Updated by Venky Shankar 6 months ago

Xavi Hernandez wrote:

Venky Shankar wrote:

Which ceph version are you using and what's max_mds set to?

I'm using a recent build from the main branch (commit 8858839c) on CentOS 9 Stream.

max_mds is 1.

The test I described is the only thing accessing the CephFS volume.

Thanks, Xavi. I'll recreate this in my test cluster and see what's going on.

Actions #5

Updated by Venky Shankar 4 months ago

  • Status changed from New to Triaged
  • Assignee set to Venky Shankar
  • Target version set to v19.0.0