Bug #53809

cephfs: fsync on small directory takes multiple seconds

Added by Niklas Hambuechen over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

Our application needs to guarantee the durability of written files before acknowledging that a write completed, so that the file cannot be lost.

Thus, it performs `fsync()` on the written file (to make the file contents durable) and `fsync()` on the directory containing the file (to make the dirent durable). (The need for this on local file systems is explained e.g. in this Ceph talk, and I believe it is necessary for Ceph as well.)
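
For illustration, here is a minimal C sketch of that pattern (not our actual application code; the path, file name, and error handling are simplified):

/* Durability pattern: fsync the file, then fsync its parent directory
 * so that the new directory entry is durable as well. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void die(const char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

static void write_durably(const char *dirpath, const char *name,
                          const char *data, size_t len)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", dirpath, name);

    /* 1. Write the file and make its contents durable. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) die("open file");
    if (write(fd, data, len) != (ssize_t)len) die("write");
    if (fsync(fd) != 0) die("fsync file");
    close(fd);

    /* 2. Make the directory entry durable. This is the fsync that is slow on CephFS. */
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) die("open dir");
    if (fsync(dirfd) != 0) die("fsync dir");
    close(dirfd);
}

int main(void)
{
    const char *msg = "hello\n";
    /* Example path matching the benchmark below; adjust as needed. */
    write_durably("/mycephfs/niklas-test", "new-file", msg, strlen(msg));
    return 0;
}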

Writing a small benchmark, I noticed that on CephFS, `fsync` on a directory is extremely, unreasonably slow for this pattern:

# mkdir /mycephfs/niklas-test
# cd /mycephfs/niklas-test

# strace -fy -e fsync -T sh -c 'for i in {1..10}; do touch new-"$i"; sync .; done'
fsync(3</mycephfs/niklas-test>) = 0 <0.635923>
fsync(3</mycephfs/niklas-test>) = 0 <3.557392>
fsync(3</mycephfs/niklas-test>) = 0 <0.001497>
fsync(3</mycephfs/niklas-test>) = 0 <3.085821>
fsync(3</mycephfs/niklas-test>) = 0 <1.879268>
fsync(3</mycephfs/niklas-test>) = 0 <4.998007>
fsync(3</mycephfs/niklas-test>) = 0 <0.004683>
fsync(3</mycephfs/niklas-test>) = 0 <4.975168>
fsync(3</mycephfs/niklas-test>) = 0 <5.029351>
fsync(3</mycephfs/niklas-test>) = 0 <0.001417>

The directory fsyncs take up to 5 seconds!

This is unreasonable to me because I cannot come up with any operation on this directory of fewer than 10 files that could possibly take 5 seconds.

This CephFS is backed by 3 idle nodes with 10G networking and 0.3 ms ping, a metadata pool backed by enterprise NVMe SSDs, and a data pool backed by spinning disks.

The Ceph version is 16.2.7, using a Linux 5.10.81 kernel mount.

I've tried with and without `client cache size = 0` (same results), and the same issue happens on a single-node CephFS deployment, which should exclude the possibility that the network has anything to do with it.
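
For completeness, the cache setting was applied in the usual ceph.conf form, roughly like this (illustrative only; the exact placement in my setup may differ):

[client]
    client cache size = 0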

While the above loop is running, Ceph reports that almost no IO is going on:

# ceph osd pool stats
pool device_health_metrics id 1
  nothing is going on

pool mycephfs_data id 2
  nothing is going on

pool mycephfs_metadata id 3
  client io 2.3 KiB/s wr, 0 op/s rd, 1 op/s wr

Related issues 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #55327: kclient: BUG: kernel NULL pointer dereference, address: 0000000000000008 (Resolved; assigned to Xiubo Li)
