Bug #53809

cephfs: fsync on small directory takes multiple seconds

Added by Niklas Hambuechen over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

Our application needs to guarantee the durability of written files before acknowledging that a write completed, so that the file cannot be lost.

Thus, it performs `fsync()` on the written file (to make the file contents durable) and `fsync()` on the directory containing the file (to make the dirent durable). (The need for this on local file systems is explained e.g. in this Ceph talk, and I believe it is necessary for Ceph as well.)
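
For illustration, here is a minimal C sketch of that pattern (not our actual application code; the path, file name, and error handling are simplified):

/* Durability pattern: fsync the file, then fsync its parent directory
 * so that the new directory entry is durable as well. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void die(const char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

static void write_durably(const char *dirpath, const char *name,
                          const char *data, size_t len)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", dirpath, name);

    /* 1. Write the file and make its contents durable. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) die("open file");
    if (write(fd, data, len) != (ssize_t)len) die("write");
    if (fsync(fd) != 0) die("fsync file");
    close(fd);

    /* 2. Make the directory entry durable. This is the fsync that is slow on CephFS. */
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) die("open dir");
    if (fsync(dirfd) != 0) die("fsync dir");
    close(dirfd);
}

int main(void)
{
    const char *msg = "hello\n";
    /* Example path matching the benchmark below; adjust as needed. */
    write_durably("/mycephfs/niklas-test", "new-file", msg, strlen(msg));
    return 0;
}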

Writing a small benchmark, I noticed that on CephFS, `fsync` on a directory is extremely, unreasonably slow for this pattern:

# mkdir /mycephfs/niklas-test
# cd /mycephfs/niklas-test

# strace -fy -e fsync -T sh -c 'for i in {1..10}; do touch new-"$i"; sync .; done'
fsync(3</mycephfs/niklas-test>) = 0 <0.635923>
fsync(3</mycephfs/niklas-test>) = 0 <3.557392>
fsync(3</mycephfs/niklas-test>) = 0 <0.001497>
fsync(3</mycephfs/niklas-test>) = 0 <3.085821>
fsync(3</mycephfs/niklas-test>) = 0 <1.879268>
fsync(3</mycephfs/niklas-test>) = 0 <4.998007>
fsync(3</mycephfs/niklas-test>) = 0 <0.004683>
fsync(3</mycephfs/niklas-test>) = 0 <4.975168>
fsync(3</mycephfs/niklas-test>) = 0 <5.029351>
fsync(3</mycephfs/niklas-test>) = 0 <0.001417>

The directory fsyncs take up to 5 seconds!

This is unreasonable to me because I cannot come up with any operation on this directory of fewer than 10 files that could possibly take 5 seconds.

This CephFS is backed by 3 idle nodes with 10G networking and 0.3 ms ping, a metadata pool backed by enterprise NVMe SSDs, and a data pool backed by spinning disks.

The Ceph version is 16.2.7, using a Linux 5.10.81 kernel mount.

I've tried with and without `client cache size = 0` (same results), and the same issue happens on a single-node CephFS deployment, which should exclude the possibility that the network has anything to do with it.
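
For completeness, the cache setting was applied in the usual ceph.conf form, roughly like this (illustrative only; the exact placement in my setup may differ):

[client]
    client cache size = 0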

While the above loop is running, Ceph reports that almost no IO is going on:

# ceph osd pool stats
pool device_health_metrics id 1
  nothing is going on

pool mycephfs_data id 2
  nothing is going on

pool mycephfs_metadata id 3
  client io 2.3 KiB/s wr, 0 op/s rd, 1 op/s wr

Related issues 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #55327: kclient: BUG: kernel NULL pointer dereference, address: 0000000000000008 (Resolved; assigned to Xiubo Li)
