Bug #44744


Slow file creation/sync on kernel cephfs

Added by Jan Fajerski about 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 100%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

We're seeing weird behaviour of the sync tool with a kernel cephfs mount. To trigger this, it's enough to run the following snippet on a cephfs mount; it writes a 1 MB file and calls sync immediately after, in a loop (it also works without the loop, but the runtimes are much shorter):

time for i in $(seq -w 11 30); do xfs_io -f -c "pwrite 0 1m" $i.txt ; sync; done

This takes 90 seconds or longer in various test setups. On a ceph-fuse mount this takes more like 1.5 seconds.
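For reference, here is a minimal C sketch of what that loop boils down to at the syscall level; it assumes (my assumption, not verified against coreutils) that sync without arguments ends up in a plain sync(2):

/* Hypothetical reproducer for the slow path: write a 1 MB file and
 * issue a full sync(2) after each write. File names and sizes mirror
 * the shell loop above. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1 << 20;                 /* 1 MB, like "pwrite 0 1m" */
    char *buf = calloc(1, len);
    if (!buf)
        return 1;

    for (int i = 11; i <= 30; i++) {
        char name[32];
        snprintf(name, sizeof(name), "%02d.txt", i);

        int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (write(fd, buf, len) != (ssize_t)len) {
            perror("write");
            return 1;
        }
        close(fd);

        sync();   /* full filesystem sync; this is where the kernel mount stalls */
    }
    free(buf);
    return 0;
}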

Things get stranger when the file size changes or the sync tool is called with arguments:
- The kernel mount runtime goes down to fuse level when sync is called with the filename like so:

time for i in $(seq -w 11 30); do xfs_io -f -c "pwrite 0 1m" $i.txt ; sync $i.txt; done

- The kernel mount runtime goes down to fuse level when a 4MB file is written:

time for i in $(seq -w 11 30); do xfs_io -f -c "pwrite 0 4m" $i.txt ; sync; done

I also noticed that the 4m write is only fast if the argument to xfs_io is just the file name.
Calling it with any path component at all (either absolute or relative) triggers the slow behaviour again.
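For contrast, a sketch of what the fast variant amounts to: flushing only the file that was just written instead of the whole filesystem. My assumption here is that sync called with a file argument boils down to an fsync(2) on that file; flush_one_file is a hypothetical helper, not anything from coreutils or ceph.

/* Hypothetical helper: flush a single file with fsync(2) instead of
 * issuing a full sync(2). This is roughly what `sync $i.txt` appears
 * to do, and it avoids the slow path on the kernel cephfs mount. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int flush_one_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fsync(fd) < 0) {          /* flush only this file's dirty data */
        perror("fsync");
        close(fd);
        return -1;
    }
    return close(fd);
}

int main(int argc, char **argv)
{
    return (argc > 1 && flush_one_file(argv[1]) == 0) ? 0 : 1;
}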

Luis looked at what messages are exchanged:

when doing a full sync (i.e. when ->sync_fs() is executed), the client will wait for the MDS to send the CEPH_CAP_OP_FLUSH_ACK, acknowledging that everything is safe (in the journal, I believe). This message is handled in Locker::handle_client_caps() (called from MDSRank::handle_deferrable_message), where it is queued for processing, and it is later sent out to the client from Locker::file_update_finish(). The time between being queued and being sent corresponds to the delay experienced in the kernel client.

The fuse client does not wait for this FLUSH_ACK message from the MDS when 'sync' is executed. In fact, it even allows the filesystem to be umounted before receiving this ACK.

All of this was tested with a recent nautilus build and a kernel with backports roughly corresponding to a 5.0 kernel, iirc. Numbers for a master build and a mainline kernel are coming up.


Related issues 2 (0 open, 2 closed)

Has duplicate: Linux kernel client - Bug #45153: fsync locking up in certain conditions (Duplicate, Jeff Layton)

Copied to: CephFS - Bug #44850: sync on libcephfs and wait for CEPH_CAP_OP_FLUSH_ACK (Rejected)

