Bug #44744
closed
Slow file creation/sync on kernel cephfs
100%
Description
We're seeing weird behaviour of the sync tool with a kernel cephfs mount. To trigger it, it's enough to run the following snippet on a cephfs mount. It writes a 1MB file and calls sync immediately afterwards, in a loop (it also reproduces without the loop, but the runtimes are much shorter):
time for i in $(seq -w 11 30); do xfs_io -f -c "pwrite 0 1m" $i.txt ; sync; done
This takes 90 seconds or longer in various test setups. On a ceph-fuse mount this takes more like 1.5 seconds.
Things get stranger when the file size changes or the sync tool is called with arguments:
- The kernel mount runtime goes down to fuse level when sync is called with the filename like so:
time for i in $(seq -w 11 30); do xfs_io -f -c "pwrite 0 1m" $i.txt ; sync $i.txt; done
- The kernel mount runtime goes down to fuse level when a 4MB file is written:
time for i in $(seq -w 11 30); do xfs_io -f -c "pwrite 0 4m" $i.txt ; sync; done
I also noticed that the 4m write is only fast if the argument to xfs_io is a bare file name.
Calling it with any path component at all (absolute or relative) triggers the slow behaviour again.
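The variants above can be wrapped in one small helper so they are easier to compare on different mounts. This is only a sketch: write_and_sync is a hypothetical name, and dd stands in for xfs_io so it runs without xfsprogs (an assumption, dd may not behave identically to xfs_io's pwrite). It changes into the target directory first so that files are always created by bare name, since a path component changes the timing as noted above.

```shell
# Hypothetical helper (not part of the original report).
# Usage: write_and_sync DIR SIZE_MB MODE [COUNT]
#   MODE=full -> plain `sync` after each file
#   MODE=file -> `sync FILE` after each file
# dd replaces xfs_io here (assumption) to avoid the xfsprogs dependency.
write_and_sync() {
    dir=$1; size_mb=$2; mode=$3; count=${4:-20}
    case $mode in
        full|file) ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    (
        # Work inside DIR so every file is addressed by bare name.
        cd "$dir" || exit 1
        i=1
        while [ "$i" -le "$count" ]; do
            dd if=/dev/zero of="$i.txt" bs=1048576 count="$size_mb" 2>/dev/null || exit 1
            if [ "$mode" = full ]; then sync; else sync "$i.txt"; fi
            i=$((i + 1))
        done
    )
}
```

In bash, `time write_and_sync /mnt/cephfs/test 1 full` followed by `time write_and_sync /mnt/cephfs/test 1 file` and `time write_and_sync /mnt/cephfs/test 4 full` would cover the three cases from the description; /mnt/cephfs/test is a placeholder path.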
Luis looked at what messages are exchanged:
when doing a full sync (i.e. when ->sync_fs() is executed), the client will wait for the MDS to send the CEPH_CAP_OP_FLUSH_ACK, acknowledging that everything is safe (in the journal, I believe). And this message is handled in Locker::handle_client_caps() (called from MDSRank::handle_deferrable_message), where it is queued for processing. Finally, it is later sent out to the client from Locker::file_update_finish(). And the time between being queued and sent corresponds to the delay experienced in the kernel client.
The fuse client does not wait for this FLUSH_ACK message from the MDS when 'sync' is executed. In fact, it even allows the filesystem to be umounted before receiving this ACK.
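Since the explanation above ties the delay to ->sync_fs(), it may help to time the three entry points separately. With GNU coreutils, `sync` without arguments calls sync(2), `sync -f FILE` calls syncfs(2) on the filesystem containing FILE, and `sync FILE` calls fsync(2) on that file only. The sketch below uses hypothetical helper names (dirty, time_variant, compare_sync_variants) and only has one-second resolution. Whether the syncfs variant shows the same delay as the plain sync was not measured in this report, so treat that as something to check rather than a known result.

```shell
# Hypothetical helpers (not part of the original report).

# Write 1MB to FILE so there is dirty data to flush.
dirty() { dd if=/dev/zero of="$1" bs=1048576 count=1 2>/dev/null; }

# Dirty FILE, run the given sync command, print "<label> <seconds>".
# Prints "<label> unsupported" if the command fails (e.g. no -f support).
time_variant() {
    label=$1; file=$2; shift 2
    dirty "$file" || { echo "$label write-failed"; return 0; }
    start=$(date +%s)
    if "$@" 2>/dev/null; then
        echo "$label $(( $(date +%s) - start ))"
    else
        echo "$label unsupported"
    fi
}

# Compare sync(2), syncfs(2) and fsync(2) as issued by coreutils sync.
compare_sync_variants() {
    f=$1
    time_variant "sync" "$f" sync
    time_variant "syncfs" "$f" sync -f "$f"
    time_variant "fsync" "$f" sync "$f"
}
```

Running `compare_sync_variants 11.txt` from inside the cephfs mount prints one line per variant, which should make it visible whether only the filesystem-wide calls are slow.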
All of this was tested with a recent Nautilus release and a kernel with backports roughly corresponding to a 5.0 kernel, iirc. Numbers for a master build and a mainline kernel are coming up.