Bug #45153
fsync locking up in certain conditions (status: Closed)
Description
I've been setting up and testing cephfs on a server, and during one of the tests (with an application using nedb) I noticed a huge delay, which I tracked down to cephfs being slow and locking up on an fsync call.
I was able to reproduce the problem with a simple command line. If a new file is created, then synced, then renamed, then synced again (nedb's method of overwriting its database file in a crash-safe manner), the final fsync takes about 5 seconds to return as it locks up.
Doing:
# touch to-rename && sync to-rename && mv to-rename final && time sync final
real	0m4.506s
user	0m0.000s
sys	0m0.001s
will take about 5 seconds to resolve (according to the 'time' command on the final sync), while the same sequence without the sync of the newly created file (before the rename) takes 0.003s:
# touch to-rename && mv to-rename final && time sync final
real	0m0.003s
user	0m0.001s
sys	0m0.000s
Note that in this context the file did not previously exist and its size is 0 bytes, but it could be any size; I only use 0 bytes to show that the time is not spent actually syncing large amounts of data. I have also tested as a regular user and as root, with no difference.
See attached screenshot.
I have also run the command in a while loop and I get a consistent ~5 second time, which leads me to think a timeout is happening and the fsync is blocked waiting for something.
# while true; do touch to-rename && sync to-rename && mv to-rename final && time sync final; done
real	0m3.172s  user	0m0.001s  sys	0m0.000s
real	0m4.990s  user	0m0.002s  sys	0m0.000s
real	0m4.985s  user	0m0.001s  sys	0m0.000s
real	0m4.993s  user	0m0.001s  sys	0m0.000s
real	0m4.993s  user	0m0.001s  sys	0m0.000s
real	0m4.994s  user	0m0.001s  sys	0m0.000s
real	0m4.993s  user	0m0.001s  sys	0m0.000s
real	0m4.993s  user	0m0.001s  sys	0m0.000s
real	0m4.995s  user	0m0.001s  sys	0m0.000s
real	0m4.991s  user	0m0.000s  sys	0m0.001s
real	0m4.995s  user	0m0.000s  sys	0m0.001s
real	0m4.993s  user	0m0.001s  sys	0m0.000s
I have also tested that running 'ls' on the file in question makes no difference, but running 'ls' on the parent directory immediately unblocks the fsync, which then returns right away, before the 5 second timeout. So a readdir seems to unblock the frozen task.
I am filing this under the kernel client because I cannot reproduce the same behavior with ceph-fuse. I have also noticed that if I use ceph-fuse to mount the same cluster in a different directory on the same machine, the kernel mount stops behaving this way as well; once I unmount the ceph-fuse directory, the 5 second delay starts happening again. A ceph-fuse mount on its own does not exhibit the problem.
My setup uses ceph 15.2.1, installed with cephadm. I have 3 mons, 2 mgrs, 2 mds and 1 osd (for now, as I'm still testing things), spread over 3 hosts within the same network. The local machine I was testing on ran a mon, a mgr and the osd.
The system is Debian 10.3 with kernel 4.19.0-8-cloud-amd64. The machine also had at least a few GB of free/available RAM to use and its CPU usage didn't seem to go beyond 3% to 5%.
I hope this helps find/resolve the issue, and that it's not a duplicate.
I've marked it as major since it's a major performance issue (my apps started freezing for 10 to 20 seconds), but I've switched to ceph-fuse for now, so it's not a critical issue for me at the moment.
Thank you.
Updated by Jeff Layton about 4 years ago
A ~5s delay is usually a telltale sign that the client is stuck waiting for an MDS tick and journal flush.
I suspect this is probably a duplicate of #44744. If so, it will likely be fixed in v5.8, though the fix may be backportable as well. If you're able to test recent "testing" series kernels here:
https://shaman.ceph.com/builds/kernel/
...then that would be helpful.
Updated by Youness Alaoui about 4 years ago
Jeff Layton wrote:
A ~5s delay is usually a telltale sign that the client is stuck waiting for an MDS tick and journal flush.
I suspect this is probably a duplicate of #44744. If so, it will likely be fixed in v5.8, though the fix may be backportable as well. If you're able to test recent "testing" series kernels here:
https://shaman.ceph.com/builds/kernel/
...then that would be helpful.
Thank you,
I hope it gets backported. Unfortunately, I cannot test with v5.8. Do you have links to the actual commits that would fix it? I could try backporting them to my kernel myself and see if that works.
Note: the shaman.ceph.com builds you linked to all give 404 because of a double '//' in the URL, and even after fixing that, the page shows empty content.
Thanks.
Updated by Jeff Layton about 4 years ago
- Assignee set to Jeff Layton
Updated by Jeff Layton over 3 years ago
- Status changed from New to Duplicate
- Parent task set to #44744
Closing as duplicate of #44744
Updated by Patrick Donnelly over 2 years ago
- Is duplicate of Bug #44744: Slow file creation/sync on kernel cephfs added