Bug #20938
openCephFS: concurrent access to file from multiple nodes blocks for seconds
0%
Description
When accessing the same file opened for read/write on multiple nodes via ceph-fuse, performance drops by about 3 orders of magnitude to 1-2 operations/second from thousands of operations/second. Tested both on Jewel 10.2.9 and the latest Luminous RC 12.1.2.
The core of the problem boils down to the following operation being run on the same file on multiple nodes (in a loop in the test program):
int fd = open(filename, mode); read(fd, buffer, 100); close(fd);
Here are some results on our cluster:
One node, mode=read-only: 7000 opens/second
One node, mode=read-write: 7000 opens/second
Two nodes, mode=read-only: 7000 opens/second/node
Two nodes, mode=read-write: around 0.5 opens/second/node (!!!)
Two nodes, one read-only, one read-write: around 0.5 opens/second/node (!!!)
Two nodes, mode=read-write, but remove the 'read(fd, buffer,100)' line from the code: 500 opens/second/node
There seems to be some problems with opening the same file read/write and reading from the file on multiple nodes. That operation seems to be 3 orders of magnitude slower than other parallel access patterns to the same file. The 1 second time to open files almost seems like some timeout is happening somewhere. I have some suspicion that this has to do with capability management between the fuse client and the MDS, but I don't know enough about that protocol to make an educated assessment.
The attached small C program reproduces the issue. Run it on two different nodes with "timed_openrw_read <filename> rw" where <filename> is a file in cephfs with some data in it.
Files