Bug #9192
krbd: poor read (about 10%) vs write performance
Description
We started testing the 3.17-rc1 kernel over the weekend, as it is the only Linus-released
kernel that has the fix for bug http://tracker.ceph.com/issues/8818
We noticed that read performance was much slower than write performance for large
sequential I/O to an XFS file system mounted on a kRBD device.
To verify that the problem was not with our Ceph cluster or XFS, but with the kernel RBD
driver, I wrote a pair of C tools that let me read/write large sequential blocks directly
to RBD, using either the kernel rbd or the librbd interface.
Testing with these tools has shown that, at some thread counts, reads through the librbd
interface are more than 10x faster than reads through the kernel rbd interface.
With a 16MB block size, a 600 second run time, and each thread writing to its own
image in the same pool, the 3-run average throughput values were:
threads   krbd read total   librbd read total   krbd write total   librbd write total
      1      129 MB/sec        1546 MB/sec         879 MB/sec          216 MB/sec
      2      230 MB/sec        2651 MB/sec        1400 MB/sec          377 MB/sec
      4      375 MB/sec        2758 MB/sec        2020 MB/sec          563 MB/sec
      8      563 MB/sec        1216 MB/sec        2560 MB/sec          886 MB/sec
     16      863 MB/sec        1750 MB/sec        2561 MB/sec         1294 MB/sec
     32     1237 MB/sec        2325 MB/sec        2684 MB/sec         1857 MB/sec
     64     1784 MB/sec        2859 MB/sec        2715 MB/sec         2702 MB/sec
    128     1651 MB/sec        3942 MB/sec        2270 MB/sec         2878 MB/sec

NOTE: RBD cache is not enabled for librbd.
NOTE: The images were unmapped while running the librbd tests.
Read loop for librbd:

for (i = 0; i < gcsv.count; i++) {
    /* librbd takes an explicit byte offset on every call */
    off = i * gcsv.blocksize;
    readlen = rbd_read(rbdimage, off, gcsv.blocksize, gbuffer);
    if (readlen != (ssize_t)gcsv.blocksize) {
        printf("ERROR: Read error, read = %ld, blocksize = %ld in loop %ld, byte offset %ld\n",
               (long)readlen, (long)gcsv.blocksize, (long)i, (long)off);
    }
}
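For context, a minimal sketch of the setup this loop assumes, under the assumption that the tool uses the standard librados/librbd C API (the setup() helper, its parameters, and the lack of error checking are illustrative, not the reporter's actual code):

/* Hypothetical setup for the librbd read loop above.
 * Build with: gcc rbdread.c -lrados -lrbd
 * Error checking omitted for brevity. */
#include <rados/librados.h>
#include <rbd/librbd.h>
#include <stdlib.h>

rados_t cluster;
rados_ioctx_t ioctx;
rbd_image_t rbdimage;
char *gbuffer;

void setup(const char *pool, const char *image, size_t blocksize)
{
    rados_create(&cluster, NULL);            /* NULL id = client.admin */
    rados_conf_read_file(cluster, NULL);     /* default ceph.conf search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, pool, &ioctx);
    rbd_open(ioctx, image, &rbdimage, NULL); /* NULL snap name = image head */
    gbuffer = malloc(blocksize);             /* one shared read buffer */
}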
Read loop for kernel rbd:

for (i = 0; i < gcsv.count; i++) {
    /* read(2) advances the file offset implicitly; off is tracked
       only for the error message */
    readlen = read(fd, gbuffer, gcsv.blocksize);
    if (readlen != (ssize_t)gcsv.blocksize) {
        printf("ERROR: Read error, read = %ld, blocksize = %ld in loop %ld, byte offset %ld\n",
               (long)readlen, (long)gcsv.blocksize, (long)i, (long)off);
    }
    off += readlen;
}
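The kernel-side loop reads the mapped block device with plain buffered read(2), so the device's readahead setting influences sequential throughput. A minimal sketch of the corresponding open (the device path and helper are hypothetical; the report does not show them):

/* Hypothetical open of the mapped krbd device; the udev-created
 * /dev/rbd/<pool>/<image> path is illustrative. Plain read(2) goes
 * through the page cache; O_DIRECT with an aligned buffer would
 * bypass it. Error checking omitted for brevity. */
#include <fcntl.h>
#include <unistd.h>

int fd;
off_t off = 0;   /* advanced by readlen after each read in the loop */

void setup_krbd(const char *devpath)
{
    fd = open(devpath, O_RDONLY);   /* e.g. "/dev/rbd/ERIC-TEST-01/image-00" */
}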
Ceph version on all nodes: 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
OS version on all nodes: Ubuntu 14.04.1 LTS
The test client running kRBD and librbd is a separate system from the cluster nodes and is configured with:
Dual socket CPU E5-2660 @ 2.20GHz (16 total cores)
96 GB RAM
Mellanox dual-port 40Gb Ethernet card, using 1 port
Kernel on the kRBD client:
Linux version 3.17.0-031700rc1-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201408161335 SMP Sat Aug 16 17:36:29 UTC 2014
The kernel was downloaded from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc1-utopic/
6 OSD nodes, each with:
12 7200RPM 4TB SAS drives
10GbE public network
10GbE cluster network
3 Dedicated monitors
Test pool info:

# ceph osd pool get ERIC-TEST-01 pg_num
pg_num: 8192
# ceph osd pool get ERIC-TEST-01 pgp_num
pgp_num: 8192
# ceph osd pool get ERIC-TEST-01 size
size: 1
# ceph osd pool get ERIC-TEST-01 min_size
min_size: 1

Note: We started with size=3, and our write performance was less than 1 GB/sec for both librbd and krbd. We went to size=1 for this performance testing and plan to reset to size=3 once we are done with these tests. Changing to size=1 slightly decreased read performance.
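For reference, the replica count was presumably switched with ceph osd pool set (the exact invocations are assumed here, mirroring the get commands above; they are not shown in the original report):

# ceph osd pool set ERIC-TEST-01 size 1
# ceph osd pool set ERIC-TEST-01 min_size 1

and to restore after testing:

# ceph osd pool set ERIC-TEST-01 size 3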