Bug #9192
krbd: poor read (about 10%) vs write performance
Description
We started testing the 3.17rc1 kernel over the weekend, as it is the only Linus-released kernel that has the fix for bug http://tracker.ceph.com/issues/8818.
We noticed that read performance was much slower than write performance for large
sequential transfers to an XFS file system mounted on a kRBD device.
To verify that the problem was not with our Ceph cluster or XFS, but with the kernel RBD
driver, I wrote a pair of C tools that allow me to directly read/write large sequential blocks
to RBD using either the kernel rbd or librbd interface.
Testing with these tools has shown that at some thread counts, the librbd
interface is more than 10x faster for reads than the kernel rbd interface.
With a 16MB block size, a 600 second run time, and each thread writing to its own
image in the same pool, the 3-run average throughput values were:
Threads  krbd read    librbd read   krbd write   librbd write
      1   129 MB/sec   1546 MB/sec   879 MB/sec   216 MB/sec
      2   230 MB/sec   2651 MB/sec  1400 MB/sec   377 MB/sec
      4   375 MB/sec   2758 MB/sec  2020 MB/sec   563 MB/sec
      8   563 MB/sec   1216 MB/sec  2560 MB/sec   886 MB/sec
     16   863 MB/sec   1750 MB/sec  2561 MB/sec  1294 MB/sec
     32  1237 MB/sec   2325 MB/sec  2684 MB/sec  1857 MB/sec
     64  1784 MB/sec   2859 MB/sec  2715 MB/sec  2702 MB/sec
    128  1651 MB/sec   3942 MB/sec  2270 MB/sec  2878 MB/sec
NOTE: RBD cache is not enabled for librbd.
NOTE: The images were unmapped while running the librbd tests.
Read loop for librbd:
    for (i = 0; i < gcsv.count; i++) {
        off = i * gcsv.blocksize;
        readlen = rbd_read(rbdimage, off, gcsv.blocksize, gbuffer);
        if (readlen != (ssize_t)gcsv.blocksize) {
            printf("ERROR: Read error, read = %ld, blocksize = %ld in loop %ld, byte offset %ld\n",
                   (long)readlen, (long)gcsv.blocksize, (long)i, (long)off);
        }
    }
Read loop for kernel rbd:
    for (i = 0; i < gcsv.count; i++) {
        readlen = read(fd, gbuffer, gcsv.blocksize);
        if (readlen != (ssize_t)gcsv.blocksize) {
            printf("ERROR: Read error, read = %ld, blocksize = %ld in loop %ld, byte offset %ld\n",
                   (long)readlen, (long)gcsv.blocksize, (long)i, (long)off);
        }
        off += readlen;
    }
Ceph version on all nodes: 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
OS version on all nodes: Ubuntu 14.04.1 LTS
The test client running kRBD and librbd is a separate system from the cluster nodes and is configured:
Dual socket CPU E5-2660 @ 2.20GHz (16 total cores)
96 GB RAM
Mellanox dual-port 40Gb Ethernet card, using 1 port
Kernel on kRBD client:
Linux version 3.17.0-031700rc1-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201408161335 SMP Sat Aug 16 17:36:29 UTC 2014
The kernel was downloaded from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc1-utopic/
6 OSD nodes, each with:
12 7200RPM 4TB SAS drives
10GbE public network
10GbE cluster network
3 Dedicated monitors
Test pool info:
    # ceph osd pool get ERIC-TEST-01 pg_num
    pg_num: 8192
    # ceph osd pool get ERIC-TEST-01 pgp_num
    pgp_num: 8192
    # ceph osd pool get ERIC-TEST-01 size
    size: 1
    # ceph osd pool get ERIC-TEST-01 min_size
    min_size: 1
Note: We started with size=3, and our write performance was less than 1GB/sec for both librbd and krbd. We went to size=1 for this performance testing, and plan to reset to size=3 once we are done with these tests. Changing to size=1 slightly decreased read performance.
Updated by Sage Weil over 9 years ago
- Priority changed from Normal to High
Have you compared with 3.16 with the same workload? Or any other past kernel?
Updated by Eric Eastman over 9 years ago
We started with the Ubuntu 14.04 supplied 3.13 kernel, but found out that it would not work if 'ceph osd crush tunables firefly' was used. We then moved to the 3.15 and 3.16 Ubuntu PPA Kernels, which support this tunable, but hit bug http://tracker.ceph.com/issues/8818, which hung the system during these tests. I then did limited performance testing with this version of the 3.16 kernel:
I was also seeing slow read performance with it, but I did not spend time documenting it, as it was a test kernel provided by Ilya Dryomov.
The only Linux releases with the fix for bug 8818 seem to be 3.17rc1 and 3.17rc2. I just downloaded the 3.17rc2 kernel and plan to do some testing with it this week.
Updated by Ilya Dryomov over 9 years ago
Eric,
To see if it's the new queueing regression, it'd be best to compare wip-request-fn and wip-request-fn~1. The problem with wip-request-fn~1 is of course the deadlock, but it can be worked around - I'll post two gitbuilder links here for you to test. If you don't get to it, I'll put it on my list.
Updated by Eric Eastman over 9 years ago
I plan to test on the Ubuntu PPA 3.16.x once the fix gets into that branch. Will that be good enough?
Updated by Ilya Dryomov over 9 years ago
If it's the queueing change that is causing this, 3.16.x with the fix will show the same results as 3.17, but it would still be useful to rule out any 3.17 changes. (The deadlock fix changed how we queue requests.)
Just in case, these are 3.16, with and without the queueing change, which would be a suspect for something like this.
Updated by Ilya Dryomov over 9 years ago
- Project changed from rbd to Linux kernel client
- Subject changed from Poor Kernel RBD read performance with 3.17rc1 Kernel to krbd: poor read performance with 3.17rc1 Kernel
Updated by Eric Eastman over 9 years ago
I was able to get some dedicated test time on one of our Ceph test clusters to rerun the kernel RBD read/write tests using the following Linux kernel versions from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/
Kernel RBD read/write rates. All values are MB/sec.
             3.10.57       3.14.21       3.16.5        3.17.0
Threads    read  write   read  write   read  write   read  write
      1     121   1134    121   1101    120   1055    122   1006
      2     232   1762    244   1792    225   1769    251   1765
      4     424   2461    417   2765    444   2794    407   2809
      8     747   2169    740   2718    739   4025    717   4070
     16    1251   2138   1246   2626   1163   4385   1216   4239
     32    1675   2365   1637   2628   1875   4294   1973   4162
     64    1619   2103   1508   2617   2727   2374   2813   2544
    128    1600   2073   1914   2531   2793   3225   2835   2979
Note: The generic version of these kernels was used.
With all 4 kernels, the single stream kRBD read rate is slow, around 10% of the kRBD write rate.
The cluster is running Ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae) on Ubuntu 14.04.1. There are 3 monitor nodes and 6 OSD nodes, each with 24 4TB disks, for a total of 144 OSD drives. The test pool has 16384 PGs. The OSD nodes use separate public and cluster networks, each on its own 10GbE NIC. On the test pool, size and min_size were both set to 1 to get the best write rates.
The kernel RBD test client is a dedicated system. It is running Ubuntu 14.04.1 with Ceph version 0.80.6. It has 16 cores @ 2.2GHz and 96GB RAM. The client's connection to the Ceph public network is a single 40GbE link.
The ceph osd crush tunables were set to bobtail to support 3.10 through 3.17 kernels.
Updated by Ilya Dryomov over 9 years ago
- Subject changed from krbd: poor read performance with 3.17rc1 Kernel to krbd: poor read (about 10%) vs write performance
- Status changed from Need More Info to 12
Hi Eric,
Thanks for doing this. I was concerned about this being a regression after the queueing changes, but it looks like it's not, so this is a relief. Probably related to #9573, I'll look at both as time allows.