Bug #9192 (open)

krbd: poor read (about 10%) vs write performance

Added by Eric Eastman over 9 years ago. Updated over 4 years ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

We started testing the 3.17rc1 kernel over the weekend, as it is the only Linus-released kernel that has the fix for bug http://tracker.ceph.com/issues/8818

We noticed that read performance was much slower than write performance for large sequential transfers to an XFS file system mounted on a kRBD device.

To verify that the problem was not with our Ceph cluster or XFS, but with the kernel RBD driver, I wrote a pair of C tools that let me read/write large sequential blocks directly to RBD using either the kernel rbd or the librbd interface.
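
The tools themselves are not attached, but for context, here is a minimal sketch of how the two paths might be opened. The helper names (open_librbd, open_krbd), the pool/image/device names, and the omission of error handling are all illustrative assumptions, not the actual tool code:

/* Minimal sketch of the two open paths; names are hypothetical and
 * error handling is omitted. */
#include <fcntl.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

static rados_t cluster;
static rados_ioctx_t ioctx;
static rbd_image_t rbdimage;

/* librbd path: connect to the cluster from userspace and open the
 * image directly */
static void open_librbd(const char *pool, const char *image)
{
    rados_create(&cluster, NULL);         /* default client id */
    rados_conf_read_file(cluster, NULL);  /* default ceph.conf search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, pool, &ioctx);
    rbd_open(ioctx, image, &rbdimage, NULL);
}

/* kernel rbd path: the image is mapped first ("rbd map <pool>/<image>")
 * and the resulting /dev/rbdX block device is read like an ordinary
 * file */
static int open_krbd(const char *dev)
{
    return open(dev, O_RDONLY);
}

The librbd path talks to the cluster entirely from userspace, while the krbd path goes through the mapped block device and the kernel's block layer.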

Testing with these tools has shown that at some thread counts, reads through the librbd interface are more than 10x faster than reads through the kernel rbd interface.

With a 16MB block size, a 600-second run time, and each thread writing to its own image in the same pool (a sketch of the per-thread driver appears after the notes below), the 3-run average throughput values were:

              krbd read total  librbd read total   krbd write total   librbd write total
1   threads   129  MB/sec       1546 MB/sec        879  MB/sec         216  MB/sec
2   threads   230  MB/sec       2651 MB/sec        1400 MB/sec         377  MB/sec
4   threads   375  MB/sec       2758 MB/sec        2020 MB/sec         563  MB/sec
8   threads   563  MB/sec       1216 MB/sec        2560 MB/sec         886  MB/sec
16  threads   863  MB/sec       1750 MB/sec        2561 MB/sec         1294 MB/sec
32  threads   1237 MB/sec       2325 MB/sec        2684 MB/sec         1857 MB/sec
64  threads   1784 MB/sec       2859 MB/sec        2715 MB/sec         2702 MB/sec
128 threads   1651 MB/sec       3942 MB/sec        2270 MB/sec         2878 MB/sec

NOTE: RBD cache is not enabled for librbd
NOTE: The images were unmapped while running the librbd tests
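
As noted above, each thread works against its own image. Roughly, the per-thread driver looks like the sketch below; the "img-%d" naming scheme, the run_threads helper, and the 128-entry arrays (matching the largest thread count tested) are invented for illustration:

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    int n = *(int *)arg;
    char image[64];

    /* each worker opens its own image and runs the read or write loop */
    snprintf(image, sizeof(image), "img-%d", n);
    /* open_librbd(pool, image) or open_krbd("/dev/rbdN"), then loop */
    return NULL;
}

static void run_threads(int nthreads)
{
    pthread_t tid[128];
    int ids[128];
    int n;

    for (n = 0; n < nthreads; n++) {
        ids[n] = n;
        pthread_create(&tid[n], NULL, worker, &ids[n]);
    }
    for (n = 0; n < nthreads; n++)
        pthread_join(tid[n], NULL);
}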

Read loop for librbd:

/* librbd read loop: positional reads, one blocksize-sized chunk per
 * iteration, with a sanity check on the returned length */
for (i = 0; i < gcsv.count; i++) {
    off = i * gcsv.blocksize;
    readlen = rbd_read(rbdimage, off, gcsv.blocksize, gbuffer);
    if (readlen != (ssize_t)gcsv.blocksize) {
        printf("ERROR: Read error, read = %ld, blocksize = %ld in loop %ld, byte offset %ld\n",
               (long)readlen, (long)gcsv.blocksize, (long)i, (long)off);
    }
}

Read loop for kernel rbd:

/* kernel rbd read loop: sequential read()s from the mapped block
 * device; the kernel advances the file offset, and off is tracked
 * only for error reporting */
for (i = 0; i < gcsv.count; i++) {
    readlen = read(fd, gbuffer, gcsv.blocksize);
    if (readlen != (ssize_t)gcsv.blocksize) {
        printf("ERROR: Read error, read = %ld, blocksize = %ld in loop %ld, byte offset %ld\n",
               (long)readlen, (long)gcsv.blocksize, (long)i, (long)off);
    }
    off += readlen;
}

Ceph version on all nodes: 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
OS version on all nodes: Ubuntu 14.04.1 LTS

The test client running kRBD and librbd is a separate system from the cluster nodes and is configured as follows:
Dual-socket CPU E5-2660 @ 2.20GHz (16 total cores)
96 GB RAM
Mellanox dual-port 40Gb Ethernet card, using 1 port

Kernel on kRBD client:
Linux version 3.17.0-031700rc1-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201408161335 SMP Sat Aug 16 17:36:29 UTC 2014
The kernel was downloaded from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc1-utopic/

6 OSD nodes, each with
12 x 4TB 7200RPM SAS drives
10GbE public network
10GbE cluster network

3 Dedicated monitors

Test pool info:

# ceph osd pool get  ERIC-TEST-01 pg_num
pg_num: 8192
# ceph osd pool get  ERIC-TEST-01 pgp_num
pgp_num: 8192
# ceph osd pool get  ERIC-TEST-01 size
size: 1
# ceph osd pool get  ERIC-TEST-01 min_size
min_size: 1

Note: We started with size=3, and our write performance was less than 1 GB/sec for both librbd and krbd. We went to size=1 for this performance testing, and plan to reset to size=3 once we are done with these tests. Changing to size=1 slightly decreased read performance.


Files

ceph.conf (962 Bytes) - Eric Eastman, 08/21/2014 11:18 AM
#1 - Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to High

Have you compared with 3.16 with the same workload? Or any other past kernel?

#2 - Updated by Sage Weil over 9 years ago

  • Status changed from New to Need More Info

#3 - Updated by Eric Eastman over 9 years ago

We started with the Ubuntu 14.04 supplied 3.13 kernel, but found out that it would not work if 'ceph osd crush tunables firefly' was used. We then moved to the 3.15 and 3.16 Ubuntu PPA Kernels, which support this tunable, but hit bug http://tracker.ceph.com/issues/8818, which hung the system during these tests. I then did limited performance testing with this version of the 3.16 kernel:

http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/wip-request-fn/linux-image-3.16.0-ceph-00037-g0532581_3.16.0-ceph-00037-g0532581-1_amd64.deb

I was seeing slow read performance there as well, but I did not spend time documenting it, as it was a test kernel provided by Ilya Dryomov.

The only Linux releases with the fix for bug 8818 seem to be 3.17rc1 and 3.17rc2. I just downloaded the 3.17rc2 kernel and plan to do some testing with it this week.

#4 - Updated by Ilya Dryomov over 9 years ago

Eric,

To see if it's the new queueing regression, it'd be best to compare wip-request-fn and wip-request-fn~1. The problem with wip-request-fn~1 is of course the deadlock, but it can be worked around - I'll post two gitbuilder links here for you to test. If you don't get to it, I'll put it on my list.

#5 - Updated by Eric Eastman over 9 years ago

I plan to test on the Ubuntu PPA 3.16.x once the fix gets into that branch. Will that be good enough?

#6 - Updated by Ilya Dryomov over 9 years ago

If it's the queueing change that is causing this, 3.16.x with the fix will show the same results as 3.17, but it would still be useful to rule out any 3.17 changes. (The deadlock fix changed how we queue requests.)

Just in case, these are 3.16 builds with and without the queueing change, which would be a suspect for something like this:

http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/for-eric-3.16-pre/linux-image-3.16.0-ceph-00035-gaaf37e4_3.16.0-ceph-00035-gaaf37e4-1_amd64.deb

http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/for-eric-3.16-post/linux-image-3.16.0-ceph-00036-gb294606_3.16.0-ceph-00036-gb294606-1_amd64.deb

#7 - Updated by Ilya Dryomov over 9 years ago

  • Project changed from rbd to Linux kernel client
  • Subject changed from Poor Kernel RBD read performance with 3.17rc1 Kernel to krbd: poor read performance with 3.17rc1 Kernel

#8 - Updated by Eric Eastman over 9 years ago

I was able to get some dedicated test time on one of our Ceph test clusters to rerun the kernel RBD read/write tests using the following Linux kernel versions from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/

Kernel RBD Read/Write rates. All values are MB/second

Kernel:  3.10.57          3.14.21         3.16.5         3.17.0
Threads  read   write     read   write    read   write   read   write
1        121    1134      121    1101     120    1055    122    1006
2        232    1762      244    1792     225    1769    251    1765
4        424    2461      417    2765     444    2794    407    2809
8        747    2169      740    2718     739    4025    717    4070
16       1251   2138      1246   2626     1163   4385    1216   4239
32       1675   2365      1637   2628     1875   4294    1973   4162
64       1619   2103      1508   2617     2727   2374    2813   2544
128      1600   2073      1914   2531     2793   3225    2835   2979

Note: The generic versions of these kernels were used.

With all 4 kernels, the single-stream kRBD read rate is slow, around 10% of the kRBD write rate.

The cluster is running Ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae) on Ubuntu 14.04.1. There are 3 monitor nodes and 6 OSD nodes, each OSD node with 24 4TB disks, for a total of 144 OSD drives. The test pool has 16384 PGs. The OSD nodes use both public and private networks, each on a separate 10GbE NIC. On the test pool, size and min_size were both set to 1 to get the best write rates.

The kernel RBD test client is a dedicated system running Ubuntu 14.04.1 with Ceph version 0.80.6. It has 16 cores @ 2.2GHz and 96 GB RAM. The client's connection to the Ceph public network is a single 40GbE link.

The ceph osd crush tunables were set to bobtail to support 3.10 through 3.17 kernels.

#9 - Updated by Ilya Dryomov over 9 years ago

  • Subject changed from krbd: poor read performance with 3.17rc1 Kernel to krbd: poor read (about 10%) vs write performance
  • Status changed from Need More Info to 12

Hi Eric,

Thanks for doing this. I was concerned about this being a regression after the queueing changes, but it looks like it's not, so this is a relief. Probably related to #9573; I'll look at both as time allows.

#10 - Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New