
h1. Benchmark Ceph Cluster Performance 

 One of the most common questions we hear is "How do I check if my cluster is running at maximum performance?". Wonder no more - in this guide, we'll walk you through some tools you can use to benchmark your Ceph cluster. 
 
 *NOTE*: The ideas in this article are based on "Sébastien Han's blog post":http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/, "TelekomCloud's blog post":https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html and inputs from Ceph developers and engineers. 

 h3. Get Baseline Performance Statistics 

 Fundamentally, benchmarking is all about comparison. You won't know if your Ceph cluster is performing below par unless you first identify what its maximum possible performance is. So, before you start benchmarking your cluster, you need to obtain baseline performance statistics for the two main components of your Ceph infrastructure: your disks and your network. 

 h4. Benchmark Your Disks 

 The simplest way to benchmark your disk is with dd. Use the following command to read and write a file, remembering to add the oflag parameter to bypass the disk page cache: 
 @shell> dd if=/dev/zero of=here bs=1G count=1 oflag=direct@ 

 !{width:50%}image1.png! 

 Note the last statistic provided, which indicates disk performance in MB/sec. Perform this test for each disk in your cluster, noting the results. 
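
 To get a rough read number with dd as well, you can read the file back while bypassing the page cache. This is a minimal sketch that assumes the @here@ file created by the write test above is still present:

 @shell> dd if=here of=/dev/null bs=1G count=1 iflag=direct@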

 h4. Benchmark Your Network 

 
 Another key factor affecting Ceph cluster performance is network throughput. A good tool for this is "_iperf_":https://iperf.fr/, which uses a client-server connection to measure TCP and UDP bandwidth. 
 
 You can install _iperf_ using _apt-get install iperf_ or _yum install iperf_. 

 _iperf_ needs to be installed on at least two nodes in your cluster. Then, on one of the nodes, start the _iperf_ server using the following command: 

 @shell> iperf -s@ 

 On another node, start the client with the following command, remembering to use the IP address of the node hosting the _iperf_ server: 

 @shell> iperf -c 192.168.1.1@ 

 !{width:50%}image2.png! 

 Note the bandwidth statistic in Mbits/sec, as this indicates the maximum throughput supported by your network. 
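
 If a single stream does not saturate the link, you may also want to try multiple parallel client streams; a minimal sketch, where -P sets the number of parallel streams and -t the test duration in seconds (the values shown are illustrative):

 @shell> iperf -c 192.168.1.1 -P 4 -t 30@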
 
 Now that you have some baseline numbers, you can start benchmarking your Ceph cluster to see if it's giving you similar performance. Benchmarking can be performed at different levels: you can perform low-level benchmarking of the storage cluster itself, or you can perform higher-level benchmarking of the key interfaces, such as block devices and object gateways. The following sections discuss each of these approaches. 
 
 NOTE: Before running any of the benchmarks in subsequent sections, drop all caches using a command like this: 
 @shell> sudo echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync@ 

 h3. Benchmark a Ceph Storage Cluster 

 
 Ceph includes the _rados bench_ command, designed specifically to benchmark a RADOS storage cluster. To use it, create a storage pool and then use _rados bench_ to perform a write benchmark, as shown below. 
 
 The _rados_ command is included with Ceph. 

 @shell> ceph osd pool create scbench 100 100 
 shell> rados bench -p scbench 10 write --no-cleanup@ 

 !{width:50%}image3.png! 

 This creates a new pool named 'scbench' and then performs a write benchmark for 10 seconds. Notice the _--no-cleanup_ option, which leaves behind some data. The output gives you a good indicator of how fast your cluster can write data. 

 Two types of read benchmarks are available: seq for sequential reads and rand for random reads. To perform a read benchmark, use the commands below: 
 @shell> rados bench -p scbench 10 seq 
 shell> rados bench -p scbench 10 rand@ 

 !{width:50%}image4.png! 

 You can also add the _-t_ parameter to increase the concurrency of reads and writes (defaults to 16 threads), or the _-b_ parameter to change the size of the object being written (defaults to 4 MB). It's also a good idea to run multiple copies of this benchmark against different pools, to see how performance changes with multiple clients. 
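
 For example, a minimal sketch of a write benchmark using 32 concurrent operations and 8 MB objects (the flag values here are purely illustrative):

 @shell> rados bench -p scbench 10 write -t 32 -b 8388608 --no-cleanup@
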
 Once you have the data, you can begin comparing the cluster read and write statistics with the disk-only benchmarks performed earlier, identify how much of a performance gap exists (if any), and start looking for reasons. 
 You can clean up the benchmark data left behind by the write benchmark with this command: 

 @shell> rados -p scbench cleanup@ 

 h3. Benchmark a Ceph Block Device 

 
 If you're a fan of Ceph block devices, there are two tools you can use to benchmark their performance. Ceph already includes the _rbd bench_ command, but you can also use the popular I/O benchmarking tool "_fio_":http://git.kernel.dk/?p=fio.git;a=summary, which now comes with built-in support for RADOS block devices. 

 
 The _rbd_ command is included with Ceph. RBD support in _fio_ is relatively new, therefore you will need to download it from its repository and then compile and install it using _configure && make && make install_. Note that you must install the librbd-dev development package with _apt-get install librbd-dev_ or _yum install librbd-dev_ before compiling _fio_ in order to activate its RBD support. 
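
 For reference, a minimal sketch of fetching and building _fio_ from source (the clone URL shown is an assumption based on the repository linked above):

 @shell> git clone git://git.kernel.dk/fio.git
 shell> cd fio
 shell> ./configure && make && sudo make install@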

 
 Before using either of these two tools, though, create a block device using the commands below: 
 @shell> ceph osd pool create rbdbench 100 100 
 shell> rbd create image01 --size 1024 --pool rbdbench 
 shell> sudo rbd map image01 --pool rbdbench --name client.admin 
 shell> sudo /sbin/mkfs.ext4 -m0 /dev/rbd/rbdbench/image01 
 shell> sudo mkdir /mnt/ceph-block-device 
 shell> sudo mount /dev/rbd/rbdbench/image01 /mnt/ceph-block-device@ 

 The _rbd bench-write_ command generates a series of sequential writes to the image and measures the write throughput and latency. Here's an example: 

 @shell> rbd bench-write image01 --pool=rbdbench@ 

 !{width:50%}image5.png! 

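 If you want to vary the workload, _rbd bench-write_ also accepts options such as --io-size, --io-threads and --io-total; a minimal sketch (the values shown are illustrative):

 @shell> rbd bench-write image01 --pool=rbdbench --io-size 4096 --io-threads 16 --io-total 1073741824@
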
 Or, you can use _fio_ to benchmark your block device. An example _rbd.fio_ template is included with the _fio_ source code, which performs a 4K random write test against a RADOS block device via librbd. Note that you will need to update the template with the correct names for your pool and device, as shown below. 
 @[global] 
 ioengine=rbd 
 clientname=client.admin 
 pool=rbdbench 
 rbdname=image01 
 rw=randwrite 
 bs=4k 
 [rbd_iodepth32] 
 iodepth=32@ 

 Then, run _fio_ as follows: 
 @shell> fio examples/rbd.fio@ 

 !{width:50%}image6.png! 

 h3. Benchmark a Ceph Object Gateway 

 
 When it comes to benchmarking the Ceph object gateway, look no further than _swift-bench_, the benchmarking tool included with OpenStack Swift. The _swift-bench_ tool tests the performance of your Ceph cluster by simulating client PUT and GET requests and measuring their performance. 
 
 You can install _swift-bench_ using _pip install swift && pip install swift-bench_. 

 To use _swift-bench_, you need to first create a gateway user and subuser, as shown below: 
 @shell> sudo radosgw-admin user create --uid="benchmark" --display-name="benchmark" 
 shell> sudo radosgw-admin subuser create --uid=benchmark --subuser=benchmark:swift --access=full 
 shell> sudo radosgw-admin key create --subuser=benchmark:swift --key-type=swift --secret=guessme 
 shell> radosgw-admin user modify --uid=benchmark --max-buckets=0@ 

 Next, create a configuration file for swift-bench on a client host, as below. Remember to update the authentication URL to reflect that of your Ceph object gateway and to use the correct user name and credentials. 
 @[bench] 
 auth = http://gateway-node/auth/v1.0 
 user = benchmark:swift 
 key = guessme 
 auth_version = 1.0@ 

 You can now run a benchmark as below. Use the _-c_ parameter to adjust the number of concurrent connections (this example uses 64) and the _-s_ parameter to adjust the size of the object being written (this example uses 4K objects). The _-n_ and _-g_ parameters control the number of objects to PUT and GET respectively. 
 @shell> swift-bench -c 64 -s 4096 -n 1000 -g 100 /tmp/swift.conf@ 

 !{width:50%}image7.png! 

 Although _swift-bench_ measures performance in number of objects/sec, it's easy enough to convert this into MB/sec by multiplying by the size of each object (for example, 500 PUTs/sec of 4096-byte objects is roughly 2 MB/sec). However, you should be wary of comparing this directly with the baseline disk performance statistics you obtained earlier, since a number of other factors also influence these statistics, such as: 
 * the level of replication (and latency overhead) 
 * full data journal writes (offset in some situations by journal data coalescing) 
 * fsync on the OSDs to guarantee data safety 
 * metadata overhead for keeping data stored in RADOS 
 * latency overhead (network, ceph, etc) makes readahead more important 
 
 TIP: When it comes to object gateway performance, there's no hard and fast rule you can use to easily improve performance. In some cases, Ceph engineers have been able to obtain better-than-baseline performance using clever caching and coalescing strategies, whereas in other cases, object gateway performance has been lower than disk performance due to latency, fsync and metadata overhead. 

 h3. Conclusion 

 
 There are a number of tools available to benchmark a Ceph cluster, at different levels: disk, network, cluster, device and gateway. You should now have some insight into how to approach the benchmarking process and begin generating performance data for your cluster. Good luck!