h1. Tuning for All Flash Deployments
Ceph Tuning and Best Practices for All Flash Intel® Xeon® Servers
Last updated: January 2017

h2. Table of Contents

[[Tuning_for_All_Flash_Deployments#Introduction|Introduction]]
Ceph Storage Hardware Guidelines
Intel Tuning and Optimization Recommendations for Ceph
Server Tuning
Ceph Client Configuration
Ceph Storage Node NUMA Tuning
Memory Tuning
NVMe SSD partitioning
OS Tuning (must be done on all Ceph nodes)
Kernel Tuning
Filesystem considerations
Disk read ahead
OSD: RADOS
RBD Tuning
RGW: Rados Gateway Tuning
Erasure Coding Tuning
Appendix
Sample Ceph.conf
Sample sysctl.conf
All-NVMe Ceph Cluster Tuning for MySQL workload
Ceph.conf
CBT YAML
MySQL configuration file (my.cnf)
Sample Ceph Vendor Solutions

h3. Introduction

Ceph is a scalable, open source, software-defined storage offering that runs on commodity hardware. Ceph has been developed from the ground up to deliver object, block, and file system storage in a single software platform that is self-managing, self-healing, and has no single point of failure. Because of its highly scalable, software-defined storage architecture, Ceph can be a powerful storage solution to consider.
This document covers Ceph tuning guidelines specifically for all-flash deployments, based on extensive testing by Intel with a variety of system, operating system, and Ceph optimizations to achieve the highest possible performance for servers with Intel® Xeon® processors and Intel® Solid State Drive Data Center (Intel® SSD DC) Series drives. Details of OEM system SKUs and Ceph reference architectures for targeted use cases can be found on the ceph.com website.
h3. Ceph Storage Hardware Guidelines

The standard configuration is ideally suited for throughput-oriented workloads (e.g., analytics, DVR). The Intel® SSD DC P3700 Series is recommended to achieve the best possible performance while balancing cost.

|CPU|Intel® Xeon® CPU E5-2650v4 or higher|
|Memory|Minimum of 64 GB|
|NIC|10GbE|
|Disks|1x 1.6TB P3700 + 12x 4TB HDDs (1:12 ratio); P3700 as journal and caching|
|Caching software|Intel Cache Acceleration Software for read caching; option: Intel® Rapid Storage Technology enterprise/MD4.3|

The TCO-optimized configuration provides the best possible performance for performance-centric workloads (e.g., database) while improving TCO with a mix of SATA SSDs and NVMe SSDs.

|CPU|Intel® Xeon® CPU E5-2690v4 or higher|
|Memory|128 GB or higher|
|NIC|Dual 10GbE|
|Disks|1x 800GB P3700 + 4x S3510 1.6TB|

The IOPS-optimized configuration provides the best performance for workloads that demand low latency, using an all-NVMe SSD configuration.

|CPU|Intel® Xeon® CPU E5-2699v4|
|Memory|128 GB or higher|
|NIC|1x 40GbE, 4x 10GbE|
|Disks|4x P3700 2TB|

h3. Intel Tuning and Optimization Recommendations for Ceph

h4. Server Tuning

h4. Ceph Client Configuration

In a balanced system configuration, both the client and the storage node configuration need to be optimized to get the best possible cluster performance. Care needs to be taken to ensure that the Ceph client node has enough CPU bandwidth to achieve optimum performance. The graph below (Figure 1) shows the end-to-end performance for different client CPU configurations for a block workload.

Figure 1: Client CPU cores and Ceph cluster impact

h4. Ceph Storage Node NUMA Tuning

To keep latency low, it is important to minimize inter-socket communication between NUMA nodes so that client IO is serviced as quickly as possible. Based on an extensive set of experiments conducted at Intel, it is recommended to pin Ceph OSD processes to the same CPU socket that has the NVMe SSDs, HBAs, and NIC devices attached.

Figure 2: NUMA node configuration and OSD assignment

Ceph startup scripts need to be changed to use setaffinity="numactl --membind=0 --cpunodebind=0".
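A minimal sketch of such pinning (NUMA node, OSD id, and binary path are examples, not the exact startup scripts used in the tests):

<pre>
# Pin an OSD daemon and its memory to NUMA node 0, where its NVMe, HBA and NIC are attached
numactl --membind=0 --cpunodebind=0 /usr/bin/ceph-osd -i 0 --cluster ceph

# Verify the placement of the running OSD
numastat -p $(pgrep -f 'ceph-osd -i 0')
</pre>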
The performance data below shows the best cluster throughput and lowest latency when Ceph OSDs are partitioned by CPU socket, so that each OSD manages media attached to its local socket and network IO does not cross the QPI link.

Figure 3: NUMA node performance compared to default system configuration

h4. Memory Tuning

Default Ceph packages use TCMalloc. For flash-optimized configurations, we found that jemalloc provides the best possible performance without degradation over time. Ceph supports jemalloc as of the Hammer release, but it must be built with the jemalloc option enabled.

The graph in Figure 4 shows how thread cache size impacts throughput. With a tuned thread cache size, TCMalloc and jemalloc performance is comparable. However, as shown in Figures 5 and 6, TCMalloc performance degrades over time, unlike jemalloc.

Figure 4: Thread cache size impact on performance

Figure 5: TCMalloc performance in a running cluster over time

Figure 6: jemalloc performance in a running cluster over time

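As a quick check and tuning aid, the allocator linked into the OSD binary can be verified, and the TCMalloc thread cache can be enlarged through an environment variable; a minimal sketch (binary path, file location, and the 128 MB value are examples):

<pre>
# Which allocator is the OSD actually using?
ldd /usr/bin/ceph-osd | grep -E 'tcmalloc|jemalloc'

# If staying on TCMalloc, enlarge its thread cache for the Ceph daemons,
# e.g. in /etc/sysconfig/ceph (value shown: 128 MB)
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
</pre>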
h4. NVMe SSD partitioning

A single OSD cannot take full advantage of NVMe SSD bandwidth. In our testing, four partitions per SSD (one OSD per partition) is the optimum number and gives the best possible performance.
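A minimal sketch of creating four equal partitions on one NVMe device (the device name is an example); each partition is then given to its own OSD so that four OSDs share the drive:

<pre>
parted -s /dev/nvme0n1 mklabel gpt
parted -s /dev/nvme0n1 mkpart osd-1 0% 25%
parted -s /dev/nvme0n1 mkpart osd-2 25% 50%
parted -s /dev/nvme0n1 mkpart osd-3 50% 75%
parted -s /dev/nvme0n1 mkpart osd-4 75% 100%
</pre>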
Figure 7: Ceph OSD latency with different SSD partitions

Figure 8: CPU utilization with different number of SSD partitions

h4. OS Tuning (must be done on all Ceph nodes)

h4. Kernel Tuning

1. Modify system control in /etc/sysctl.conf:

# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# Allow reuse/recycling of TIME_WAIT sockets
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 0

# Double the amount of allowed conntrack entries
net.netfilter.nf_conntrack_max = 2621440
net.netfilter.nf_conntrack_tcp_timeout_established = 1800

# Disable netfilter on bridges
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 65536

# Controls the maximum size of a single message, in bytes
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

2. IP jumbo frames

If your switch supports jumbo frames, then a larger MTU size is helpful. Our tests showed that a 9000-byte MTU improves sequential read/write performance, for example:
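A minimal example (the interface name is illustrative; the switch ports must be configured for jumbo frames as well, and the setting should be made persistent in the distribution's network configuration):

<pre>
ip link set dev eth0 mtu 9000
</pre>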
3. Set the Linux disk scheduler to cfq
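For example, per data device (the device name is illustrative):

<pre>
echo cfq > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler    # the active scheduler is shown in brackets
</pre>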
h4. Filesystem considerations

Ceph is designed to be mostly filesystem agnostic; the only requirement is that the filesystem supports extended attributes (xattrs). Ceph OSDs depend on the extended attributes (XATTRs) of the underlying file system for: a) internal object state, b) snapshot metadata, c) RGW access control lists, etc. Currently XFS is the recommended file system. We recommend using a large inode size (the default inode size is 256 bytes) when creating the file system:

mkfs.xfs -i size=2048 /dev/sda1

Setting the inode size is important, as XFS stores xattr data in the inode. If the metadata is too large to fit in the inode, a new extent is created, which can cause quite a performance problem. Increasing the inode size to 2048 bytes provides enough room to write the default metadata, plus a little headroom.

The following example mount options are recommended when using XFS:

mount -t xfs -o noatime,nodiratime,nobarrier,logbufs=8 /dev/sda1 /var/lib/ceph/osd/ceph-0

The following are specific recommendations for Intel SSDs with Ceph:

mkfs.xfs -f -K -i size=2048 -s size=4096 /dev/md0
/bin/mount -o noatime,nodiratime,nobarrier /dev/md0 /data/mysql

h4. Disk read ahead

Read-ahead is the file prefetching mechanism of the Linux kernel: it loads a file's contents into the page cache ahead of time, so that subsequent accesses are served from physical memory rather than from disk, which is much faster.

echo 2048 > /sys/block/${disk}/queue/read_ahead_kb  (default 128)

Per-disk sequential read performance with different read_ahead_kb values*:

|_. |_.read_ahead_kb = 128|_.read_ahead_kb = 512|_.Change|
|Sequential Read (MB/s)|1232|3251|+163%|

* 6-node Ceph cluster, each node with 20 OSDs (750 GB 7200 RPM 2.5" HDDs)

h4. OSD: RADOS

Tuning has a significant performance impact on a Ceph storage system; there are hundreds of tuning knobs available. We will introduce some of the most important tuning settings.

1. Large PG/PGP number (since Cuttlefish)

We find that using a large PG number per OSD (>200) improves performance, and it also eases data distribution imbalance. The default pool PG count is 8; for example:

ceph osd pool create testpool 8192 8192

2. omap data on separate disks (since Giant)

Mounting the omap directory on a separate SSD improves random write performance. In our testing we saw a ~20% performance improvement, as sketched below.
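A minimal sketch of relocating a FileStore OSD's omap directory onto an SSD (paths, mount point, and OSD id are examples; stop the OSD first):

<pre>
systemctl stop ceph-osd@0           # or the distribution's init script on older releases
mv /var/lib/ceph/osd/ceph-0/current/omap /mnt/ssd/osd-0-omap
ln -s /mnt/ssd/osd-0-omap /var/lib/ceph/osd/ceph-0/current/omap
chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/current/omap   # only where the OSD runs as the ceph user
systemctl start ceph-osd@0
</pre>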
3. objecter_inflight_ops / objecter_inflight_op_bytes (since Cuttlefish)

These throttles tell the objecter (the component responsible for sending requests to OSDs) to limit outgoing ops according to its budget. We tweak these parameters to 10x their defaults (defaults: 1024 and 1024*1024*100).

objecter_inflight_ops = 10240
objecter_inflight_op_bytes = 1048576000

4. ms_dispatch_throttle_bytes (since Cuttlefish)

ms_dispatch_throttle_bytes throttles the size of messages being dispatched by the simple messenger. We tweak this parameter to 10x its default.

ms_dispatch_throttle_bytes = 1048576000

5. journal_queue_max_bytes / journal_queue_max_ops (since Cuttlefish)

These throttles limit in-flight ops for the journal. If the journal does not get enough budget for the current op, it blocks the OSD op thread. We tweak these parameters to 10x their defaults.

journal_queue_max_ops = 3000
journal_queue_max_bytes = 1048576000

6. filestore_queue_max_ops / filestore_queue_max_bytes (since Cuttlefish)

These throttles limit in-flight ops for the filestore. They are checked before ops are sent to the journal, so if the filestore does not get enough budget for the current op, the OSD op thread is blocked. We tweak these parameters to 10x their defaults.

filestore_queue_max_ops = 5000
filestore_queue_max_bytes = 1048576000

7. filestore_op_threads controls the number of filesystem operation threads that execute in parallel

If the storage backend is fast enough and has enough queues to support parallel operations, it is recommended to increase this parameter, given there is enough CPU headroom.

filestore_op_threads = 6

8. journal_max_write_entries / journal_max_write_bytes (since Cuttlefish)

These throttles limit the ops and bytes for every journal write. Tweaking them may be helpful for small writes; we tweak them to 10x their defaults.

journal_max_write_entries = 5000
journal_max_write_bytes = 1048576000

9. osd_op_num_threads_per_shard / osd_op_num_shards (since Firefly)

osd_op_num_shards sets the number of queues used to shard incoming requests, and osd_op_num_threads_per_shard is the number of threads for each queue; the right values depend on the cluster. After several performance tests with different settings, we concluded that the default parameters provide the best performance.

10. filestore_max_sync_interval (since Cuttlefish)

filestore_max_sync_interval controls the interval at which the sync thread flushes data from memory to disk. By default the filestore writes data to memory, and the sync thread is responsible for flushing it to disk; journal entries can then be trimmed. Note that a large filestore_max_sync_interval can cause performance spikes. We set this parameter to 10 seconds.

filestore_max_sync_interval = 10

11. ms_crc_data / ms_crc_header (since Cuttlefish)

Disabling CRC computation for the simple messenger can reduce CPU utilization; see the example below.
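A hedged example of the corresponding ceph.conf settings (typically placed in the [global] section):

ms_crc_data = false
ms_crc_header = false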
12. filestore_fd_cache_shards / filestore_fd_cache_size (since Firefly)

The filestore FD cache maps object names to file descriptors. filestore_fd_cache_shards sets the number of LRU caches and filestore_fd_cache_size is the size of each cache; tweaking these two parameters may reduce fd lookup time.

13. Set debug level to 0 (since Cuttlefish)

For an all-SSD Ceph cluster, setting the debug level of every subsystem to 0 will improve performance.

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

h4. RBD Tuning

To help achieve low latency on the RBD layer, we suggest the following, in addition to the CERN tuning referenced on ceph.com.

1) echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor > /dev/null

2) Start each ceph-osd in a dedicated cgroup with dedicated CPU cores (which should be free from any other load, even kernel load such as network interrupts).

3) Increase filestore_omap_header_cache_size and filestore_fd_cache_size for better caching (16 MB for each 500 GB of storage).

For the disk entry in libvirt, put the addresses of all three Ceph monitors, as in the sketch below.
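A minimal sketch of such a libvirt disk entry (pool/image name, secret UUID, and monitor addresses are placeholders):

<pre>
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <auth username='libvirt'>
    <secret type='ceph' uuid='REPLACE-WITH-SECRET-UUID'/>
  </auth>
  <source protocol='rbd' name='rbd/vm-disk-1'>
    <host name='10.10.10.101' port='6789'/>
    <host name='10.10.10.102' port='6789'/>
    <host name='10.10.10.103' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
</pre>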
h4. RGW: Rados Gateway Tuning

1. Disable usage/access log (since Cuttlefish)

rgw enable ops log = false
rgw enable usage log = false
log file = /dev/null

We find that disabling the usage/access log improves performance.

2. Use a large cache size (since Cuttlefish)

rgw cache enabled = true
rgw cache lru size = 100000

Caching hot objects improves GET performance.

3. Use larger PG split/merge values (since Firefly)

filestore_merge_threshold = 500
filestore_split_multiple = 100

We find that PG split/merge introduces significant overhead. Using large values postpones the split/merge behavior, which helps the case where lots of small files are stored in the cluster.

4. Use a load balancer with multiple RGW instances (since Cuttlefish)

We have found that RGW has some scalability issues at present; with a single RGW instance the performance is poor. Running multiple RGW instances behind a load balancer (e.g., HAProxy) greatly improves throughput.

5. Increase the number of RADOS handles (since Hammer)

Since Hammer it is possible to use multiple RADOS handles per RGW instance. Increasing this value should improve performance; see the example below.
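A hedged example of the corresponding setting in the RGW section of ceph.conf (the value is illustrative):

rgw num rados handles = 8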
6. Use the Civetweb frontend (since Giant)

Before Giant, Apache + libfastcgi was the recommended configuration. However, libfastcgi still uses the very old 'select' mode, which was not able to handle a large amount of concurrent IO in our testing. Using the Civetweb frontend helps improve stability.

rgw frontends = civetweb port=80

7. Move the bucket index to SSD (since Giant)

Bucket index updates may become a bottleneck if there are millions of objects in a single bucket. We find that moving the bucket index to SSD storage improves performance.

8. Bucket Index Sharding (since Hammer)

We find that bucket index sharding helps when there is a large number of objects inside one bucket. However, index listing speed may be impacted. An example of the related setting is shown below.
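The sample ceph.conf in the appendix uses the following override for new buckets (the value is illustrative):

rgw override bucket index max shards = 8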
h4. Erasure Coding Tuning

1. Use a larger stripe width

The default erasure code stripe width (4K) is not optimal. We find that using a bigger value (64K) reduces CPU utilization significantly (10%+).

osd_pool_erasure_code_stripe_width = 65536

2. Use a mid-sized K

For the erasure code algorithms, we find that a mid-sized K value gives a good balance between throughput and CPU utilization. We recommend using 10+4 or 8+2 mode; a sketch follows.
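A minimal sketch of creating an erasure-coded pool with such a profile (profile name, pool name, and PG count are examples):

<pre>
ceph osd erasure-code-profile set ec-10-4 k=10 m=4
ceph osd pool create ecpool 4096 4096 erasure ec-10-4
</pre>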
h3. Appendix

h4. Sample Ceph.conf

[global]
fsid = 35b08d01-b688-4b9a-947b-bc2e25719370
mon_initial_members = gw2
mon_host = 10.10.10.105
filestore_xattr_use_omap = true
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
[mon]
mon_pg_warn_max_per_osd=5000
mon_max_pool_pg_num=106496
[client]
rbd cache = false
[osd]
osd mkfs type = xfs
osd mount options xfs = rw,noatime,nodiratime,inode64,logbsize=256k,delaylog
osd mkfs options xfs = -f -i size=2048
filestore_queue_max_ops=5000
filestore_queue_max_bytes = 1048576000
filestore_max_sync_interval = 10
filestore_merge_threshold = 500
filestore_split_multiple = 100
osd_op_shard_threads = 8
journal_max_write_entries = 5000
journal_max_write_bytes = 1048576000
journal_queue_max_ops = 3000
journal_queue_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000
public_network = 10.10.10.100/24
cluster_network = 10.10.10.100/24

[client.radosgw.gw2-1]
host = gw2
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw cache enabled = true
rgw cache lru size = 100000
rgw socket path = /var/run/ceph/ceph.client.radosgw.gw2-1.fastcgi.sock
rgw thread pool size = 256
rgw enable ops log = false
rgw enable usage log = false
log file = /dev/null
rgw frontends = civetweb port=80
rgw override bucket index max shards = 8

h4. Sample sysctl.conf

fs.file-max = 6553600
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_max_syn_backlog = 819200
net.ipv4.tcp_keepalive_time = 20
kernel.msgmni = 2878
kernel.sem = 256 32000 100 142
kernel.shmmni = 4096
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576
net.core.somaxconn = 40000
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_max_tw_buckets = 10000

h4. All-NVMe Ceph Cluster Tuning for MySQL workload

h5. Ceph.conf

[global]
        enable experimental unrecoverable data corrupting features = bluestore rocksdb
        osd objectstore = bluestore
        ms_type = async
        rbd readahead disable after bytes = 0
        rbd readahead max bytes = 4194304
        bluestore default buffered read = true
        auth client required = none
        auth cluster required = none
        auth service required = none
        filestore xattr use omap = true
        cluster network = 192.168.142.0/24, 192.168.143.0/24
        private network = 192.168.144.0/24, 192.168.145.0/24
        log file = /var/log/ceph/$name.log
        log to syslog = false
        mon compact on trim = false
        osd pg bits = 8
        osd pgp bits = 8
        mon pg warn max object skew = 100000
        mon pg warn min per osd = 0
        mon pg warn max per osd = 32768
        debug_lockdep = 0/0
        debug_context = 0/0
        debug_crush = 0/0
        debug_buffer = 0/0
        debug_timer = 0/0
        debug_filer = 0/0
        debug_objecter = 0/0
        debug_rados = 0/0
        debug_rbd = 0/0
        debug_ms = 0/0
        debug_monc = 0/0
        debug_tp = 0/0
        debug_auth = 0/0
        debug_finisher = 0/0
        debug_heartbeatmap = 0/0
        debug_perfcounter = 0/0
        debug_asok = 0/0
        debug_throttle = 0/0
        debug_mon = 0/0
        debug_paxos = 0/0
        debug_rgw = 0/0
        perf = true
        mutex_perf_counter = true
        throttler_perf_counter = false
        rbd cache = false
[mon]
        mon data = /home/bmpa/tmp_cbt/ceph/mon.$id
        mon_max_pool_pg_num=166496
        mon_osd_max_split_count = 10000
        mon_pg_warn_max_per_osd = 10000
[mon.a]
        host = ft02
        mon addr = 192.168.142.202:6789
[osd]
        osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
        osd_mkfs_options_xfs = -f -i size=2048
        osd_op_threads = 32
        filestore_queue_max_ops=5000
        filestore_queue_committing_max_ops=5000
        journal_max_write_entries=1000
        journal_queue_max_ops=3000
        objecter_inflight_ops=102400
        filestore_wbthrottle_enable=false
        filestore_queue_max_bytes=1048576000
        filestore_queue_committing_max_bytes=1048576000
        journal_max_write_bytes=1048576000
        journal_queue_max_bytes=1048576000
        ms_dispatch_throttle_bytes=1048576000
        objecter_inflight_op_bytes=1048576000
        osd_mkfs_type = xfs
        filestore_max_sync_interval=10
        osd_client_message_size_cap = 0
        osd_client_message_cap = 0
        osd_enable_op_tracker = false
        filestore_fd_cache_size = 64
        filestore_fd_cache_shards = 32
        filestore_op_threads = 6

h5. CBT YAML

cluster:
  user: "bmpa"
  head: "ft01"
  clients: ["ft01", "ft02", "ft03", "ft04", "ft05", "ft06"]
  osds: ["hswNode01", "hswNode02", "hswNode03", "hswNode04", "hswNode05"]
  mons:
   ft02:
     a: "192.168.142.202:6789"
  osds_per_node: 16
  fs: xfs
  mkfs_opts: '-f -i size=2048 -n size=64k'
  mount_opts: '-o inode64,noatime,logbsize=256k'
  conf_file: '/home/bmpa/cbt/ceph.conf'
  use_existing: False
  newstore_block: True
  rebuild_every_test: False
  clusterid: "ceph"
  iterations: 1
  tmp_dir: "/home/bmpa/tmp_cbt"
  pool_profiles:
    2rep:
      pg_size: 8192
      pgp_size: 8192
      replication: 2
benchmarks:
  librbdfio:
    time: 300
    ramp: 300
    vol_size: 10
    mode: ['randrw']
    rwmixread: [0,70,100]
    op_size: [4096]
    procs_per_volume: [1]
    volumes_per_client: [10]
    use_existing_volumes: False
    iodepth: [4,8,16,32,64,128]
    osd_ra: [4096]
    norandommap: True
    cmd_path: '/usr/local/bin/fio'
    pool_profile: '2rep'
    log_avg_msec: 250

h5. MySQL configuration file (my.cnf)

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0
[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
datadir         = /data
basedir         = /usr
tmpdir          = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address            = 0.0.0.0
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8
query_cache_limit       = 1M
query_cache_size        = 16M
log_error = /var/log/mysql/error.log
expire_logs_days        = 10
max_binlog_size         = 100M
performance_schema=off
innodb_buffer_pool_size = 25G
innodb_flush_method = O_DIRECT
innodb_log_file_size=4G
thread_cache_size=16
innodb_file_per_table
innodb_checksums = 0
innodb_flush_log_at_trx_commit = 0
innodb_write_io_threads = 8
innodb_page_cleaners= 16
innodb_read_io_threads = 8
max_connections = 50000
[mysqldump]
quick
quote-names
max_allowed_packet      = 16M
[mysql]
!includedir /etc/mysql/conf.d/

h3. Sample Ceph Vendor Solutions

The following are pointers to Ceph solutions, but this list is not comprehensive:

https://www.dell.com/learn/us/en/04/shared-content~data-sheets~en/documents~dell-red-hat-cloud-solutions.pdf
http://www.fujitsu.com/global/products/computing/storage/eternus-cd/
http://www8.hp.com/h20195/v2/GetPDF.aspx/4AA5-2799ENW.pdf
http://www8.hp.com/h20195/v2/GetPDF.aspx/4AA5-8638ENW.pdf
http://www.supermicro.com/solutions/storage_ceph.cfm
https://www.thomas-krenn.com/en/products/storage-systems/suse-enterprise-storage.html
http://www.qct.io/Solution/Software-Defined-Infrastructure/Storage-Virtualization/QCT-and-Red-Hat-Ceph-Storage-p365c225c226c230

Notices:

Copyright © 2016 Intel Corporation. All rights reserved.

Intel, the Intel logo, Intel Atom, Intel Core, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Intel® Hyper-Threading Technology available on select Intel® Core™ processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.