Bug #538
Status: closed
Write performance does not scale over multiple computers
Description
I have Ceph 0.22.1 installed on a cluster of 208 lightly loaded 64-bit Linux nodes (RHEL 5.5, ext3). The configuration is essentially out of the box (no changes to replication or CRUSH options). I've tested write performance with a couple of different benchmarks, and they indicate that total write throughput does not scale with the number of computers doing the writes. With 1 computer I get about 63 MB/sec, but with 4 I get as little as 33 MB/sec per node, with 8 I get 10 MB/sec, with 16 I get 6 MB/sec, and so on up to 208 nodes, where I get under 1 MB/sec per node. CPU usage on the machine running cmon and cmds stays relatively low.
I tried it with high-level cfuse file system writes and with low-level RADOS writes, with similar results. For example, here are some results from the "rados bench" command on one machine and on multiple machines. (mpirun runs a program simultaneously on multiple computers; the count is given by the -np option.)
> rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 62.686

> mpirun -np 4 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 33.410
Bandwidth (MB/sec): 37.618
Bandwidth (MB/sec): 38.106
Bandwidth (MB/sec): 59.590

> mpirun -np 8 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 9.931
Bandwidth (MB/sec): 15.875
...
Bandwidth (MB/sec): 27.801
Bandwidth (MB/sec): 34.115

> mpirun -np 16 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 6.240
Bandwidth (MB/sec): 7.860
...
Bandwidth (MB/sec): 12.906
Bandwidth (MB/sec): 15.725

> mpirun -np 32 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 4.035
Bandwidth (MB/sec): 4.168
...
Bandwidth (MB/sec): 8.686
Bandwidth (MB/sec): 8.821

> mpirun -np 64 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 2.435
Bandwidth (MB/sec): 2.537
...
Bandwidth (MB/sec): 5.009
Bandwidth (MB/sec): 6.364

> mpirun -np 128 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 1.452
Bandwidth (MB/sec): 1.479
...
Bandwidth (MB/sec): 3.152
Bandwidth (MB/sec): 3.733

> mpirun -np 208 rados bench 10 write -p benchpool | grep Bandwidth | sort -n -k3
Bandwidth (MB/sec): 0.930
Bandwidth (MB/sec): 0.950
...
Bandwidth (MB/sec): 2.006
Bandwidth (MB/sec): 2.136
I would expect bandwidth to drop slightly going from 1 to 4 computers, but then to remain steady until it hit a network bottleneck. The disk drives on these machines sustain about 50 MB/sec of write throughput, and each node has a 1 Gbps Ethernet connection to a fast switch rated for up to 40 Gbps of bisection bandwidth.
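To make the scaling pattern explicit, here is a rough sketch that estimates aggregate cluster throughput from the per-node figures reported above. Only the minimum and maximum per-node numbers are quoted for each run, so the midpoint of the two is used as a crude stand-in for the per-node average (an assumption, not a measured value):

```python
# Estimate aggregate write throughput (MB/sec) from the per-node extremes
# reported in the rados bench runs above. The midpoint of (min, max) is an
# assumed proxy for the per-node average, since only those two values are
# quoted for each run.
reported = {
    1:   (62.686, 62.686),
    4:   (33.410, 59.590),
    8:   (9.931, 34.115),
    16:  (6.240, 15.725),
    32:  (4.035, 8.821),
    64:  (2.435, 6.364),
    128: (1.452, 3.733),
    208: (0.930, 2.136),
}

for nodes, (lo, hi) in reported.items():
    mid = (lo + hi) / 2
    # Aggregate = per-node estimate times number of writers.
    print(f"{nodes:4d} nodes: ~{nodes * mid:6.1f} MB/sec aggregate")
```

By this estimate the aggregate stays roughly flat, in the low hundreds of MB/sec (e.g. about 186 MB/sec at 4 nodes and about 319 MB/sec at 208 nodes), rather than growing with the number of writers, which points at a shared bottleneck somewhere in the cluster rather than at the individual clients.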
I also tried longer runs (same results) and running with -t 1, i.e. a single concurrent operation (worse results).