Bug #17213
Closed: As the cluster is filling up, write performance decreases
Description
- 2 x 16-core sockets and more than 256 GB RAM per node.
- Ceph disks are 12 x 4 TB 7,200 rpm SATA disks per node.
- Reproduced with the RHCS 2.0, Jewel, and Infernalis releases.
- Every node is hosting 1 monitor and 12 OSDs.
- Journals are co-located with data on each OSD disk.
- 2 bonded 10 Gbps NICs for both Ceph private and public networks.
- XFS for the filestore back-end.
- Running RHEL 7.2.
- Pool under test is using EC ISA plugin with k=4 m=1 profile.
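For reference, the erasure-code profile described above can be defined along these lines (a sketch assuming Jewel-era CLI syntax; the profile name `isa-k4m1` is a placeholder):

```shell
# Define an erasure-code profile using the ISA plugin with k=4, m=1
# (profile name is illustrative).
ceph osd erasure-code-profile set isa-k4m1 plugin=isa k=4 m=1

# Inspect the resulting profile.
ceph osd erasure-code-profile get isa-k4m1
```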
We are currently running 30-minute rados bench tests to measure the write speed of our Ceph cluster on RHCS 2.0, Jewel, and Infernalis.
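A 30-minute run of the kind described can be reproduced with something like the following (pool name and thread count are illustrative, not taken from our exact setup):

```shell
# 1800 s = 30 min of 4 MB object writes; -t sets the number of
# concurrent operations; --no-cleanup keeps the objects so the pool
# actually fills up across successive runs.
rados bench -p ecpool 1800 write -t 16 --no-cleanup
```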
When we do so, we observe a write performance degradation as the cluster fills up:
- While the cluster is below 7% capacity, the throughput is 13 Gbps/node, but at 28% capacity it is below 7 Gbps.
- The write performance decreases quickly until disk utilization reaches 20%.
- It then decreases more slowly until disk utilization reaches 40%,
- And we eventually settle at a stable write speed of 5.5 Gbps once disk utilization is above 40%.
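As a rough sanity check on these figures, the initial per-node throughput can be converted into a per-disk write load (integer arithmetic only; this ignores EC and network overhead, and the final doubling assumes co-located journals write every byte twice):

```shell
gbps=13                       # observed per-node throughput when nearly empty
disks=12                      # OSD disks per node
mbs=$(( gbps * 1000 / 8 ))    # ~1625 MB/s of payload per node
per_disk=$(( mbs / disks ))   # ~135 MB/s of payload per HDD
raw=$(( per_disk * 2 ))       # ~270 MB/s raw, with journal double-writes
echo "$mbs $per_disk $raw"
```

This is only a unit conversion of the reported numbers, but it shows each 7,200 rpm disk is already heavily loaded with sequential writes, so any additional read seeks cut directly into write bandwidth.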
"iostat" shows that when the pool is empty, IO workload is driven by writes. However, as soon as the pool occupation increases (1 hour later, at constant write rate), more and more reads start to happen. Reads start, when the pool is empty, at a rate of 1 rd/s per HDD (vs. 150-200 write/s) , and then evolve until reaching 50-100 rd/s, and counting half of the total iops.
While there are several points to improve in our setup (SSDs instead of HDDs, journals on dedicated SSDs), we are mostly concerned by the fact that the actual performance varies with cluster occupancy.
To avoid issues with directory merges, we already created the pool with an expected number of objects of 1000000000 and a negative filestore_merge_threshold, so that the directory structure is pre-allocated, but this did not help either.
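The pre-allocation described above corresponds to settings along these lines (values copied from this description; pool and profile names, pg counts, and the exact argument order are assumptions based on Jewel-era syntax):

```shell
# Assumed ceph.conf fragment on the OSD nodes; a negative value
# disables directory merging:
#   [osd]
#   filestore merge threshold = -10

# Create the EC pool with an expected object count of 1000000000 so
# the filestore directory tree is pre-split at creation time
# ("" skips the optional crush ruleset argument):
ceph osd pool create ecpool 1024 1024 erasure isa-k4m1 "" 1000000000
```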