Support #45336
openrdb mirror replay is about 6 times slower than bootstrapping
Description
I have two Ceph clusters, A and B.
Cluster A: 2 hosts, each with a single 10T disk and a 256GB SSD for the OS, using bluestore and bcache. The two hosts are connected by a 10GbE network used mainly for Ceph cluster communication.
Cluster B: 1 host with a single 10T disk and a 256GB SSD for the OS, using bluestore and bcache.
Cluster A: 2 OSD, 2 mon, 2 mgr
Cluster B: 1 OSD, 1 mon, 1 mgr
Network between Cluster A and B: 1Gbps (same subnet)
Problem:
I have an image of 16GB, primary on A and mirrored on B.
When I force a resync of the image from A to B (bootstrapping), the resync completes in 12 minutes (~182Mbps).
After the resync finishes and "entries_behind_master" is 0, I copy a 4GB file into the image, which creates a 4GB delta between the primary and the mirrored image.
While the 4GB file is being copied into the image on Cluster A, the recorded network throughput is around 480Mbps.
However, the replay is very slow and behaves strangely. The primary host sends ~300Mbps for ONE second, then the network is idle for about TEN seconds, and the same pattern repeats until the replay finishes. In total the replay takes around 18 minutes (~30Mbps).
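As a sanity check on the figures above, here is a minimal Python sketch of the arithmetic. It assumes 1GB = 1024MB and 8 bits per byte; the function names are only illustrative and are not part of any Ceph tool. It shows that a 1-second burst at ~300Mbps followed by ~10 seconds of idle time averages out close to the observed ~30Mbps, so the idle gaps alone seem to account for the slowdown.

```python
def avg_mbps(size_gb: float, seconds: float) -> float:
    """Average throughput in Mbps for size_gb moved in `seconds`.

    Assumes 1GB = 1024MB and 8 bits per byte.
    """
    return size_gb * 1024 * 8 / seconds


def burst_avg_mbps(burst_mbps: float, burst_s: float, idle_s: float) -> float:
    """Average rate of a repeating burst/idle cycle."""
    return burst_mbps * burst_s / (burst_s + idle_s)


# Figures reported in this ticket.
bootstrap = avg_mbps(16, 12 * 60)      # 16GB resync in 12 minutes
replay = avg_mbps(4, 18 * 60)          # 4GB delta replayed in 18 minutes
burst_model = burst_avg_mbps(300, 1, 10)  # 1s at 300Mbps, then 10s idle

print(f"bootstrap:   {bootstrap:.0f} Mbps")
print(f"replay:      {replay:.0f} Mbps")
print(f"burst model: {burst_model:.0f} Mbps")
print(f"bootstrap / replay: {bootstrap / replay:.1f}x")
```

The burst model (about 27Mbps) and the measured replay rate (about 30Mbps) agree within the precision of the "about ten seconds" observation.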
The ceph.conf on Cluster A is as follows:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.0.1/24
fsid = e096e502-b285-48cf-bcee-1d8c2c6994d3
mon_allow_pool_delete = true
mon_host = 172.16.0.1 172.16.0.2 172.16.0.3
osd_pool_default_min_size = 2
osd_pool_default_size = 2
public_network = 172.16.0.1/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_features = 125
The ceph.conf on Cluster B is as follows:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.0.10/24
fsid = e5ab22c7-6876-4b68-9f43-d67edd4175c2
mon_allow_pool_delete = true
mon_host = 172.16.0.10
osd_pool_default_min_size = 2
osd_pool_default_size = 1
public_network = 172.16.0.10/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_features = 125
Is this the expected replay performance? Can I get a higher replay rate by tweaking some configuration?