Support #45336

rbd mirror replay runs at about 1/6 the speed of bootstrapping

Added by tawh Bernstein over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

I have two Ceph clusters, A and B.
Cluster A: 2 hosts, each with a single 10 TB disk and a 256 GB SSD for the OS, using BlueStore and bcache. The hosts are connected by a 10GbE network used mainly for Ceph cluster communication.
Cluster B: 1 host with a single 10 TB disk and a 256 GB SSD for the OS, using BlueStore and bcache.

Cluster A: 2 OSD, 2 mon, 2 mgr
Cluster B: 1 OSD, 1 mon, 1 mgr

Network between Cluster A and B: 1Gbps (same subnet)

Problem:
I have a 16GB image, primary on A and mirrored to B.
When I force a resync of the image from A to B (bootstrapping), the resync completes in 12 minutes (~182Mbps).
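
For reference, the ~182Mbps figure can be reproduced from the image size and the resync time. This is only a rough sketch: it assumes "16GB" means 16 GiB and that 1 Mb is 2^20 bits, and the helper name `avg_mbps` is made up for illustration.

```python
def avg_mbps(size_gib: float, minutes: float) -> float:
    """Average throughput in Mbit/s, assuming binary units (1 GiB = 1024 MiB)."""
    megabits = size_gib * 1024 * 8      # GiB -> MiB -> Mbit
    return megabits / (minutes * 60)    # divide by elapsed seconds


# Bootstrapping: 16 GB image resynced in 12 minutes.
bootstrap_mbps = avg_mbps(16, 12)
print(f"bootstrap: {bootstrap_mbps:.0f} Mbps")
```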

After the resync finishes and the mirror status shows "entries_behind_master=0", I copy a 4GB file into the image, which creates a 4GB delta between the primary and the mirrored image.
While the 4GB file is being copied into the image on Cluster A, the recorded network throughput is around 480Mbps.

However, the replay is very slow and behaves strangely. The primary host sends ~300Mbps for ONE second, then the network sits idle for about TEN seconds, and this pattern repeats until the replay finishes. In total the replay takes around 18 minutes (~30Mbps).
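
As a sanity check, the burst/idle pattern described above roughly accounts for the measured replay rate. The sketch below is only arithmetic on the numbers in this report; it assumes "4GB" means 4 GiB, and the function name `duty_cycle_mbps` is hypothetical.

```python
def duty_cycle_mbps(burst_mbps: float, burst_s: float, idle_s: float) -> float:
    """Average rate of a link that sends at burst_mbps for burst_s, then idles for idle_s."""
    return burst_mbps * burst_s / (burst_s + idle_s)


# Observed pattern: ~300 Mbps for 1 second, then ~10 seconds idle.
pattern_mbps = duty_cycle_mbps(300, 1, 10)

# Measured replay: 4 GB delta replayed in 18 minutes (binary units assumed).
replay_mbps = 4 * 1024 * 8 / (18 * 60)

print(f"pattern average: {pattern_mbps:.1f} Mbps")
print(f"measured replay: {replay_mbps:.1f} Mbps")
```

The two values are close, which suggests the idle gaps, rather than the 1Gbps link between the clusters, are what limit the replay rate.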

The ceph.conf on Cluster A is as follows:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.0.1/24
fsid = e096e502-b285-48cf-bcee-1d8c2c6994d3
mon_allow_pool_delete = true
mon_host = 172.16.0.1 172.16.0.2 172.16.0.3
osd_pool_default_min_size = 2
osd_pool_default_size = 2
public_network = 172.16.0.1/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_features = 125

The ceph.conf on Cluster B is as follows:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.0.10/24
fsid = e5ab22c7-6876-4b68-9f43-d67edd4175c2
mon_allow_pool_delete = true
mon_host = 172.16.0.10
osd_pool_default_min_size = 2
osd_pool_default_size = 1
public_network = 172.16.0.10/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_features = 125

Is this the expected replay performance? Can I get a higher replay rate by tuning some configuration?

History

#1 Updated by tawh Bernstein over 1 year ago

Could anyone please help?
