Bug #36652
[rbd-mirror] replay performance issue
Status: open
Description
Hello,
[New to ceph community, please pardon my faux pas.]
I have 2 Ceph clusters separated by 40 ms RTT.
2 rbd-mirror instances are running, each one close to a cluster.
rados bench run from the rbd-mirror instances shows 300 MB/s in the worst-case scenario (remote writes).
Remote (from rbd-mirror perspective) cluster is running 12.2.7.
Local cluster is running 12.2.4.
rbd-mirror is running 12.2.7.
When I map and mount a 10GB rbd(-nbd) image and run this command to fill it:
[root@systasks001 mnt]# dd if=/dev/zero of=TEST bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 32.512 s, 161 MB/s
This translates to more than 300k journal entries to replay:
replaying, master_position=[object_number=3077, tag_tid=11, entry_tid=648217], mirror_position=[object_number=2778, tag_tid=11, entry_tid=305374], entries_behind_master=342843
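For anyone wanting to track the backlog over time, here is a minimal sketch that parses a status line of the shape shown above. It assumes the description string always has this exact layout; the helper name `parse_position` is hypothetical and not part of any Ceph tool. For this sample, where both positions carry the same tag_tid, the entry_tid gap matches the reported entries_behind_master.

```python
import re

# Status line copied verbatim from the report above.
STATUS = (
    "replaying, master_position=[object_number=3077, tag_tid=11, entry_tid=648217], "
    "mirror_position=[object_number=2778, tag_tid=11, entry_tid=305374], "
    "entries_behind_master=342843"
)


def parse_position(status: str, name: str) -> dict:
    """Return the key=value pairs inside `name=[...]` as integers (hypothetical helper)."""
    match = re.search(rf"{name}=\[([^\]]*)\]", status)
    if match is None:
        raise ValueError(f"{name} not found in status")
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", match.group(1))}


master = parse_position(STATUS, "master_position")
mirror = parse_position(STATUS, "mirror_position")
behind = int(re.search(r"entries_behind_master=(\d+)", STATUS).group(1))

# In this sample both positions share tag_tid=11, so the backlog can be
# cross-checked against the entry_tid difference.
gap = master["entry_tid"] - mirror["entry_tid"]
print(f"entry_tid gap: {gap}, reported entries_behind_master: {behind}")
```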
rbd-mirror reads from the remote cluster and writes to the local cluster (but the problem is the same when it reads locally and replays remotely).
It replays roughly 400 entries per second, which is about 25 times slower than the initial dd write.
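The "about 25 times" figure can be roughly reproduced from the numbers above. This is a back-of-the-envelope sketch, assuming the whole backlog of 342843 entries was generated during the 32.512 s dd run and that replay proceeds at a flat 400 entries per second; both are simplifications, and the variable names are mine.

```python
# Figures taken from the report; attributing the whole backlog to the dd run
# is an assumption, so treat the results as rough estimates only.
dd_seconds = 32.512          # wall-clock time reported by dd
entries_behind = 342_843     # entries_behind_master from the status line
replay_rate = 400.0          # entries per second ("400+" in the report)

# Rate at which the dd write produced entries, under the assumption above.
generation_rate = entries_behind / dd_seconds

# How much slower replay is than generation.
slowdown = generation_rate / replay_rate

# Time needed to drain the backlog if no new writes arrive.
catch_up_seconds = entries_behind / replay_rate

print(f"generation rate: {generation_rate:.0f} entries/s")
print(f"replay is about {slowdown:.0f}x slower")
print(f"catch-up time: about {catch_up_seconds / 60:.1f} minutes")
```

Under these assumptions a 32-second burst of writes takes on the order of 14 minutes to replay, so any sustained write rate above the replay rate makes the backlog grow without bound.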
Attached is a tcpdump pcap where you can observe a chunk of the data being replayed by rbd-mirror:
- reads from the remote OSDs. (During this, no data is sent to local OSDs)
- writes to the local OSDs. (During this, no data is received from remote OSDs)
I marked this issue as major because, past a certain threshold, this replay performance will break mirroring functionality.
Cheers,