Bug #59732

open

reduce rbd-mirror slowdowns when latency is added between peer sites

Added by Christopher Hoffman about 1 year ago. Updated 12 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When latency is introduced between peer sites, rbd-mirror slows down in every operation that requires inter-site interaction.

Consider copy_image in rbd-mirror as an example. With a continuous light workload running against an image on the primary side (defined in a comment below), the time to sync a mirror snapshot grows when latency is added. The time from the start of copy_image to handle_copy_image roughly doubles, from about 5.9 s to about 12.3 s, when the added latency goes from 0ms to 100ms.

Timestamps for baseline latency and added latency:

0ms latency:
2023-05-11T17:37:01.527+0000 7f6657e7a6c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55f019c49f80 copy_image: remote_snap_id_start=238, remote_snap_id_end=240, local_snap_id_start=213, last_copied_object_number=0, snap_seqs={240=18446744073709551614}
2023-05-11T17:37:07.399+0000 7f6650e6c6c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55f019c49f80 handle_copy_image: r=0

100ms latency:
2023-05-11T17:39:02.389+0000 7f6657e7a6c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55f019c49f80 copy_image: remote_snap_id_start=242, remote_snap_id_end=244, local_snap_id_start=217, last_copied_object_number=0, snap_seqs={244=18446744073709551614}
2023-05-11T17:39:14.680+0000 7f6650e6c6c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55f019c49f80 handle_copy_image: r=0
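The elapsed times can be recomputed directly from the four log timestamps above. The script below is a hypothetical helper (not part of Ceph); the names `RUNS`, `elapsed_seconds` and `report` are illustrative. Its last figure rests on an assumption: that each serialized inter-site step pays the added 100 ms exactly once. How the delay is applied by the attached netns.sh setup (per direction or per round trip) is not stated here, so treat that count as a rough indicator of how many latency-bound steps run back to back, not as a measurement.

```python
from datetime import datetime

# Start (copy_image) and end (handle_copy_image) timestamps from the logs,
# keyed by added latency in milliseconds.
RUNS = {
    0: ("2023-05-11T17:37:01.527+0000", "2023-05-11T17:37:07.399+0000"),
    100: ("2023-05-11T17:39:02.389+0000", "2023-05-11T17:39:14.680+0000"),
}

# Format of the Ceph log timestamps; %f accepts the 3-digit milliseconds.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two log timestamps."""
    t0 = datetime.strptime(start, TS_FORMAT)
    t1 = datetime.strptime(end, TS_FORMAT)
    return (t1 - t0).total_seconds()


def report() -> None:
    durations = {ms: elapsed_seconds(*pair) for ms, pair in RUNS.items()}
    for ms, secs in durations.items():
        print(f"{ms:>3} ms added latency: copy_image took {secs:.3f} s")

    extra = durations[100] - durations[0]
    # ASSUMPTION: each serialized inter-site step pays the added 100 ms once,
    # so extra / 0.100 s approximates the number of such steps.
    serial_steps = extra / 0.100
    print(f"slowdown ratio: {durations[100] / durations[0]:.2f}x")
    print(f"extra time: {extra:.3f} s (~{serial_steps:.0f} serialized waits)")


if __name__ == "__main__":
    report()
```

Run as a script, it reports 5.872 s and 12.291 s (a ratio of about 2.09x) and about 6.4 s of extra time. Under the assumption above, that corresponds to roughly 64 serialized 100 ms waits. If that assumption holds, reducing the number of sequential round trips, rather than raw throughput, is where the latency sensitivity would come from.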

Investigate further what is happening, what bottlenecks exist, and determine how to improve overall performance.


Files

m-namespace.sh (2.53 KB), Christopher Hoffman, 05/11/2023 05:55 PM
netns.sh (1.53 KB), Christopher Hoffman, 05/11/2023 05:56 PM