Bug #58058


CephFS Snapshot Mirroring slow due to repeating attribute sync

Added by Mathias Kuhring over 1 year ago. Updated 8 months ago.

Status: Triaged
Priority: Normal
Assignee:
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): cephfs-mirror
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We might have found a major performance bug in the cephfs snapshot mirroring.
We already reported it to the mailing list, but got no answers yet.
So we will mainly reuse the details from there:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/R7NRB5PNGHL4YJ7EZ5TYBLLJFTU7I2MK/

We are using ceph version 17.2.1
(ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).
We have been using cephadm since version 15 (Octopus).

We mirror several CephFS directories from our main cluster out to a second mirror cluster.
In particular with bigger directories (over 900 TB and 186 million files), we noticed that mirroring is very slow.
On the mirror, most of the time we only observe a write speed of 0 to 10 MB/s in the client I/O.
The target peer directory often doesn't show any increase in size during synchronization
(when we check with getfattr -n ceph.dir.rbytes, see below).
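
For reference, this is roughly how we check the size on the peer (the mount point is just a placeholder for a client mount of the mirror cluster's CephFS):

# run on a client with the mirror cluster's CephFS mounted; adjust the mount point
getfattr -n ceph.dir.rbytes /mnt/cephfs-mirror/irods/sodar
# or watch it over time, e.g. every 5 minutes:
watch -n 300 getfattr -n ceph.dir.rbytes /mnt/cephfs-mirror/irods/sodar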

The status of the syncs is always fine, i.e. syncing and not failing, e.g.:

0|0[root@osd-1 /var/run/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ceph --admin-daemon ceph-client.cephfs-mirror.osd-1.ydsqsw.7.94552861013544.asok fs mirror peer status cephfs@1 c66afb80-593f-4c42-a120-dd3b6fca26bc
{
     "/irods/sodar": {
         "state": "syncing",
         "current_sycning_snap": {
             "id": 7552,
             "name": "scheduled-2022-08-22-13_00_00" 
         },
         "last_synced_snap": {
             "id": 7548,
             "name": "scheduled-2022-08-22-12_00_00",
             "sync_duration": 37828.164744490001,
             "sync_time_stamp": "13240678.542916s" 
         },
         "snaps_synced": 1,
         "snaps_deleted": 11,
         "snaps_renamed": 0
     }
}

The cluster nodes (6 each in the main and the mirror cluster) are connected to the switches with dual 40G NICs.
The connection between the switches is 2x 100G.
Simple write operations from other clients to the mirror CephFS result in writes of e.g. 300 to 400 MB/s (sketched below).
So the network doesn't seem to be the issue here.
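
Such a write test could look roughly like this (mount point and test file are placeholders, not our exact commands):

# streaming write from a regular client with the mirror cluster's CephFS mounted
dd if=/dev/zero of=/mnt/cephfs-mirror/tmp/ddtest bs=4M count=2500 conv=fsync
rm /mnt/cephfs-mirror/tmp/ddtest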

We started to dig into the debug logs of the cephfs-mirror daemon / Docker container.
We set the debug level to 20 (as sketched below); otherwise there are no messages at all (so no errors).
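
For reference, the debug level can be raised via the daemon's admin socket; we assume here that the relevant subsystem is debug_cephfs_mirror (the socket is the same one used for the status command above):

ceph --admin-daemon /var/run/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/ceph-client.cephfs-mirror.osd-1.ydsqsw.7.94552861013544.asok config set debug_cephfs_mirror 20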

We observed a lot of messages with need_data_sync=0, need_attr_sync=1 (counted e.g. as shown below),
leading us to the assumption that, instead of actual data, mostly attributes are being synced.
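
A quick way to count such messages (the log path is a placeholder for wherever the cephfs-mirror container logs end up on the host):

grep -c 'need_attr_sync=1' /var/log/ceph/ceph-client.cephfs-mirror.osd-1.ydsqsw.log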

We started looking at specific examples in the logs and tried to make sense, from the source code, of which steps are happening.
Most of the messages come from cephfs::mirror::PeerReplayer:
https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/tools/cephfs_mirror/PeerReplayer.cc

We figured that the do_synchronize function uses should_sync_entry to check whether data (need_data_sync) or attributes (need_attr_sync) should be synchronized,
and if necessary performs the sync using remote_file_op.
should_sync_entry reports different ctimes for our examples, e.g.:

local cur statx: mode=33152, uid=996, gid=993, size=154701172, ctime=2022-01-28T12:54:21.176004+0000, ...
local prev statx: mode=33152, uid=996, gid=993, size=154701172, ctime=2022-08-22T11:03:18.578380+0000, ...

Based on these different ctimes, should_sync_entry decides then that attributes need to be synced:
*need_attr_sync = (cstx.stx_ctime != pstx.stx_ctime)
https://github.com/ceph/ceph/blob/6fee777d603aebce492c57b41f3b5760d50ddb07/src/tools/cephfs_mirror/PeerReplayer.cc#L911

We assume cur statx/cstx refers to the file in the snapshot currently mirrored.
But what exactly is prev statx/pstx?
Is it the peer path or the last snapshot on the mirror peer?

We can confirm that the ctimes are different on the main cluster and the mirror (compared e.g. as shown below).
On the main cluster, the ctimes are consistent in every snapshot (since the files didn't change).
On the mirror, the ctimes increase with every snapshot towards more recent dates.
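
Such a comparison can be done by statting the same file in the same snapshot on both clusters (mount points and the file path below are placeholders):

stat -c 'ctime: %z' /mnt/cephfs-main/irods/sodar/.snap/scheduled-2022-08-22-12_00_00/path/to/file
stat -c 'ctime: %z' /mnt/cephfs-mirror/irods/sodar/.snap/scheduled-2022-08-22-12_00_00/path/to/file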

Given that the CephFS Mirror daemon writes the data to the mirror as a CephFS client,
it seems to make sense that data on the mirror has different / more recent ctimes (from writing).
Also, when the mirror daemon is syncing the attributes to the mirror, wouldn't this trigger a new/current ctime as well?

So our assumption is that syncing an old ctime will actually result in a new ctime on the mirror (see the small illustration below),
and thus trigger the sync of attributes over and over again (at least with every snapshot synced).
So is ctime the proper parameter to test whether attributes need to be synced? Or shouldn't it rather be excluded?
In other words, is this check the right thing to do: *need_attr_sync = (cstx.stx_ctime != pstx.stx_ctime)
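
To illustrate the assumption with plain POSIX tools (on any filesystem, not specific to CephFS): ctime cannot be set directly through the usual interfaces, and setting other attributes bumps it to the current time:

stat -c 'mtime: %y  ctime: %z' somefile
touch -m -t 202201281254 somefile         # restore an old mtime
stat -c 'mtime: %y  ctime: %z' somefile   # mtime is old again, but ctime is current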

Is it reasonable to assume that these attribute syncs are responsible for our slow mirroring?
Or is there anything else we should look out for?

And are there actually commands or logs showing us the speed of the mirroring?
We only know about sync_duration and sync_time_stamp (as in the status above).
But then, how can we actually determine the size of a snapshot or the difference between snapshots,
so that one can make speed calculations for the latest sync (e.g. along the lines of the sketch below)?
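
The closest thing we could come up with is comparing the recursive size of consecutive snapshots via the ceph.dir.rbytes xattr and dividing by sync_duration, assuming the xattr is also available inside .snap (the mount point is a placeholder; this only captures the net size difference, not the amount of changed data):

getfattr -n ceph.dir.rbytes /mnt/cephfs-main/irods/sodar/.snap/scheduled-2022-08-22-12_00_00
getfattr -n ceph.dir.rbytes /mnt/cephfs-main/irods/sodar/.snap/scheduled-2022-08-22-13_00_00
# (difference in bytes) / sync_duration (seconds) ~ average sync speed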

What is the general experience with mirroring performance,
in particular with bigger CephFS directories towards the petabyte range?

Also, minor typo in the admin daemon output: "current_sycning_snap"
